Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (math.PR) 2026-06-17

Spectral recovery of a planted triangle-dense subgraph

arXiv:2606.17604v1 Announce Type: cross Abstract: Given a simple graph on $n$ vertices and a parameter $k$, the triangle-densest-$k$-subgraph problem is known to be computationally hard in the worst case. To circumvent the computational hardness, we study an average-case model where a triangle-dense subgraph on $k$ vertices is planted in an Erdős-Rényi random graph on $n$ vertices. For the recovery of the planted subgraph, we propose a simple spectral algorithm and a semidefinite program, both of which use a graph matrix whose entries are local signed triangle counts. Theoretical guarantees for these algorithms are established through spectral analysis of the graph matrix. Finally, we provide evidence showing a statistical-to-computational gap analogous to that for the planted clique problem. The computational threshold in terms of the subgraph size $k$ is at least $\sqrt{n}$ in the framework of low-degree polynomial algorithms, while the information-theoretic threshold is at most logarithmic in $n$.

02.
arXiv (CS.LG) 2026-06-18

Unraveling the Mechanism of Drug Binding to SARS-CoV-2 RNA Pseudoknot with Thermodynamics-Driven Machine Learning

arXiv:2604.14906v3 Announce Type: replace-cross Abstract: The pseudoknot secondary structure in SARS-CoV-2 RNA is essential for regulating protein synthesis through $-$1 programmed ribosomal frameshifting ($-1$ PRF), a mechanism that allows the virus to generate both structural and non-structural proteins from overlapping reading frames. This pseudoknot exhibits both threaded and unthreaded long-lived topologies. The influence of ligand binding on its folding is a process critical for the development of $-$1 PRF small-molecule inhibitors. Understanding this process through unbiased molecular dynamics (MD) simulations can be facilitated by introducing collective variables (CVs) that capture the corresponding slowest dynamical modes. Here, we use spectral map (SM), a thermodynamics-driven machine learning technique, to learn such CVs directly from all-atom MD trajectories of the SARS-CoV-2 RNA pseudoknot in complex with the $-$1 PRF inhibitor merafloxacin and its two structural analogs in neutral and ionized forms. Free-energy landscapes (FELs) derived from the learned CVs indicate that ligand-induced destabilization is topology-selective. In the threaded pseudoknot, the inhibitors destabilize the S2 stem, while in the unthreaded pseudoknot, destabilization occurs in the S1 and S3 stems. Furthermore, the extent to which each ligand reshapes the FEL matches experimentally reported antiviral potency, whereas the protonation state qualitatively alters dynamics within the same RNA topology. Overall, our results show how pseudoknot topology, ligand type, and protonation state collectively influence the slow conformational dynamics of viral RNA and establish physiological protonation as a critical factor for modeling RNA-targeted drug action.

03.
arXiv (math.PR) 2026-06-12

Fourier Dimensions of Mandelbrot Cascades under Minimal Integrability

Authors:

arXiv:2606.08703v2 Announce Type: replace Abstract: This note announces exact Fourier dimension formulas for canonical Mandelbrot cascade measures under the minimal Kahane Peyriere integrability condition and records the canonical b adic extension on cubes. In the dyadic interval setting, the theorem is proved in a balanced vector weight model allowing dependence between sibling weights. Almost surely on non extinction, the Fourier, energy, and L2 dimensions all equal the energy exponent. The scalar specialization gives the canonical Mandelbrot Kahane Fourier dimension formula under the minimal integrability condition. On the circle, the endpoint formula is given by the endpoint lower local dimension exponent. For the b adic Mandelbrot cascade on cubes, the Fourier dimension is the minimum of 2 and the energy exponent, with the universal Fourier barrier at dimension two providing the high dimensional obstruction.

04.
arXiv (CS.CV) 2026-06-15

WAM4D: Fast 4D World Action Model via Spatial Register Tokens

World action models (WAMs) have recently shown promise in jointly modeling future observations and executable robot actions. However, most existing WAMs still operate in 2D video or latent spaces, where visually plausible rollouts miss the 3D spatial constraints and occluded contact geometry required for precise manipulation. While geometric foundation models offer strong priors for recovering dense 3D structure and motion from visual observations, forcing WAMs to predict the dense 4D representation introduces costly geometric decoding and slows down causal action generation. To address the trade-off, we present WAM4D, a fast 4D world action model that uses lightweight spatial register tokens as training-time future-depth readouts to transfer pretrained geometric priors into a causal video-action transformer, then removes the register branch for lightweight action inference. To prevent non-causal shortcuts, we further design causal mixture attention for the Mixture-of-Transformers (MoT) WAM backbone, defining modality-specific visibility among video, action, and geometry tokens. Comprehensive experiments on RoboTwin 2.0 and challenging real-world manipulation tasks show that WAM4D improves spatial consistency and achieves competitive action prediction while maintaining efficient inference.

05.
arXiv (CS.CV) 2026-06-16

Sinkhorn-CPD: Robust point cloud registration via unbalanced entropic optimal transport

Coherent Point Drift (CPD) is widely used for rigid point cloud registration because of its soft correspondences and closed-form parameter updates. However, CPD's target-side marginal constraint forces every observation, including outliers, to receive exactly unit probability mass. This assumption degrades registration accuracy under heavy outliers and partial overlap. Optimal transport (OT) methods can handle missing mass through unbalanced formulations, but require hand-tuned annealing schedules. In this paper, we propose Sinkhorn-CPD, which replaces CPD's target-side marginal constraint with dual Kullback-Leibler penalties, allowing the algorithm to discard outliers on both sides. The resulting formulation is a fully unbalanced entropic optimal transport problem, which can be efficiently solved by generalized Sinkhorn iterations. Moreover, Sinkhorn-CPD preserves the closed-form Procrustes and variance updates of CPD. In our method, the variance sigma^2 plays the role of the entropic regularization parameter, which induces an automatic annealing schedule from diffuse to sharp correspondences without manual temperature tuning. Experiments on synthetic, cross-category, and scan-to-CAD benchmarks show that Sinkhorn-CPD achieves state-of-the-art accuracy, with strong robustness to outliers and partial overlap.

06.
arXiv (CS.CL) 2026-06-11

A Geometric Profile of Semantic Information in Text: Frame-Conditional Uniqueness and a Trade-Off Triangle for Scalar Summaries

How much meaning does a text carry? Shannon's theory measures uncertainty over symbols and is intentionally indifferent to meaning, while pairwise metrics such as BERTScore compare two texts rather than characterizing one. We develop a geometric framework that measures semantic content from the structure of a text's sentence embeddings. The framework has three parts. First, within a fixed embedding and baseline, six natural axioms uniquely determine a scalar measure up to scale, a frame-conditional uniqueness theorem. The resulting scalar is empirically too coarse, motivating a richer representation. Second, we propose a three-coordinate semantic profile capturing novelty (displacement from generic discourse), breadth (diversity of distinct ideas), and integration (connectedness among them), together with a discrete minimal unit (the semantic quantum) whose resolution is fixed by a clustering threshold $\tau$. Third, we prove a no-go theorem: no scalar summary of the profile can simultaneously satisfy analytic stability under paraphrase and concatenation, ordinal robustness across text scales, and cross-representation comparability. We exhibit two practical scalars, $S_{\mathrm{minmax}}$ and $S_{\mathrm{rank}}$, each occupying a distinct corner of this trade-off triangle. Validation across 23 synthetic categories, 5 Project Gutenberg novels, and 3 embedding models confirms the trade-off. The recommended rank-normalized configuration passes 25 of 28 ordinal checks as point estimates (21 of 28 after Benjamini-Hochberg correction), outperforming seven baselines including unigram entropy and a BERTScore-based novelty signal. A separate variational result connects the breadth coordinate to the log-determinant of a determinantal point process (Spearman $\rho = 0.985$ over 507 Gutenberg chapters), giving an optimization-theoretic foundation for breadth.

07.
arXiv (CS.CV) 2026-06-16

Trusted Multi-View Deep Learning Classification of Fetal Congenital Heart Disease with Feature-level and Decision-level Fusion

Congenital heart disease (CHD) refers to the abnormal anatomical structure caused by the abnormal development of the heart and great vessels during embryonic development. Traditional diagnostics often fail to achieve high accuracy and efficiency, especially given the complexity of cardiac anatomy. This study presents a specialized multi-view deep learning framework for CHD binary classification using echocardiographic images. A large-scale CHD dataset, including five views, was used to train the model, enabling it to integrate multi-angle image data. The framework utilizes advanced feature extraction and attention mechanisms to improve diagnostic precision and reliability. An uncertainty-based decision-making component is also integrated to handle low-quality images, enhancing diagnostic outcomes. Experimental results show that this method achieves top-tier performance on our dataset and provides a robust tool for early CHD detection, underscoring its potential for clinical use. The dataset and source code will be released upon paper acceptance.

08.
medRxiv (Medicine) 2026-06-17

Determinants of non-utilization of insecticide-treated nets among children under five in Rwanda: analyses of the 2024 Rwanda malaria indicator survey

Background Insecticide-treated nets (ITNs) are effective for preventing malaria among children under five years, who bear a disproportionate burden of malaria. This study assessed the prevalence and determinants of ITN non-utilization among children under five in Rwanda using data from the 2024 Rwanda Malaria Indicator Survey (RMIS).Methodology This cross-sectional study utilized nationally representative data from the 2024 RMIS. Analyses were restricted to children under five residing in households that owned at least one ITN. The outcome was non-utilization of ITN, defined as not sleeping under an ITN the night preceding the survey. Survey-weighted descriptive statistics were used to estimate the prevalence of ITN non-utilization. Factors associated with non-utilization were identified using a survey-weighted Poisson regression model. Adjusted prevalence ratios (aPRs), 95% confidence intervals and p-values were reported.Results A total of 1,979 children were included in the study. The weighted prevalence of ITN non-utilization among children under five years was 20.11% (95% CI: 17.81 - 22.63). After adjusting for other factors, children aged 2 - 3 years were associated with an 83% higher prevalence of ITN non-utilization compared with those aged [&le;]1 year (aPR = 1.83, 95% CI: 1.423 - 2.352, p < 0.001). Compared with households that owned only one ITN, children in households with three or more ITNs were associated with a 76% lower prevalence of ITN non-utilization (aPR = 0.24, 95% CI: 0.171 - 0.332, p < 0.001). Children living in households with 5 - 7 members were associated with an 87% higher prevalence of ITN non-utilization compared with those in households with 1 - 4 members (aPR = 1.87, 95% CI: 1.476 - 2.358, p < 0.001).Conclusion The findings suggest that ITN utilization among children is influenced not only by household access to nets but also by household composition and dynamics that shape the allocation and use of available preventive resources.

09.
arXiv (quant-ph) 2026-06-19

The use of Peres lattices in periodically driven systems

arXiv:2606.20009v1 Announce Type: new Abstract: We demonstrate the strength of the method of Peres lattices in periodically driven quantum systems. The method, which has previously been used mostly in stationary systems, enables us to efficiently detect resonances in the driven system, to monitor the onset of chaos, and to recognize critical properties of the Floquet modes. It also allows quick comparisons of the spectra of Floquet modes for various driving Hamiltonians and transparent tests of the iterative approximation techniques based on effective stationary Hamiltonians.

10.
arXiv (CS.CV) 2026-06-17

Phenotyping TPF via Self-Supervised Learning: A Label-Agnostic Framework with Expert Validation

The full potential of artificial intelligence in tibial plateau fracture characterisation remains unrealised, constrained by a fundamental dependency on labelled datasets whose consistency cannot be guaranteed: conventional classification schemes such as Schatzker and AO/OTA suffer from inter-observer variability, causing supervised models to learn human disagreement rather than stable fracture morphology. We design, implement, and validate a label-agnostic framework that eliminates this constraint by learning fracture representations directly from imaging data without observer-assigned labels. A RadImageNet-pretrained ResNet-50 encoder is fine-tuned on 154 cleaned knee radiographs using the SimCLR contrastive objective, preceded by a data cleaning protocol and followed by UMAP dimensionality reduction and k-means clustering to discover four imaging-derived phenotypes. Phenotype validity is assessed through a blinded expert review protocol administered to two independent clinicians. The four phenotypes demonstrate robust stability (bootstrap ARI = 0.319 +/- 0.041), strong internal cohesion (silhouette = 0.511), and coherence ratings of 3-5/5 from both reviewers under blinded conditions; one phenotype was unanimously identified as exhibiting comminution – a high-complexity feature isolated without any supervisory signal. Inter-partition comparison against Schatzker labels yields ARI = 0.013, confirming orthogonality to conventional classification boundaries. Notably, expert reviewers anchored to established classification vocabularies perceived imaging-derived groups as heterogeneous precisely where Schatzker alignment was lowest, suggesting that Schatzker-trained perception and label-agnostic embedding geometry measure orthogonal dimensions. These findings establish label-agnostic SSL phenotyping as a reproducible and clinically interpretable complement to conventional classification.

11.
arXiv (CS.CV) 2026-06-11

Reason, Then Re-reason: Cross-view Revisiting Improves Spatial Reasoning

Spatial reasoning from egocentric videos is inherently challenging because the observable evidence is constrained by the camera trajectory. Existing methods rely on single-turn inference, forcing models to resolve geometric ambiguity through semantic priors rather than verifiable evidence. We argue that spatial reasoning should be revisitable: conclusions formed under limited evidence should remain open to revision when complementary viewpoints become available. Building on this insight, we propose Reason, then Re-reason (ReRe), a training-free, inference-time framework with two phases: in the Reason Phase, an MLLM forms a spatial hypothesis from the original video; in the Re-reason Phase, it verifies or revises the hypothesis by observing a synthesized novel-view video. To enable effective cross-view revisiting, we design a Geometry-to-Video pipeline that renders strategically complementary novel views from predicted 3D geometry. These views feature an elevated, oblique perspective with scene-spanning coverage, while preserving the MLLM's native video interface without architectural modifications. Extensive evaluations on VSI-Bench and STI-Bench demonstrate that ReRe substantially boosts open-source MLLMs to rival proprietary state-of-the-art performance. Project page: https://zhenjiemao.github.io/ReRe/

12.
medRxiv (Medicine) 2026-06-22

Integration of lung tissue proteomics and genome-wide association data to identify lung cancer susceptibility proteins and potential drug targets

Background: Proteins directly impact disease development and act as drug targets. Therefore, we integrated genomic and lung tissue proteomics data to identify lung cancer susceptibility proteins, elucidating genetic mechanisms and candidate drug targets. Method: We profiled the proteome and genome in non-neoplastic lung tissue from 200 lung cancer patients. Using this data, we constructed genetic models to predict abundance across the proteome in lung tissue. We applied these models to genome-wide association study (GWAS) data from 55,174 lung cancer cases and 1,294,174 controls to evaluate their associations with the risk of lung cancer, overall and by major histological subtypes. Bayesian colocalization and Mendelian randomization (MR) analyses were used to prioritize putative causal proteins, which were cross-referenced with three main drug-protein databases to identify potential therapeutic targets. Results: We identified 29 proteins associated with lung cancer risk at a false discovery rate < 5%, including 25 for overall lung cancer, two (AQP3 and IL18) specifically for adenocarcinoma, and another two (HMGN2 and HLA-DMB) for squamous cell carcinoma. Of them, genes encoding 17 proteins reside at least 2Mb away from any known GWAS risk loci, including 14 for overall lung cancer (HYI, GPX1, GMPPB, DSP, HDDC2, MTCH2, SUOX, JMJD7, PDIA3, IL16, IQGAP1, SULT1A2, ARHGAP27, and TYMP) and three for subtypes (AQP3, IL18, and HMGN2). Among the 12 proteins located within the known risk loci, EPHX2, CLDN18, PSMD5, and CYP2S1 proteins showed an association independent of the proximal GWAS-identified lead variant. Colocalization and/or MR analysis suggested 11 potential causal proteins. Five of these candidate causal proteins (DSP, CLDN18, IQGAP1, IL18 and TYMP) are targeted by nine drugs already approved by the FDA or in phase III trials. Conclusion: Our study identified novel lung cancer susceptibility proteins and potential drug targets, offering valuable insights into lung cancer biology and future translational utilities.

13.
arXiv (CS.AI) 2026-06-11

Generalizing Beyond Suboptimality: Offline Reinforcement Learning Learns Effective Scheduling through Random Solutions

arXiv:2509.10303v2 Announce Type: replace-cross Abstract: Online reinforcement learning (RL) approaches have demonstrated strong performance on Job Shop Scheduling (JSP) and Flexible JSP (FJSP) problems by learning scheduling policies through direct interaction with simulated environments. However, these methods often require extensive training interactions, limiting their sample efficiency and practical applicability. Motivated by this challenge, we introduce Conservative Discrete Quantile Actor-Critic (CDQAC), an offline RL algorithm that learns effective scheduling policies directly from static, suboptimal datasets. CDQAC couples a quantile-based critic with delayed policy updates to estimate the return distribution of machine-operation pairs. Extensive experiments on JSP and FJSP benchmarks demonstrate that CDQAC consistently outperforms the data-generating heuristics, surpasses state-of-the-art offline and online RL baselines, and is highly sample efficient, requiring only 1 to 5% of the original dataset to learn high-quality policies. Our analysis suggests that, in scheduling, offline RL performance is governed mainly by state-action coverage rather than the quality of individual trajectories. Scheduling couples a dense reward aligned with the makespan objective with equal-length trajectories across heuristics, enabling effective learning from a broad range of behaviors. Consistent with this observation, datasets generated by a simple random heuristic with broader coverage let it outperform policies trained on datasets produced by stronger heuristics such as Genetic Algorithms.

14.
arXiv (quant-ph) 2026-06-17

Universal features of high-energy scattering of Laguerre-Gaussian states

arXiv:2604.00575v2 Announce Type: replace-cross Abstract: Vortex states of photons, electrons, and other particles are wave packets that carry intrinsic orbital angular momentum (OAM) and exhibit other features unavailable for plane waves. Collisions of high-energy vortex states can become a promising tool for nuclear and particle physics, once experimental challenges are overcome. An extensive literature exists on scattering processes involving vortex states; however, most works rely on assumptions that will be challenging to achieve in experiment. In this work, we initiate a systematic re-analysis of vortex-state scattering processes using paraxial Laguerre-Gaussian (LG) wave packets colliding at a non-zero impact parameter $b$. Since the total final transverse momentum $P_\perp$ is no longer fixed, we focus on how the differential cross section depends on $P_\perp$. We emphasize that non-trivial $P_\perp$-dependent features can originate either from the shape of the LG wave packets or from the dynamics of the scattering process under interest. Here, we focus on the former source and explore in detail these universal kinematic features, while the study of process-specific modifications, along with the novel insights they may bring, is delegated to a future work. Interestingly, the non-zero impact parameter $b$ plays a key role in many $P_\perp$-dependent effects, making it a useful probe of vortex states, not a nuisance factor as often assumed.

15.
arXiv (CS.AI) 2026-06-16

Exploiting Search in Symbolic Numeric Planning with Patterns

arXiv:2606.16329v1 Announce Type: new Abstract: In this paper, we present a procedure for numeric planning based on Symbolic Pattern Planning (SPP). Given a numeric planning problem $\Pi$, a pattern $\prec$ is a sequence of actions used to define a formula encoding the subsequences of $\prec$ executable from a starting state $S$. Cardellini, Giunchiglia, and Maratea (2024a) follow the Planning as Satisfiability approach by defining, at each step $n \ge 0$, a formula $\Pi^\prec_n$ in which $(i)$ the pattern $\prec$ is computed only for $n=0$ in the initial state $I$ of $\Pi$, and then exploited at each step $n$, $(ii)$ the starting state $S$ is set to $I$, and $(iii)$ the set $G$ of goals is required to hold in the last state that can be reached by one of the subsequences of $\prec$ concatenated $n$ times. The procedure begins with $n=0$, terminates as soon as $\Pi^\prec_n$ is satisfiable, and otherwise proceeds by incrementing $n$. In this paper, possibly at each step, $(i)$ we symbolically search for an intermediate state $P$ reachable from $I$, closer to a goal state, $(ii)$ dynamically recompute the pattern $\prec_h$ – to be used in the next step – in $P$, $(iii)$ refine the pattern $\prec_g$ used to reach $P$, and $(iv)$ start the new search from the state $S$ which can be either the initial state $I$ or the last computed intermediate state $P$, exploiting the computed patterns $\prec_g$ and $\prec_h$ to define the pattern $\prec$ to be used in the search. In particular, at each step, we define a formula $\Pi^{\prec}_{S,P}$ encoding the existence of a state $P'$ closer than $P$ to a goal state, with $P'$ reachable from the starting state $S$ when using the pattern $\prec$. We present different techniques for producing such formulas, each corresponding to a different strategy for exploring the search space. We prove their correctness and completeness, the latter under certain conditions.

16.
medRxiv (Medicine) 2026-06-16

The Target48 Neurodegeneration Panel: A Novel Tool for Profiling Protein Signatures in Neurodegenerative Disorders

Introduction: Novel tools for absolute quantification of established and emerging fluid neuro-biomarkers are required to advance diagnostic studies and improve biological insights. Methods: We conducted an extensive analytical and clinical validation of the Olink Target 48 Neurodegeneration panel (T48 Neuropanel) in 352 paired CSF and plasma samples from cognitively unimpaired controls (CU), Alzheimer dementia (AD), frontotemporal dementia (FTD), and dementia with Lewy bodies (DLB), n=44 per group. Comparisons with benchmark assays were performed. Results: Good detectability (CSF: 31 out of 42 assays; plasma: 38 out of 42 assays) and technical performance was observed. Benchmark assays showed good correlations, supporting method transformation formulas. Next to emerging biomarkers (MMP10, ITGB2), discriminative performance was excellent in AD: CSF pTau217: AUC=1; FTD: plasma NfL: AUC=0.952; and DLB: CSF DDC: AUC=0.901. Discussion: This analytical and clinical validation of the T48 Neuropanel highlights initial cut-offs and emerging biomarkers to aid clinical studies for the diagnosis, prognosis, and monitoring of neurodegenerative diseases. Highlights: The T48 Neuropanel shows robust analytical performance, with high detectability across both plasma and CSF matrices. The T48 Neuropanel validates established (i.e., pTau217, Abeta42, NfL, and GFAP) and emerging biomarkers (i.e., DDC, MMP10, ITGB2, ITGAM, NPTX2, NPTXR, SMOC1, sTREM1, and sTREM2) in CSF and plasma. CSF NfL, GFAP, ITGB2, and ITGAM and plasma GFAP were dysregulated across AD, FTD, and DLB dementias. -The multiplex design of the T48 Neuropanel enables rich biological interpretation by simultaneously quantifying established and emerging neurodegeneration biomarkers. Importantly, the inclusion of absolute quantification facilitates the establishment of cut-offs, supporting its potential for clinical translation.

17.
arXiv (CS.AI) 2026-06-16

MedAI: Evaluating TxAgent's Therapeutic Agentic Reasoning in the NeurIPS CURE-Bench Competition

arXiv:2512.11682v2 Announce Type: replace Abstract: Therapeutic decision-making in clinical medicine constitutes a high-stakes domain in which AI guidance interacts with complex interactions among patient characteristics, disease processes, and pharmacological agents. Tasks such as drug recommendation, treatment planning, and adverse-effect prediction demand robust, multi-step reasoning grounded in reliable biomedical knowledge. Agentic AI methods, exemplified by TxAgent, address these challenges through iterative retrieval-augmented generation (RAG). TxAgent employs a fine-tuned Llama-3.1-8B model that dynamically generates and executes function calls to a unified biomedical tool suite (ToolUniverse), integrating FDA Drug API, OpenTargets, and Monarch resources to ensure access to current therapeutic information. In contrast to general-purpose RAG systems, medical applications impose stringent safety constraints, rendering the accuracy of both the reasoning trace and the sequence of tool invocations critical. These considerations motivate evaluation protocols treating token-level reasoning and tool-usage behaviors as explicit supervision signals. This work presents insights derived from our participation in the CURE-Bench NeurIPS 2025 Challenge, which benchmarks therapeutic-reasoning systems using metrics that assess correctness, tool utilization, and reasoning quality. We analyze how retrieval quality for function (tool) calls influences overall model performance and demonstrate performance gains achieved through improved tool-retrieval strategies. Our work was awarded the Excellence Award in Open Science. Complete information can be found at https://curebench.ai/.

18.
arXiv (CS.CL) 2026-06-16

PVminerLLM2: Improving Structured Extraction of Patient Voice via Preference Optimization

Motivation: Patient-generated text contains critical information on patients' lived experiences, social context, and care engagement, but remains largely unstructured, limiting its use in patient-centered outcomes research. Prior work introduced the PV-Miner benchmark and PVMinerLLM models for structured extraction. However, supervised fine-tuning (SFT) alone struggles with rare, fine-grained, and unevenly distributed errors, particularly in token-critical structured outputs. Results: We present PVminerLLM2, an improved set of LLMs for structured patient voice extraction that applies preference optimization to address token-critical errors beyond the reach of supervised fine-tuning. Our method introduces (i) a preference objective with token-level gated stabilization term that prevents degradation of absolute token likelihood under preference optimization, and (ii) confusion-aware preference pair construction to better capture low-separation distinctions. We further incorporate token-importance weighting and inverse-frequency reweighing to address token imbalance and class skew. Across multiple model sizes, PVMinerLLM2 consistently outperforms strong baselines, achieving gains of up to 4.43% (Code), 3.50% (Sub-code), and 1.55% (Span), and outperforms baseline LLM trained with existing preference optimization methods. Availability and Implementation: The supplementary material, code, evaluation scripts, and trained models for PVminerLLM2 are publicly available at: https://github.com/Data-Mining-Lab-Yale/PVminerLLM2

19.
arXiv (CS.AI) 2026-06-17

Multiple cyclicity and Wavelet Decomposition with Channel Correlation for Long-term Time Series Forecasting

arXiv:2606.17996v1 Announce Type: cross Abstract: Cyclicity and trend are important components of time series data and many studies based on cyclicity and trend have achieved good results in long-term time series forecasting. However, we believe that current work neglects the influence of real-world inter-channel correlations in time series data which leads to suboptimal predictions. Furthermore, these models rely on complex designs to capture diverse information so that resulting in low computational efficiency. To address this challenge, we propose McWC, a long-term time series forecasting model that separately models the cyclicity, trend, and inter-channel correlations. Specifically, McWC first decouples cyclical information from data using a multi-layer cyclicity construction module. Then, it extracts inter-channel correlations using multi-layer perceptron. Next, it models and fuses the multi-layer high-frequency and low-frequency information from data using a multi-level wavelet decomposition module. Finally, it aggregates the results of different components to obtain the output. Simultaneously, we decouple intra-channel autocorrelations by calculating a loss function in the frequency domain. Experiments on six real-world datasets demonstrate that McWC achieves state-of-the-art performance, exhibiting excellent computational efficiency and historical information extraction capabilities.

20.
arXiv (CS.CV) 2026-06-11

Vision Transformers for Face Recognition Need More Registers

Recent advances in Vision Transformers (ViTs) for face recognition (FR) have moved beyond the standard CLS-token paradigm. In this paradigm, a special classification token (CLS) is prepended to the patch embeddings and used as a representation of the input for downstream tasks. An alternative approach, Concatenated Patch Embeddings (CPE), instead leverages all patch tokens by concatenating them into a single vector, which is then projected into a compact face representation. CPE has been shown to improve recognition performance in comparison to CLS-based ones, but our qualitative analysis of attention maps showed the presence of artifacts that limit their interpretability. To address this issue, we incorporate register tokens, learnable tokens concatenated to the initial patch embeddings, and processed jointly through the ViT encoder blocks. This mechanism has been shown to produce more structured and interpretable attention maps compared to baseline ViT. We empirically demonstrate that these artifacts consistently appear across various ViT backbones, including small and large models, and that introducing register tokens effectively mitigates them. Adding four or eight registers significantly enhances interpretability, with eight registers providing the highest verification accuracies and smoothest attention structures. Our resulting model, ViT-8R, corresponds to a CPE-based ViT-B architecture augmented with eight register tokens achieves state-of-the-art performance among ViT-based FR models on large-scale IJB-B and IJB-C benchmarks. Also, ViT-8R produces substantially clearer attention maps compared with the baseline model, which offer deeper insight into the model's attention behavior (https://github.com/TaharChettaoui/ViT-FR-Registers)

21.
arXiv (CS.CV) 2026-06-15

BoRAD: Bootstrap your Own Representations for Multi-class Anomaly Detection

Reconstruction-based anomaly detection is attractive for industrial inspection, but scaling it from category-specific training to a one-for-all setting is challenging. A single model must reconstruct diverse normal appearances without copying abnormal details, which exposes two coupled failure modes: identical shortcut, where anomalies pass through the reconstruction path, and mis-reconstruction, where normal categories are confused with one another. We propose BoRAD, a label-free training framework that treats this as a representation-capacity allocation problem. BoRAD uses a shared learnable prototype bank to impose two complementary regularizers: spatial prototype alignment contracts local within-prototype variation to suppress anomaly copying, while prototype-relative global alignment preserves between-prototype structure and improves sensitivity to abnormal angular deviations. The prototype bank and prediction heads are used only during training; inference remains a standard teacher-student feature discrepancy pass, with no class labels, negative pairs, memory retrieval, or prototype lookup. BoRAD achieves competitive one-for-all anomaly detection performance, including 86.2\% mAD on MVTec AD, 80.7\% mAD on VisA and 73.1\% mAD on Real-IAD. Diagnostic analyses further show reduced anomaly leakage, improved normal-category separability, and stronger anomaly-normal score separation.

22.
arXiv (CS.CL) 2026-06-19

Vero: An Open RL Recipe for General Visual Reasoning

What does it take to build a visual reasoner that works across charts, science, spatial understanding, and open-ended tasks? The strongest vision-language models (VLMs) suggest that broad visual reasoning is within reach, yet their closed data and reinforcement learning (RL) pipelines make their gains difficult to study, reproduce, or extend. We introduce Vero, a family of fully open VLMs that match or exceed existing open-weight models across diverse visual reasoning tasks. We scale RL data and rewards across six broad task categories, constructing Vero-600K, a 600K-sample dataset from 59 datasets, and designing task-routed rewards that handle heterogeneous answers. Across VeroEval, our 30-benchmark suite, Vero-600K outperforms existing RL datasets under controlled comparisons. Applied to five starting models, Vero variants gain 2.9-5.4 points on average over their initial models. Notably, Vero-Qwen3I-8B, trained on the Instruct model, surpasses Qwen3-VL-8B-Thinking by 3.8 points on average without additional distillation. Systematic ablations reveal that different task categories elicit distinct reasoning patterns and that broad gains depend on learning them jointly rather than in isolation. All data, code, and models are publicly available.

23.
arXiv (CS.CV) 2026-06-15

3D-RFT: Reinforcement Fine-Tuning for Video-based 3D Scene Understanding

Reinforcement Learning with Verifiable Rewards ( RLVR ) has emerged as a transformative paradigm for enhancing the reasoning capabilities of Large Language Models ( LLMs), yet its potential in 3D scene understanding remains under-explored. Existing approaches largely rely on Supervised Fine-Tuning ( SFT), where the token-level cross-entropy loss acts as an indirect proxy for optimization, leading to a misalignment between training objectives and task performances. To bridge this gap, we present Reinforcement Fine-Tuning for Video-based 3D Scene Understanding (3D-RFT ), the first framework to extend RLVR to video-based 3D perception and reasoning. 3D-RFT shifts the paradigm by directly optimizing the model towards evaluation metrics. 3D-RFT first activates 3D-aware Multi-modal Large Language Models ( MLLM s) via SFT, followed by reinforcement fine-tuning using Group Relative Policy Optimization ( GRPO) with strictly verifiable reward functions. We design task-specific reward functions directly from metrics like 3D IoU and F1-Score to provide more effective signals to guide model training. Extensive experiments demonstrate that 3D-RFT-4B achieves state-of-the-art performance on various video-based 3D scene understanding tasks. Notably, 3D-RFT-4B significantly outperforms larger models (e.g., VG LLM-8B) on 3D video detection, 3D visual grounding, and spatial reasoning benchmarks. We further reveal good properties of 3D-RFT such as robust efficacy, and valuable insights into training strategies and data impact. We hope 3D-RFT can serve as a robust and promising paradigm for future development of 3D scene understanding.

24.
arXiv (CS.CL) 2026-06-17

ZeroSyl: Simple Zero-Resource Syllable Tokenization for Spoken Language Modeling

Pure speech language models aim to learn language directly from raw audio without textual resources. A key challenge is that discrete tokens from self-supervised speech encoders result in excessively long sequences, motivating recent work on syllable-like units. However, methods like Sylber and SyllableLM rely on intricate multi-stage training pipelines. We propose ZeroSyl, a simple training-free method to extract syllable boundaries and embeddings directly from a frozen WavLM model. Using L2 norms of features in WavLM's intermediate layers, ZeroSyl achieves competitive syllable segmentation performance. The resulting segments are mean-pooled, discretized using K-means, and used to train a language model. ZeroSyl outperforms prior syllabic tokenizers across lexical, syntactic, and narrative benchmarks. Scaling experiments show that while finer-grained units are beneficial for lexical tasks, our discovered syllabic units exhibit better scaling behavior for syntactic modeling.

25.
arXiv (CS.LG) 2026-06-16

Beyond Defensive Reporting: Machine Learning for Active Anti-Money Laundering Control in Insurance

arXiv:2606.16663v1 Announce Type: new Abstract: Money laundering through insurance claims poses a threat to insurers both through fraudulent payouts and reputational and regulatory risk. Despite this, little research has examined how such laundering can be prevented. This paper examines whether machine learning can help insurers flag suspicious claims before payout, shifting the focus from passive reporting to active prevention. Using production data from a major Norwegian insurer, we train gradient-boosted decision tree models to detect claims later reported to authorities for suspected money laundering. Because fraud and laundering may share behavioural patterns, we also examine whether insurance fraud labels can serve as an auxiliary training signal. We compare different learning setups using the Budget-Weighted Capture Rate, a metric introduced in this paper to measure how many laundering cases are captured when only a small share of claims can be manually reviewed. The results show that incorporating fraud-related investigation labels substantially improves laundering detection. The best-performing model captures nearly two-thirds of laundering cases within the top-ranked 2 to 6 percent of claims selected for investigation. To our knowledge, this is the first empirical study of machine learning for money laundering detection in insurance claims.