Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.LG) 2026-06-17

Characterizing Nash Equilibria in Zero-Sum Games: A Physics-Inspired, Parallelizable Approach with a Linear Number of Gradient Queries

arXiv:2507.11366v2 Announce Type: replace-cross Abstract: We study online optimization methods for zero-sum games, a fundamental problem in adversarial learning in machine learning, economics, and many other domains. Traditional methods approximate Nash equilibria (NE) using either regret-based methods (time-average convergence) or contraction-map-based methods (last-iterate convergence). We propose a new method based on Hamiltonian dynamics in physics and prove that it can characterize the set of NE in a finite (linear) number of iterations of alternating gradient descent in the unbounded setting, modulo degeneracy, a first in online optimization. Unlike standard methods for computing NE, our proposed approach can be parallelized and works with arbitrary learning rates, both firsts in algorithmic game theory. Experimentally, we support our results by showing our approach drastically outperforms standard methods.

02.
arXiv (CS.CV) 2026-06-11

An Electric Potential-Augmented Benchmark Dataset for Physics-Guided Image Reconstruction of Electrical Capacitance Tomography

While deep learning has significantly advanced image reconstruction of Electrical Capacitance Tomography (ECT), most data-driven methods map directly between capacitance and permittivity distribution, treating the sensor as a black box. This overlooks the electric potential field – the fundamental physical link governing the nonlinear and ill-posed ``soft-field'' effect. To address this, we propose an electric potential-augmented ECT benchmark dataset designed to explicitly integrate latent physics behind ECT into the learning process. Generated via a COMSOL-MATLAB pipeline for an eight-electrode sensor as an example, the dataset comprises 20,000 randomized samples across four typical flow patterns. Crucially, alongside the conventional capacitance vectors and permittivity distributions depicted as images, each sample preserves eight excitation-wise full-field potential maps. Beyond data release, we provide illustrative evaluation protocols for both forward and inverse problems of ECT. Through comprehensive testing on both in-distribution (IID) and out-of-distribution (OOD) scenarios, we systematically demonstrate how the inclusion of electric potential maps enhances modeling accuracy and robustness. Fundamentally, the explicit inclusion of latent field information significantly lowers the barrier to integrating physical laws into ECT modeling, thereby establishing a standardized foundation for future physics-guided machine learning of ECT image reconstruction.

03.
arXiv (CS.LG) 2026-06-15

Neural ARFIMA model for forecasting BRIC exchange rates with long memory

arXiv:2509.06697v3 Announce Type: replace-cross Abstract: Exchange rate forecasting remains a challenging problem, particularly for emerging economies, where the observed time series exhibit pronounced long-memory dependence, nonlinear dynamics, and sensitivity to macro-financial drivers. Classical models such as ARFIMA capture long-range persistence but fail to adequately represent nonlinear relationships, while modern machine learning approaches often neglect the underlying long-memory structure in macroeconomic series. To address this gap, we propose a Neural AutoRegressive Fractionally Integrated Moving Average (NARFIMA) model that integrates ARFIMA-based long-memory modeling with neural networks for nonlinear function approximation, while incorporating exogenous macroeconomic and uncertainty indicators. The framework provides a unified approach for capturing persistence, nonlinear dynamics, and external shocks. We establish asymptotic stationarity of the NARFIMA process and develop conformal prediction intervals for distribution-free uncertainty quantification. Empirical results for BRIC exchange rates show that NARFIMA consistently outperforms a broad range of forecasting benchmarks across multiple horizons, underscoring the importance of explicitly modeling long-memory dependence in exchange rate dynamics. The `narfima' R package provides an implementation of our approach.

04.
arXiv (CS.LG) 2026-06-19

Structure-Oriented Randomized Neural Networks for Poisson-Nernst-Planck and Poisson-Nernst-Planck-Navier-Stokes Systems

arXiv:2606.19912v1 Announce Type: cross Abstract: We develop a structure-oriented randomized neural network framework, termed SO-RaNN, for the Poisson-Nernst-Planck (PNP) system and the Poisson-Nernst-Planck-Navier-Stokes (PNP-NS) system. The decoupled linearized subproblems are solved iteratively by randomized neural networks in a space-time framework. For the concentration variables, a pointwise cut-off is used to enforce positivity at the value level, and discrete mass-scaling factors are computed at selected correction instants and interpolated in time, so as to ensure exact mass matching at those instants and to promote approximate mass preservation between them. To introduce an auxiliary discrete dissipation mechanism, we further employ an SAV-type post-processing correction, which yields monotonicity of the SAV auxiliary variable under the ideal SAV update. For the PNP-NS system, a structure-preserving randomized neural network (SP-RaNN) is used for the velocity field, so that the velocity approximation satisfies the incompressibility constraint pointwise by construction. On the theoretical side, we derive residual-based estimates for the raw, uncorrected RaNN solvers of the linearized subproblems, formulate a conditional local-in-time convergence result for the raw outer Picard iteration of the PNP system, and analyze the value-level positivity correction together with the mass-correction and SAV post-processing steps. For the PNP-NS system, we establish an approximation result for the SP-RaNN space and provide a conditional error statement for the corresponding linearized Oseen-type problem. Numerical experiments demonstrate approximation accuracy in the source-driven manufactured tests and illustrate the intended value-level positivity correction, selected-time mass matching, computed free-energy curves based on the final gauge-fixed potential, and divergence-free approximation in benchmark tests.

05.
arXiv (CS.AI) 2026-06-19

AAPA: Adversarially Anchored Preference Alignment for Post-Training of Large Language Models

arXiv:2509.25148v2 Announce Type: replace Abstract: Post-training alignment of large language models often combines supervised fine-tuning (SFT) on expert demonstrations with reinforcement learning (RL) from preference or verifiable feedback. SFT provides a useful behavioral anchor but can overfit to static demonstrations, whereas RL encourages exploration but may drift from expert behavior or exploit imperfect rewards. We propose AAPA (Adversarially Anchored Preference Alignment), a plug-in framework that augments existing post-training objectives with a sentence-level adversarial anchoring signal. AAPA compares policy rollouts with offline, pre-collected expert responses using a fixed lightweight discriminator, and therefore requires neither online teacher inference nor discriminator co-training during policy optimization. The same anchoring term can be added to SFT, GRPO, and CHORD while preserving their original training pipelines. Experiments on instruction-following benchmarks show that AAPA consistently improves the corresponding base objectives across model scales. In particular, the staged AAPA configuration improves over a strong GRPO baseline by 5.77\% on \texttt{Qwen3-0.6B} and 3.75\% on \texttt{Qwen3-4B}. Further analyses on response length, log-probability distributions, and discriminator variants suggest that adversarial anchoring provides a stable semantic grounding signal for preference optimization. Code is available at \url{https://github.com/IsFaqq/AAPA}.

06.
medRxiv (Medicine) 2026-06-23

Respiratory support with Continuous Positive Airway Pressure in preterm neonates: an analysis of coverage and quality of care in 66 neonatal units in Kenya, Malawi, Nigeria and Tanzania implementing with the NEST360 Alliance

Background: Prematurity is the leading cause of child deaths worldwide, with the highest neonatal mortality in sub Saharan Africa. Respiratory distress syndrome (RDS) is the leading mortality pathway in preterm neonates, but continuous positive airway pressure (CPAP) has high impact. This analysis reports CPAP coverage and quality of care for preterm neonates admitted to 66 neonatal units in Kenya, Malawi, Nigeria and Tanzania. Methods: Analyses used individually linked neonatal inpatient data and cross-sectional health systems data. All admitted neonates were eligible for inclusion (January 2021 through December 2024). Service readiness for CPAP delivery and mean CPAP coverage were described for CPAP eligible newborns (weighing 1500g). Quality of care cascades were constructed to illustrate key indicators. Survival among CPAP eligible neonates was analysed using regression models, stratified by clinical severity scores. Results: 375,255 newborn admissions were analysed in 66 neonatal units. Functional CPAP availability varied with median 16% of days (IQR: 4 to 47%) classified as high demand (>1.5 eligible newborns per CPAP). Of 64,761 CPAP eligible neonates, 22,006 (34%, 95% CI 33 to 34%) received CPAP. All countries showed improvement in CPAP coverage, with Tanzanian hospitals recording 63% increase in mean coverage (p-value=0.001) over time. Quality of care cascades showed treatment was initiated 1 day for 42% (95% CI 41 to 43%) of eligible neonates receiving CPAP. Only 10% of neonates

07.
medRxiv (Medicine) 2026-06-11

Population-scale detection of methylation outliers from long-read genome sequencing

Background: Aberrant DNA methylation can mediate the functional effects of rare genetic variation and contribute to imprinting disorders, repeat expansion diseases, and other pathogenic regulatory mechanisms. Long-read sequencing technologies now enable genome-wide detection of CpG methylation alongside genetic variation from a single assay. However, methods for systematic identification and interpretation of methylation outliers from long-read sequencing data remain limited. Methods: We developed METAFORA, a computational workflow for detecting methylation outlier regions from PacBio and Oxford Nanopore long-read sequencing data. METAFORA constructs population-level methylation references, segments the genome into correlated CpG blocks, infers technical and biological sources of variation through hidden factor estimation, models uncertainty due to variable depth sequencing, and computes covariate-adjusted methylation outlier scores for individual samples. We applied METAFORA across large long-read sequencing cohorts and integrated methylation outliers with multi-omic data. METAFORA is implemented as a snakemake workflow available at https://github.com/tjense25/METAFORA. Results: METAFORA identified methylation outlier regions associated with rare structural variants, tandem repeat expansions, and imprinting abnormalities. We found outlier regions were enriched for molecular outliers across transcriptomic and chromatin accessibility datasets, supporting their functional relevance in gene regulation. In a representative case, METAFORA identified an imprinting defect affecting the GNAS locus associated with an STX16 deletion. Conclusions: METAFORA enables scalable detection and interpretation of methylation outliers from long-read sequencing data and provides a framework for integrating epigenetic outliers with genomic and multi-omic analyses. These approaches may improve interpretation of rare regulatory variation and support discovery of clinically relevant epigenetic abnormalities in genomic medicine.

08.
PLOS Medicine 2026-05-08

Climate change and non-communicable diseases: An invisible syndemic

by Gokul Parameswaran, Sadeer Al-Kindi, Sanjay Rajagopalan Climate change accelerates non-communicable diseases (NCDs) through cascading environmental disruptions and is attributed to driving increased NCD-related mortality. Yet this syndemic remains invisible and underfunded. We detail why addressing the climate-NCD intersection is critical for improving health. In this Perspective, Sanjay Rajagopalan and colleagues discusses how climate change accelerates non-communicable diseases (NCDs) and exacerbates NCD-related mortality, and calls for greater visibility and funding to address this syndemic and improve human health.

09.
arXiv (CS.LG) 2026-06-15

Hybrid Uncertainty Sensitivity Analysis Based on the HSIC for High-Dimensional Responses with Aleatory–Epistemic Separation

arXiv:2606.14053v1 Announce Type: cross Abstract: Quantifying the influence of hybrid aleatory and epistemic uncertainties on high-dimensional system responses remains a major challenge in global sensitivity analysis (GSA). Existing Hilbert–Schmidt Independence Criterion (HSIC)-based approaches are primarily restricted to single-output settings and lack a rigorous decomposition of heterogeneous uncertainty sources and their interactions. To address this limitation, a novel double-space tensor-product RKHS framework is proposed for sensitivity analysis under hybrid uncertainty. By constructing factorized kernels over both the latent input space and the multidimensional output space, a concurrent double Möbius inversion is derived to orthogonally decompose the global dependence measure into pure aleatory effects, pure epistemic effects, and their interaction contributions. The resulting dimension-wise sensitivity indices preserve the uncertainty attribution structure across all output dimensions. To satisfy the independence assumptions required by the decomposition, an auxiliary-variable representation based on the inverse probability integral transform is introduced, enabling the treatment of hierarchical uncertainties and Copula-induced correlations within a unified latent space. A fully vectorized single-loop implementation is further developed to avoid the computational burden of nested Monte Carlo simulation. Statistical significance and estimation uncertainty are quantified through permutation testing and Bootstrap confidence intervals. Numerical studies on a modified multi-output Ishigami function and an aerodynamic pressure-field problem demonstrate the accuracy, scalability, and practical applicability of the proposed framework.

10.
arXiv (math.PR) 2026-06-16

BBP Phase Transition for a Doubly Sparse Deformed Model

arXiv:2603.04832v3 Announce Type: replace Abstract: We prove the equivalent of the Baik, Ben Arous, Péché (2004) phenomenon for a novel, doubly sparse model where both the Wigner noise matrix and signal vector(s) are sparse. Specifically, we consider a deformed sub-Gaussian sparse Wigner ensemble with a fixed number of sub-Gaussian spike vectors of the same-order sparsity added. We show that spike vectors with signals greater than one are correlated with the top eigenvectors of the deformed ensemble and that each spike vector of signal greater than one induces an outlier eigenvalue. Notably, our results hold in the supercritical sparsity regime for the Wigner matrix ($q \gg \frac{\log n}{n}$) and for any sparse spike vector with an unbounded number of entries ($np\to \infty$). No further relationship between the sparsities of the noise matrix ($q$) and spike vectors ($p$) is necessary. This generalizes the work of Benaych-Georges and Nadakuditi (2010) and Péché (2005).

11.
arXiv (quant-ph) 2026-06-15

Bandstructure of a coupled BEC-cavity system: effects of dissipation and geometry

arXiv:2504.17730v2 Announce Type: replace-cross Abstract: We present a theoretical model for a transversally driven Bose-Einstein condensate coupled to an optical cavity. We focus on the interplay between different coherent couplings, which can trigger a structural phase transition, known as the superradiant phase transition. Our approach, based on band structure theory and a mean-field description, enables a comprehensive analysis of the nature of the system's excited modes, precursing the phase transitions. By incorporating dissipative couplings, intrinsic to these systems, we find non-Hermitian phenomena such as the coalescence of crossing precursor modes and the emergence of exceptional points (EPs). The general formulation of our model allows us to explain the role of an angle between transverse pump and the cavity deviating from $90^\circ$. This offers us a unified perspective on the plethora of different implementations of such systems.

12.
arXiv (CS.AI) 2026-06-11

Towards Deep Learning Surrogate for the Forward Problem in Electrocardiology: A Scalable Alternative to Physics-Based Models

arXiv:2512.13765v2 Announce Type: replace-cross Abstract: The forward problem in electrocardiology, computing body surface potentials from cardiac electrical activity, is traditionally solved using physics-based models such as the bidomain or monodomain equations. While accurate, these approaches are computationally expensive, limiting their use in real-time and large-scale clinical applications. We propose a proof-of-concept deep learning (DL) framework as an efficient surrogate for forward solvers. The model adopts a time-dependent, attention-based sequence-to-sequence architecture to predict electrocardiogram (ECG) signals from cardiac voltage propagation maps. A hybrid loss combining Huber loss with a spectral entropy term was introduced to preserve both temporal and frequency-domain fidelity. Using 2D tissue simulations incorporating healthy, fibrotic, and gap junction-remodelled conditions, the model achieved high accuracy (mean $R^2 = 0.99 \pm 0.01$). Ablation studies confirmed the contributions of convolutional encoders, time-aware attention, and spectral entropy loss. These findings highlight DL as a scalable, cost-effective alternative to physics-based solvers, with potential for clinical and digital twin applications.

13.
arXiv (CS.CV) 2026-06-12

Skill-3D: Evolving Scene-Aware Skills for Agentic 3D Spatial Reasoning

This paper explores agentic 3D spatial understanding, i.e., MLLM agents performing 3D reasoning through tool use. Existing methods often misuse tools and exhibit biased tool preferences under 3D scenarios, leaving the agentic paradigm with only marginal gains over non-agentic strategies. We reveal that 3D spatial reasoning tasks are heterogeneous across scenes, while these agents apply a uniform tool-use strategy to all scenes rather than selecting tools according to the specific scene and task. To address this, we propose Skill-3D, a framework that learns self-evolving scene-aware skills. Specifically, Skill-3D identifies the task scene and records the agent's tool-use trajectory into a Scene Memory, where successful trajectories from similar scenes are aggregated and distilled into a reusable scene-aware skill, with failed ones attached to the skill as lessons. During training, once a similar scene recurs, the corresponding skill is injected to guide the agent, producing new trajectories whose successes and failures further refine the skill, forming a loop in which the memory and the skill library co-evolve. Experiments show that Skill-3D substantially improves tool utilization in 3D spatial reasoning (from 39% to 78% on VSI-Bench), driving the agent toward correct and sufficient tool use. For instance, it improves Gemini-3-Flash by 67% on MMSI-Bench. Furthermore, we conduct agentic post-training over skill-guided trajectories, which boosts Qwen3-VL-8B by 60% on VSI-Bench.

14.
arXiv (quant-ph) 2026-06-11

A semi-definite programming formulation of the device-dependent guessing probability

arXiv:2606.12079v1 Announce Type: new Abstract: In quantum mechanics, a measurement applied to a state in general produces some amount of intrinsic randomness. This is not only a fundamental feature of the theory, but is also at the basis of any quantum process to generate random numbers. The simplest of such processes consists of a single, fully charaterized, measurement acting on a single, fully characterized, state. Unfortunately, no general method to estimate the intrinsic randomness produced in such setups is known. In this work, we address this issue by presenting a semidefinite programming formulation of the maximum probability with which an adversary, Eve, can guess the outcomes of characterized but untrusted prepare-and-measure setups. We then present several applications of this construction. First, we apply our method to a variety of specific setups, allowing us both to benchmark the approach and, more importantly, to determine the exact amount of certifiable randomness in scenarios where only upper bounds were previously available. Then, we show that the presence of entanglement between the device preparing the state and the measurement strictly increases Eve's predictive power, already in the most elementary setup of a binary measurement acting on a qubit state.

15.
arXiv (CS.CL) 2026-06-16

EffGen: Enabling Small Language Models as Capable Autonomous Agents

Most existing language model agentic systems today are built and optimized for large language models (e.g., GPT, Claude, Gemini) via API calls; while powerful, this approach faces several limitations including high token costs and privacy concerns for sensitive applications. We introduce EffGen, an open-source agentic framework optimized for small language models (SLMs) that enables effective, efficient, and secure local deployment. EffGen makes four major contributions: (1) Enhanced tool-calling with prompt optimization that compresses input prompts by up to 70-80% (and 57% on average across our benchmarks) while preserving task semantics, (2) Intelligent task decomposition that breaks complex queries into parallel or sequential subtasks based on dependencies, (3) Complexity-based routing using five factors to make smart pre-execution decisions, and (4) Unified memory system combining short-term, long-term, and vector-based storage. Additionally, EffGen unifies multiple agent protocols (MCP, A2A, ACP) for cross-protocol communication. Results on 13 benchmarks show EffGen outperforms LangChain, AutoGen, and Smolagents with higher success rates, faster execution, and lower memory. Our results reveal that prompt optimization and complexity routing have complementary scaling behavior: optimization benefits SLMs more (11.2% gain at 1.5B vs 2.4% at 32B), while routing benefits large models more (3.6% at 1.5B vs 7.9% at 32B), providing consistent gains across all scales when combined. EffGen is released under the Apache 2.0 License, ensuring broad accessibility for research and commercial use, with the code available at https://github.com/ctrl-gaurav/effGen, the Python package at https://pypi.org/project/effgen/ (pip install effgen), and the project website and documentation at https://effgen.org/ and https://docs.effgen.org/.

16.
arXiv (CS.LG) 2026-06-24

Aligning Audio Captions with Human Preferences

arXiv:2509.14659v3 Announce Type: replace-cross Abstract: Current audio captioning relies on supervised learning with paired audio-caption data, which is costly to curate and may not reflect human preferences in real-world scenarios. To address this, we propose a preference-aligned audio captioning framework based on Reinforcement Learning from Human Feedback (RLHF). To capture nuanced preferences, we train a Contrastive Language-Audio Pretraining (CLAP) based reward model using human-labeled pairwise preference data. This reward model is integrated into an RL framework to fine-tune any baseline captioning system without ground-truth annotations. Extensive human evaluations across multiple datasets show that our method produces captions preferred over baseline models, particularly when baselines fail to provide correct and natural captions. Furthermore, our framework achieves performance comparable to supervised approaches with ground-truth data, demonstrating effective alignment with human preferences and scalability in real-world use.

17.
arXiv (CS.CL) 2026-06-15

Independent-Component-Based Encoding Models of Brain Activity During Story Comprehension

Encoding models provide a powerful framework for linking continuous stimulus features to neural activity; however, traditional voxelwise approaches are limited by measurement noise, inter-subject variability, and redundancy arising from spatially correlated voxels encoding overlapping neural signals. Here, we propose an independent component (IC)-based encoding framework that dissociates stimulus-driven and noise-driven signals in fMRI data. We decompose continuous fMRI data from naturalistic story listening into ICs using one subset of the data, and train encoding models on independent data to predict IC time series from large language model representations of linguistic input. Across subjects, a subset of ICs exhibited consistently high predictivity. These ICs were spatially and temporally consistent across subjects and included cognitive networks known to respond during story listening (auditory and language). Auditory component time series were strongly correlated with acoustic stimulus features, highlighting the interpretability of identified component time series. Components identified as noise or motion-related artifacts by ICA-AROMA showed uniformly poor predictive performance, confirming that highly predicted components reflect genuine stimulus-related neural signals rather than confounds. Overall, IC-based encoding models enable analyses at the level of functional networks, accommodating the variability in network locations across individuals and providing interpretable results that are easy to compare across subjects. Code provided at: https://github.com/kamyahari/IC-Encoding-Models.git

18.
arXiv (CS.CV) 2026-06-17

Query-Efficient Video Adversarial Attack with Stylized Logo on Service Computing

In service computing, video classification has become fundamental to many intelligent applications. While Deep Neural Networks (DNNs) have demonstrated excellent performance in recognizing video content, recent studies have shown that DNNs are highly vulnerable to adversarial examples. Thus, understanding adversarial attacks can better respond to emergency situations. In order to improve attack performance, many style-transfer-based attacks and patch-based attacks have been proposed. However, the global perturbation of the former will bring unnatural global colors, while the latter is difficult to achieve success in targeted attacks due to the limited perturbation space. Moreover, compared to a plethora of methods targeting image classifiers, video adversarial attacks remain relatively underexplored. Therefore, to generate adversarial examples with a low budget and to provide them with a higher verisimilitude, we propose a novel black-box video attack framework, called Stylized Logo Attack (SLA). SLA is conducted through three stages. The first stage involves building a style reference set for logos, which can not only make the generated examples more natural, but also carry more target class features in targeted attacks. Then, Reinforcement Learning is employed to determine the style reference and position parameters of the logo within the video, which ensures that the stylized logo is placed in the video with optimal attributes. Finally, perturbations are optimized in a step-by-step manner so as to improve the fooling rate. Experimental results indicate that SLA can achieve better performance than state-of-the-art methods and still maintain good deception effects when facing various defense methods. We believe SLA can raise awareness among the security community about the reliability and security of video classification systems and serve as a memorandum of possible attack methods.

19.
arXiv (math.PR) 2026-06-16

On the empirical spectral distribution of matrix perpetuities

arXiv:2605.31054v2 Announce Type: replace Abstract: We study matrix perpetuities, that is, solutions to affine fixed-point equations of the form \[ \mathbf{X} \stackrel{d}{=} \mathbf{A}\,\mathbf{X} \,\mathbf{A}^\top+\mathbf{B},\qquad (\mathbf{A},\mathbf{B})\mbox{ and }\mathbf{X} \mbox{ are independent}, \] with particular emphasis on the empirical spectral distribution of the solution. We first establish existence and uniqueness results by relating the problem to classical vector perpetuities, and then develop tools that preserve the matrix structure under orthogonal invariance. For positive semidefinite, orthogonally invariant models, we obtain power-law tail asymptotics for the expected empirical spectral distribution and show that the tail is governed by the largest eigenvalue. We also prove that, in the subcritical regime, the expected empirical spectral distribution of matrix perpetuities converges weakly, as the dimension tends to infinity, to the distribution of the corresponding free perpetuity. Our results are illustrated by matrix Beta prime perpetuities, for which explicit limiting spectral distributions are available.

20.
arXiv (CS.CL) 2026-06-12

Beyond Uniform Tokens: Adaptive Compression for Time Series Language Models

Large language models (LLMs) have enabled time series (TS) analysis by jointly modeling numerical observations and textual context through a shared token interface. However, TS tokens and prompt tokens exhibit fundamentally different information structures, making uniform token processing inefficient. In this paper, we study token efficiency in TS language modeling from an asymmetric-token perspective. We show that TS tokens have highly uneven spectral contributions, where many tokens share redundant frequency patterns while a small subset preserves critical temporal evidence. We also observe that prompt-token influence attenuates with model depth, suggesting that full prompt retention across all layers is unnecessary. Based on these findings, we develop an adaptive token budgeting framework that compresses TS tokens via frequency-domain structure and progressively reduces prompt tokens across layers. Experiments across forecasting, classification, imputation, and anomaly detection demonstrate up to 7.68$\times$ inference acceleration and performance gains in 78\% of evaluated settings, showing the effectiveness of asymmetric token compression for scalable TS foundation models.

21.
arXiv (CS.CV) 2026-06-18

Seeing Before Reasoning: Decoupling Perception and Reasoning for Shortcut-Resilient Multimodal On-Policy Self-Distillation

On-policy self-distillation (OPSD) trains a model on its own rollouts and uses a frozen copy to provide dense token-level targets conditioned on a reference target. This works well for LLM reasoning, but a direct extension to multimodal large language models (MLLMs) can create a shortcut: the privileged target may guide tokens mainly based on the text reference target rather than the image. We propose ViGOS, a visually grounded OPSD framework for MLLM post-training. The student first writes a visual description and then reasons toward the final answer. For valid rollouts, an image-only perception teacher supervises the description, while a privileged reasoning teacher supervises the reasoning and final answer on the same student prefix. A reference teacher is used only for invalid rollouts to recover the output format. Across general vision-language, expert reasoning, visual math, spatial grounding, and visual-language-prior benchmarks, ViGOS keeps the main benefits of OPSD and improves image-grounded behavior in shortcut-prone settings.

22.
arXiv (CS.CL) 2026-06-16

The Answer Lies Within: Self-Derived Rewards Enable Explainable Relation Extraction

Despite the remarkable reasoning capabilities of large language models, they still struggle with one-shot relation extraction without predefined relation labels. We identify two pitfalls: models are often misled by irrelevant tokens instead of relation-conveying semantics, and they often fail to align with the abstraction level human annotators expect. We introduce a novel framework that closes this gap with two components: (1) COGRE, a cognitively-inspired reasoning framework that structures RE into a series of processes mimicking human text-processing; and (2) HIT@DICT, a reinforcement learning intermediate reward strategy that encourages reasoning to align with relational labels by rewarding relation-relevant phrases in reasoning. The reward is derived on a credit dictionary automatically extracted from correct predictions. Our experiments show that our framework improves both accuracy and explanation quality by addressing these two pitfalls. For example, COGRE with Qwen2.5-14B-Instruct on One-shot NYT29 achieves 24.65% F1, surpassing prior reasoning-based designs. Optimizing this approach with RL using HIT@DICT further improves performance by +23.46% points. Finally, human evaluation shows that our best model generates relational phrases closely aligned with gold labels, increasing human explanation quality ratings by 54% (relative).

23.
arXiv (CS.CV) 2026-06-15

Catching magnetic resonance imaging outliers in artificial intelligence-supported radiotherapy workflows: unsupervised detection and localization of image anomalies using deep learning

Artificial intelligence is increasingly integrated into radiotherapy workflows, yet such pipelines remain vulnerable to out-of-distribution image data that may introduce unexpected behavior in clinical tasks. Deep learning-based anomaly detection for pelvic magnetic resonance imaging (MRI) remains largely unexplored, and transparent evaluation of its feasibility for full automation is limited. We developed and evaluated a fully automated, unsupervised anomaly-detection framework for pelvic and brain MRI. A two-stage framework was trained on reference images from public datasets: LUND-PROBE for pelvic MRI, and IXI, fastMRI, and fastMRI+ for brain MRI. In the first stage, MRI slices were compressed into discrete tokens; in the second, the distribution of normal tokens was modeled. Anomaly evidence was estimated by combining perceptual image differences with token-surprisal scores based on negative log-likelihood. Automated detection was evaluated on pelvic MRI with synthetic global and real clinical anomalies, and on brain MRI with clinically annotated fastMRI+ abnormalities. Sensitivity, specificity, area under the receiver operating characteristic curve (AUC), and false-positive behavior in held-out normal cases were assessed. The framework achieved robust detection across hidden evaluation cohorts, with AUCs of 0.97 (95% CI, 0.95-0.98) and 0.81 (95% CI, 0.74-0.87) for pelvic and brain MRI, respectively. Heatmap analysis showed strong spatial agreement between detected anomalies and ground-truth locations, supporting localization accuracy and interpretability. These results support the potential of unsupervised anomaly detection as an automated MRI quality-control layer for radiotherapy workflows, with transparent visualization of image regions likely to compromise downstream AI-based tasks.

24.
arXiv (quant-ph) 2026-06-17

Optimizing bias-tailored quantum error correction beyond code-capacity noise

arXiv:2606.17709v1 Announce Type: new Abstract: We find that the substantial advantages predicted for bias-tailored quantum error correction (QEC) under code-capacity noise are strongly reduced once realistic syndrome extraction and circuit-level noise models are considered. We start by comparing XZZX codes to rectangular surface codes with a bias-dependent optimised anisotropy. Although code-capacity simulations predict an advantage of rectangular surface codes in the limit of high noise bias, this actually disappears under circuit-level noise, making the XZZX codes the preferred and simplest choice even for platforms that allow for a flexible variation of the code layout adapted to changes in noise calibration. Our results identify bias degradation during syndrome extraction under circuit-level noise as the central limitation of biased-tailored QEC. To partially mitigate this effect, we introduce a bias-filtering CNOT gadget that temporarily encodes the ancillary target qubit during syndrome extraction in a repetition code and, upon measurement and feed forward, manages to reduce the bias degradation. In a regime of high-bias and low-idle errors, this bias-filtering gadget yields a few-percent relative improvement of the XZZX code error threshold, demonstrating that lightweight bias-filtering strategies can recover part of the lost bias-tailoring advantage for realistic circuit-level noise.

25.
arXiv (CS.CL) 2026-06-18

Sumi: Open Uniform Diffusion Language Model from Scratch

Diffusion models have become a promising alternative to autoregressive models. Among these, uniform diffusion language models (UDLMs) permit any token to be updated at any step, in principle enabling more flexible generation. However, no UDLM has yet been pretrained from scratch at both large parameter scale and large token budget. Both autoregressive modeling and masked diffusion modeling already have capable models at scale that the community can study and build on; uniform diffusion has none. A scratch-pretrained UDLM at scale would provide a clean reference point for studying scaling behavior, generation dynamics, controllability, and trade-offs against established autoregressive and masked diffusion models. To this end, we introduce Sumi ("ink" in Japanese), a fully open 7B uniform diffusion language model pretrained from scratch on 1.5T tokens. Sumi performs competitively with autoregressive models trained at comparable token budgets on knowledge, reasoning, and coding benchmarks, while under-performing on commonsense benchmarks, where our education-heavy data mixture is a likely contributor. We release our model weights, checkpoints, and full training recipe, including a complete specification of the data mixture over publicly available corpora. We hope this release enables the community to study native uniform diffusion at scale and catalyzes work on its as-yet poorly understood aspects.