Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CV) 2026-06-24

MATCH: Flow Matching for Multi-View Anomaly Detection

Detecting anomalies in industrial objects is an important topic for increasing production efficiency. More complex objects often require the analysis of several view points, which has led to the field of multi-view anomaly detection. We present MATCH, the first multi-view anomaly detection method based on Flow Matching (FM). With the ODE formulation of Flow Matching, we can estimate likelihoods and thereby derive an anomaly score to detect anomalies in multi-view image data at object, image, and pixel-level. The architectural flexibility of FM models allows us to efficiently transform features of different spatial sizes to the normal distribution. We evaluate thoroughly on the already established Real-IAD data set and are also the first to provide a comprehensive evaluation of popular anomaly detection methods for the MANTA-Tiny data set. MATCH achieves state-of-the-art performance in both anomaly detection and segmentation, all while running on consumer-level hardware. By omitting the costly divergence term needed for likelihood estimation, we ensure that MATCH is usable in real-time production scenarios. Lastly, several ablation studies are conducted to validate the methodological choices.

02.
arXiv (quant-ph) 2026-06-25

Asymmetry dynamics and nonequilibrium symmetry-breaking phase transitions

arXiv:2606.07188v2 Announce Type: replace-cross Abstract: In classical settings, the Mpemba effect occurs when a hotter system cools faster than an initially colder one. In quantum systems, this effect can be reinterpreted exploiting the concept of symmetries, with the asymmetry of a subsystem playing the role of temperature. A quantum Mpemba effect arises when a more asymmetric state restores the symmetry faster than a less asymmetric one. Previous work mainly focuses on closed systems characterized by thermal equilibration and Hamiltonian symmetries. In this paper, we analyze the dynamics of asymmetry in an open quantum many-body system featuring symmetry breaking and uncover dynamical behavior that appears to be unique to these settings. In the symmetric phase, we demonstrate the existence of a quantum Mpemba effect, which emerges as a direct consequence of a non-monotonic evolution of the asymmetry. In the broken-symmetry phase, we analyze the imbalance between the system's ability to increase or to decrease its asymmetry. Our results extend the notion of quantum Mpemba effects to open quantum many-body systems exhibiting symmetry-breaking phase transitions and establish them as a platform for observing and controlling anomalous relaxation phenomena.

03.
arXiv (CS.CV) 2026-06-11

Right Predictions, Misleading Explanations: On the Vulnerability of Vision-Language Model Explanations

Explanation mechanisms are increasingly used to support transparency and trust in vision-language models (VLMs), particularly in settings where model decisions require human oversight. However, the robustness of these explanations remains insufficiently understood. In this work, we investigate whether explanation heatmaps in VLMs, particularly CLIP-based models, faithfully reflect model reasoning under adversarial conditions. We show that explanation maps can be systematically manipulated while preserving the model's original prediction, revealing a disconnect between predictive behavior and explanation faithfulness. To study this vulnerability, we introduce X-Shift, a novel grey-box attack that perturbs patch-level visual representations to redirect explanation heatmaps toward semantically irrelevant regions without altering the predicted output. Unlike conventional adversarial attacks that aim to induce misclassification, X-Shift specifically targets the integrity of the explanation process itself. The attack operates without modifying model parameters and generalizes across multiple CLIP architectures and explanation methods. We evaluate the proposed approach on ImageNet-1k, MS-COCO, and Flickr30K, demonstrating consistent degradation in explanation alignment under imperceptible perturbations while maintaining prediction stability. Furthermore, standard prediction-oriented adversarial attacks fail to reproduce the same explanation-shifting behavior even under substantially larger perturbation budgets. Our findings highlight a fundamental limitation of current explanation mechanisms in VLMs and raise concerns about their use as reliable indicators of model trustworthiness in high-impact applications.

04.
arXiv (CS.LG) 2026-06-12

One Transit Is All You Need: Detecting Exoplanets Through Learned Stellar Behaviour with EXOVEIL

arXiv:2606.02778v3 Announce Type: replace-cross Abstract: I present EXOVEIL, a transit detection system that learns what a star's brightness should look like and flags when reality disagrees. Unlike existing systems that require phase-folded input, EXOVEIL operates on raw flux time series and can detect planets that transit only once.A Transformer world model, trained on 16,499 Kepler light curves with transit-masked self-supervised learning, predicts expected stellar flux. A matched-filter detector with variance weighting extracts transit signals from the prediction residuals. A learned classifier (XGBoost) separates planets from false positives, achieving AUC 0.938 on Kepler DR25. Applied to single-transit injection-recovery, EXOVEIL recovers 32% of transits at 1000 ppm depth a task where all classification-based systems score 0% by construction. A blind search of 3,737 Kepler stars yields 179 new transit-like signals not present in the DR25 TCE catalogue, including 46 monotransit candidates. Applied withoutretraining to 47 confirmed TESS planets in the PLATO LOPS2 field, EXOVEIL achieves 100% recovery, demonstrating zero-shot cross-mission transfer. At PLATO's 25-second cadence, detection reaches 100 ppm – approaching the Earth-analog regime. I provide the first application of conformal prediction to transit detection (95.9% empirical coverage) and release the system as pip install exoveil with pretrained weights and a candidate catalogue.

05.
arXiv (CS.CV) 2026-06-24

ZeroGVC: Zero-Shot Generative Video Compression with Autoregressive Diffusion Priors

Recent generative video compression methods leverage powerful generative priors to achieve perceptually pleasing reconstructions. However, most existing approaches require additional training to adapt generative models to produce realistic reconstructions from compact representations. In this paper, we propose ZeroGVC, a zero-shot generative video compression framework that leverages pretrained autoregressive diffusion priors for low-delay video reconstruction. ZeroGVC encodes the first frame of each group of pictures (GOP) with an image codec and represents subsequent P-frames through Codebook-Guided Autoregressive Latent Compression. This design is motivated by our observation that the compression scheme of denoising diffusion codebook models is effective in few-step consistency sampling. By selecting compact combinations of reproducible codebook noise vectors, ZeroGVC steers the latent denoising trajectory toward the target P-frame while allowing the decoder to reproduce the same trajectory in only a few denoising steps. In addition, we design an optional bidirectional reference mode that mitigates error propagation by leveraging the next I-frame context without introducing any additional bitrate overhead. Extensive experiments on standard video compression benchmarks demonstrate that ZeroGVC achieves superior perceptual reconstruction quality at ultra-low bitrates without any additional training.

06.
medRxiv (Medicine) 2026-06-15

Specialty Choice Attitudes Among Medical Interns: Evidence from Hormozgan University of Medical Sciences

Background: Choosing a medical specialty is a critical career decision that affects both physicians future professional lives and the composition of the healthcare workforce. Specialty preferences are shaped by multiple personal, educational, and socioeconomic factors, yet evidence from senior medical students in southern Iran remains limited. This study aimed to assess willingness to pursue specialty training among medical interns at Hormozgan University of Medical Sciences, identify their preferred specialties, and examine factors associated with their decisions. Methods: This descriptive-analytical cross-sectional study was conducted in 2023 among medical interns at Hormozgan University of Medical Sciences in Bandar Abbas, Iran. Using a convenience census approach, all eligible interns were invited to participate, and 83 students completed an online questionnaire. The instrument collected demographic, academic, and occupational data, as well as reasons for willingness or unwillingness to pursue specialty training and specialty preferences. Content and face validity were assessed by faculty members and students, and internal consistency reliability in the present study was acceptable (Cronbach alpha = 0.82). Data were analyzed using descriptive statistics and logistic regression in SPSS version 27. Results: Of the 83 participants, 50 (60.2%) reported willingness to pursue specialty training, while 33 (39.8%) did not. Among students willing to continue, the most frequently cited reasons were achieving a better economic position, broader job opportunities, and higher social status. Among those unwilling to continue, the most common reasons were fatigue from prolonged studying, financial problems, and the desire to start working after graduation. Radiology was the most common first-choice specialty, followed by otorhinolaryngology, dermatology, and cardiology. In regression analyses, no demographic or academic variable remained independently associated with willingness to pursue specialty training in the final multivariable model. Conclusions: A majority of medical interns were interested in pursuing specialty training, with preferences concentrated in a limited number of specialties perceived as offering favorable financial prospects, prestige, and lifestyle. Economic concerns and educational fatigue were the dominant factors influencing willingness and unwillingness to continue specialty education. These findings highlight the need for structured career counseling, broader exposure to different specialties, and policy measures to address financial and structural barriers to residency training. Keywords: medical specialty choice; medical interns; residency training; medical education; Hormozgan university of medical sciences

07.
arXiv (CS.AI) 2026-06-18

An In-depth Study of LLM Contributions to the Bin Packing Problem

arXiv:2510.27353v2 Announce Type: replace Abstract: Recent studies have suggested that Large Language Models (LLMs) could provide interesting ideas contributing to mathematical discovery. This claim was motivated by reports that LLM-based genetic algorithms produced heuristics offering new insights into the online bin packing problem under uniform and Weibull distributions. In this work, we reassess this claim through a detailed analysis of the heuristics produced by LLMs, examining both their behavior and interpretability. Despite being human-readable, these heuristics remain largely opaque even to domain experts. Building on this analysis, we propose a new class of algorithms tailored to these specific bin packing instances. The derived algorithms are significantly simpler, more efficient, more interpretable, and more generalizable, suggesting that the considered instances are themselves relatively simple. We then discuss the limitations of the claim regarding LLMs' contribution to this problem, which appears to rest on the mistaken assumption that the instances had previously been studied. Our findings instead emphasize the need for rigorous validation and contextualization when assessing the scientific value of LLM-generated outputs.

08.
medRxiv (Medicine) 2026-06-16

High-Risk Anti-Seizure Medication Use in Childbearing-Age People with Epilepsy in a Taenia solium Endemic Region

Background: People of childbearing potential with epilepsy in regions endemic for Taenia solium, where neurocysticercosis (NCC) is highly prevalent, represent a vulnerable population due to the elevated burden of epilepsy and resource limitations. Clinical practice in these settings remains poorly characterized. This study characterized anti-seizure medication (ASM) prescribing patterns by medication risk profiles among people of childbearing potential with epilepsy in Northern Peru, a region highly endemic for T. solium. Methods: Participants were drawn from a prospective, population-based epilepsy cohort in Tumbes, Peru (2006 to 2020). The analytic population included females with epilepsy aged 15 to 49 years. The primary outcome was pregnancy-associated ASM risk of congenital malformations and adverse neurodevelopmental outcomes. ASMs were classified as ''Established Low Risk'' (lamotrigine, levetiracetam), ''Possible Risk/Inadequate Data'' (carbamazepine, phenobarbital, phenytoin), and ''Established High Risk'' (valproic acid). Prescription patterns were examined in relation to demographic and clinical characteristics. Results: Among 1,975 individuals with epilepsy, 685 were people of childbearing potential. Approximately 34.9% met criteria for probable or definite NCC. Most ASM prescriptions were in the ''Possible Risk/Inadequate Data'' category (87.0%), and 12.8% received ''Established High Risk'' medications. In multivariable analysis, high-risk prescribing was associated with prior ASM use and polytherapy. Discussion: People of childbearing potential with epilepsy were predominantly treated with carbamazepine, phenytoin, phenobarbital, and valproate, reflecting local ASM availability. Despite evidence supporting lamotrigine and levetiracetam in pregnancy, prescribing patterns reflect local formulary constraints. These findings highlight a gap between guideline recommendations and real-world prescribing in resource-limited settings, underscoring the need for context-specific treatment strategies.

09.
arXiv (CS.CV) 2026-06-24

Advancing WordArt-Oriented Scene Text Recognition: Datasets and Methods

WordArt (artistic text) features highly customized fonts, textures, and layouts, making WordArt-oriented scene TExt Recognition (WATER) substantially more challenging than general Scene Text Recognition (STR). Existing STR datasets and methods, typically built around regular scene text and fixed-template inputs, struggle to scale to WATER. Thus, we aim to advance this task from both data and model perspectives. On the data side, we construct a 2M synthetic dataset, WATER-S, with the scale improved by hundreds of times compared to existing artistic text data. WATER-S consists of two complementary subsets. One rendered by an upgraded rendering pipeline (SynthWordArt), which provides highly accurate and controllable synthetic WordArt data. The other is generated by combining Qwen3-VL for prompt mining and Z-Image for image synthesis, which improves the coverage of realistic and diverse data. On the model side, we propose WATERec. It adopts an visual encoder supporting arbitrary-shaped inputs and an autoregressive decoder to model complex layouts, structurally breaking the bottleneck of fixed-template STR on WordArt. Experiments show that this architecture outperforms prior STR methods, achieving state-of-the-art performance on irregular texts such as WordArt. Together with WATER-R, carefully reorganized from existing real STR data, our strong baseline with the new synthetic data and model design reaches 90.40% accuracy on WordArt-Bench, surpassing both general-purpose and OCR-specialized vision-language models by a large margin. Code and data are available at https://github.com/YesianRohn/WATER.

10.
arXiv (CS.CL) 2026-06-15

Small LLMs: Pruning vs. Training from Scratch

Pruning promises a shortcut to strong small language models. In this work, we examine this promise by pruning Llama-3.1-8B at pruning ratios of 0.5–0.8 with six methods spanning depth, width, and sparse granularities, under two controlled token-matched settings. (1) With the same training token budget, pruned initialization consistently outperforms random initialization. This shows that the parent model provides a strong starting point, although the advantage narrows as the training token budget grows and as the pruning ratio rises, nearly vanishing at the highest pruning ratio we study. (2) When training from scratch is instead given the full token budget consumed by the whole pipeline, pruning at finer granularities still retains an advantage, while coarser structured pruning can be matched or surpassed. This suggests that the parent model transfers knowledge that additional training tokens alone cannot fully recover, but only at fine granularity. Taken together, our results yield a clear recommendation: with a large pretrained model in hand and a limited training token budget, pruning is better than training from scratch; when the training budget is not limited, training from scratch can be competitive for coarser pruning, so a large pretrained parent is not always necessary.

11.
arXiv (CS.CL) 2026-06-19

CogniFold: Always-On Proactive Memory via Cognitive Folding

Existing agent memory remains predominantly reactive and retrieval-based, lacking the capacity to autonomously organize experience into persistent cognitive structure. Toward genuinely autonomous agents, we introduce CogniFold, a brain-inspired "always-on" agent memory designed for the next generation of proactive assistants. CogniFold continuously folds fragmented event streams into self-emerging cognitive structures, bootstrapping progressively higher-level cognition from incoming events and accumulated knowledge. We ground this by extending Complementary Learning Systems (CLS) theory from two layers (hippocampus, neocortex) to three, adding a prefrontal intent layer. Emulating the prefrontal cortex as the locus of intentional control and decision-making, CogniFold achieves this through graph-topology self-organization: cognitive structures proactively assemble under the stream, merge when semantically similar, decay when stale, relink through associative recall, and surface intents when concept-cluster density crosses a threshold. We evaluate structural formation using CogEval-Bench, demonstrating that CogniFold uniquely produces memory structures that match cognitive expectations and concept emergence. Furthermore, across eight downstream benchmarks – two probing long-term conversational memory (LoCoMo, LongMemEval) and six spanning other cognitive domains – we validate that CogniFold simultaneously performs robustly on conventional memory tasks. Our code is available at https://github.com/OpenNorve/CogniFold.

12.
Nature (Science) 2026-06-10

Efficient and accurate neural-field reconstruction using resistive memory

作者:

Applications such as medical imaging, augmented and virtual reality, and embodied artificial intelligence (AI) depend on the ability to reconstruct complex signals from sparse observations. These applications are characterized by incomplete measurements and limited computational resources. Traditional approaches to digital hardware face the following challenges: explicit signal representations require heavy sampling and storage, data movement across the von Neumann bottleneck dominates energy and latency, and CMOS (complementary metal–oxide–semiconductor)-based circuits offer limited parallel efficiency. Here we present a software–hardware co-optimization framework for sparse-input signal reconstruction. At the software level, we use neural fields1 to implicitly represent signals using neural networks, which are further compressed by low-rank decomposition and structured pruning. At the hardware level, we design a resistive-memory-based computing-in-memory platform, featuring a Gaussian encoder and a multi-layer perceptron processing engine. The Gaussian encoder leverages the intrinsic stochasticity of resistive memory for efficient encoding, whereas the processing engine enables precise weight mapping through a hardware-aware quantization circuit. On a 40-nm 256 Kb resistive-memory macro, the system delivers 23.5×, 21.0× and 32.3× gains in projected energy efficiency, together with 10.8×, 38.8× and 6.2× gains in projected parallelism, for three-dimensional computed tomography sparse reconstruction, novel view synthesis and dynamic-scene novel view synthesis, without compromising on reconstruction quality. This work advances AI-driven signal reconstruction technology and paves the way for future efficient and robust medical AI and three-dimensional vision applications. A co-optimized AI hardware–software system using resistive-memory computing improves energy efficiency and parallelism for sparse signal reconstruction in imaging and three-dimensional vision applications.

13.
arXiv (CS.AI) 2026-06-16

Gated QKAN-FWP: Scalable Quantum-inspired Sequence Learning

arXiv:2605.06734v2 Announce Type: replace-cross Abstract: Fast Weight Programmers (FWPs) encode temporal dependencies through dynamically updated parameters rather than recurrent hidden states. Quantum FWPs (QFWPs) extend this idea with variational quantum circuits (VQCs), but existing implementations rely on multi-qubit architectures that are difficult to scale on noisy intermediate-scale quantum (NISQ) devices and expensive to simulate classically. We propose gated QKAN-FWP, a fast-weight framework that integrates FWP with Quantum-inspired Kolmogorov-Arnold Network (QKAN) using single-qubit data re-uploading circuits as learnable nonlinear activation, known as DatA Re-Uploading ActivatioN (DARUAN). We further introduce a scalar-gated fast-weight update rule that stabilizes parameter evolution, supported by a theoretical analysis of its adaptive memory kernel, geometric boundedness, and parallelizable gradient paths. We evaluate the framework across time-series benchmarks, MiniGrid reinforcement learning, and highlight real-world solar cycle forecasting as our main practical result. In the long-horizon setting with 528-month input window and 132-month forecast horizon, our 12.5k-parameter model achieves lower scaled Mean Square Error (MSE), peak amplitude error, and peak timing error than a suite of classical recurrent baselines with up to 13x more parameters, including Long Short-Term Memory (LSTM) networks (25.9k-89.1k parameters), WaveNet-LSTM (167k), Vanilla recurrent neural network (11.5k), and a Modified Echo State Network (132k). To validate NISQ compatibility, we further deploy the trained fast programmer on IonQ and IBM Quantum processors, recovering forecasting accuracy within 0.1% relative MSE of the noiseless simulator at 1024 shots. These results position gated QKAN-FWP as a scalable, parameter-efficient, and NISQ-compatible approach to quantum-inspired sequence modeling.

14.
arXiv (CS.LG) 2026-06-11

Apertus LLM Family Expansion via Distillation and Quantization

arXiv:2605.29128v2 Announce Type: replace Abstract: The wide adoption of LLMs has led to their use in great variety of applications and scenarios, such as chatbot assistants and data annotation, creating the need for the models to satisfy certain budget and hardware constraints. This has led to the trend of LLMs being released in batches consisting of similar models of various sizes for the family of models to adhere to as wide of a range of constraints as possible. In this paper, we validate distillation and quantization as a cost-effective way to expand model families to new sizes and hardware formats. Based on the open-recipe Apertus 8B LLM, we produce Apertus-v1.1 - a distilled family of models with up to 4B parameters trained on 1.7T permissive license tokens. We demonstrate cost-efficiency and strong accuracy performance of our approach for covering large ranges of hardware and systems requirements.

15.
arXiv (CS.LG) 2026-06-15

Recipe-Controlled Decoder Audit for Structural Knowledge-Graph Completion

arXiv:2606.14492v1 Announce Type: new Abstract: We present a recipe-controlled decoder audit (RCDA) for structural transductive knowledge-graph completion (KGC). The audit asks a simple reporting question: before attributing gains to an encoder or training recipe, what changes when the decoder is swapped under the same recipe? Using ComplEx and DistMult as the primary controlled pair, with targeted RotatE/TransE spot-checks, we evaluate seven benchmarks. On five standard KGs, ComplEx-vs-DistMult differences are modest but consistent under our recipe (+0.005 to +0.012 MRR), whereas CompGCN-style encoder effects vary more by dataset. On small KGs, decoder effects become the main diagnostic: Kinship shows a stable ComplEx advantage of +0.143 MRR (6 seeds), while UMLS favours ComplEx by +0.022 MRR in a clean 6-seed server rerun but reverses in an earlier provenance variant. We therefore treat small-KG decoder choice as recipe- and provenance-sensitive rather than as a fixed dataset winner. We further show that decoder choice interacts with encoder depth on WN18RR, and that under our recipe L=0 ComplEx on YAGO3-10 reaches 0.6971 +/- 0.0048 MRR at d=128. The result is a compact audit protocol: report matched decoder rows, log small-KG provenance, and sweep decoder x depth before making encoder-level claims.

16.
arXiv (math.PR) 2026-06-25

Information from coincidences

arXiv:2606.25042v1 Announce Type: cross Abstract: We prove a single algebraic mixed coincidence identity that unifies a broad swath of information-theoretic variational results. For any family of priors $\{\pi_i\}$ and real exponents $\{ \alpha_i \}$, the log of the mixed count $E_{x\sim\nu}\!\left[\prod_{i=1}^W \pi_i^{\alpha_i}(x)\right]$ is simultaneously a Boltzmann coincidence weight, an exponential-family normalizer, a maximum-entropy value, and a KL-barycenter optimum. The identity yields a unified derivation of classical cornerstones of information theory: concentration of empirical distributions (Sanov-type decompositions and Gibbs conditioning), hypothesis-testing error exponents (Chernoff information and its multi-way analogue), change-of-measure inequalities (Donsker-Varadhan and PAC-Bayes), and laws governing rare-pattern coincidences (Erdos-Renyi run-length, iterative guesswork, rate-distortion, and birthday thresholds). Each is recovered as a specialization of the same algebraic equality. It strictly generalizes the classical Renyi entropy and divergence variational formulas (one and two priors respectively) to a $W$-prior simplex, and holds for unnormalized and continuum-indexed priors. Among its consequences are an exact multi-prior PAC-Bayes penalty that subtracts an explicit "coincidence bonus" from the usual single-prior posterior penalty, and the asymptotic MAP error exponent for $W$-ary hypothesis testing as an edge-restricted simplex optimum. We demonstrate the calculus at scale on two large alphabets encoding richly modeled sequential languages: on language-model next-token predictives where we recover contrastive decoding, and on human genomic regulatory sequence where it separates correlated from diverse prior families along a sliding-window trace.

17.
arXiv (CS.CL) 2026-06-24

Aspect-Based Sentiment Evolution and its Correlation with Review Rounds in Multi-Round Peer Reviews: A Deep Learning Approach

Mining sentiment information from the textual content of peer review comments offers valuable insights into the scientific evaluation process. However, previous studies are often constrained by coarse-grained analysis and the lack of differentiation across review rounds. Notably, the dynamic shifts in reviewers' focus and sentiment tendencies throughout multiple review stages remain underexplored. To address this gap, the present study investigates the distribution and evolution of aspect-level sentiments and examines their correlation with the number of review rounds. We begin by segmenting the multi-round review comments of 11,063 accepted papers from Nature Communications and identifying fine-grained review aspect clusters. A manually annotated corpus of approximately 5,000 review sentences is then constructed. Using this dataset, we train a series of deep learning-based aspect sentiment classification models. Among them, the LCF-BERT-CDM model achieves the best performance, with a Macro-F1 score of 82.65%. Subsequent statistical analysis reveals a consistent trend: as the number of review rounds increases, the proportion of positive sentiments rises, while negative sentiments decline. Correlation analysis further indicates that aspect sentiment scores are negatively associated with the total number of review rounds. Key aspects exhibiting stronger correlations include "experiments", "research significance" and "result analysis".

18.
arXiv (CS.LG) 2026-06-25

Lifelong In-Context Learning with Transformers Requires Parametric Forms of Attention

arXiv:2606.25342v1 Announce Type: new Abstract: Lifelong continual learning remains an obstacle on the path to human-like intelligence. Modern transformers show sparks of intelligence with in-context learning. The quadratic nature of attention, however, prohibits transformers from performing this process on arbitrarily long sequences. In this work, we argue that extending in-context learning to lifelong settings is a practical solution for continual learning in AI agents. In particular, we argue that parametric forms of attention are needed to understand a lifetime of context with transformers on a fixed hardware budget. These attention mechanisms learn the relationship between keys and their associated values at test-time with parametric regression. Our generalization of parametric approaches (linear attention, state-space models, fast weight programmers, and test-time training layers) contrasts with nonparametric counterparts like softmax attention. They replace the ever-growing key-value cache with an online-trainable neural network, maintaining a constant memory footprint. We highlight how parametric attention currently fall short of lifelong learning due to limited memory capacity or costly online updates. To address these issues, we pose a set of open questions with novel insights to guide the field toward long-horizon agents.

19.
arXiv (CS.CV) 2026-06-17

A Quantitative Analysis of Multimodal Biomarkers in Alzheimer's Disease

Despite increasing adoption of multimodal approaches in Alzheimer's Disease (AD) research – aimed at integrating molecular, structural, clinical, and genetic biomarkers to enhance disease characterization – the relationships among these modalities remain poorly understood. A systematic analysis of their dynamic interaction is essential for improving disease modeling, identifying redundant assessments, and reducing patient burden and acquisition costs. In this paper, we present a quantitative analysis of multimodal AD biomarkers by integrating tau-PET, structural MRI, cognitive scores (MMSE and CDR), and APOE4 data from 789 subjects drawn from the ADNI dataset. In our analyses, we (A) quantify cross-modal mutual information and explained variance to assess redundancy and predictive dependencies; (B) examine associations between tau topologies and structural atrophy across brain regions to select informative ROIs; (C) perform a statistical decomposition of the tau-cognition association into atrophy-related and atrophy-independent components; (D) and identify a dominant neurodegenerative trajectory that aligns with cognitive decline. This study provides a systematic characterization of cross-modal relationships, improving the interpretability and selection of biomarkers in AD. Code is publicly available at: https://github.com/antonioscardace/Multimodal-AD.

20.
arXiv (quant-ph) 2026-06-11

Clifford disentanglers for entanglement reduction in molecular electronic structure simulations

arXiv:2606.12056v1 Announce Type: new Abstract: Entanglement is a key bottleneck limiting the efficiency of tensor-network and quantum simulations of molecular electronic structures. Here, we systematically assess and extend Clifford disentanglers as a structure-preserving approach to entanglement reduction: they can modify the entanglement structure of qubit wavefunctions while retaining the Pauli-string form of qubit Hamiltonians. To enable a practical search over Clifford transformations, we classify Clifford operators by their action on the Schmidt spectrum across a bipartition, reducing the two- and four-qubit search spaces to 20 and 91392 representatives, respectively. Embedded in an iterative Clifford-augmented matrix product state framework, these transformations reduce the energy errors at fixed bond dimension for the molecular test cases studied and mitigate the dependence on orbital orderings and fermion-to-qubit mappings. We further show that Clifford disentanglers can also benefit quantum simulations such as the shallow-circuit variational quantum eigensolver calculations. Together, these results establish Clifford disentanglers as a useful structure-preserving entanglement-engineering tool for tensor-network and quantum simulations of molecular electronic structure, while also clarifying their correlation dependence and motivating future developments.

21.
arXiv (CS.CL) 2026-06-18

STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability

Reinforcement Learning with Verifiable Rewards algorithms like GRPO have emerged as the dominant post-training paradigm for complex reasoning in LLMs, yet commonly suffer from policy entropy collapse during training. We conduct a first-order gradient analysis of token-level entropy dynamics under GRPO and identify a token-level credit assignment mismatch: the per-token entropy variation decomposes into the product of the trajectory-level advantage and an entropy sensitivity function over the next-token distribution, yielding an advantage-surprisal four-quadrant structure and a near-criticality property. Motivated by it, we propose STARE (Surprisal-guided Token-level Advantage Reweighting for policy Entropy stability), which identifies entropy-critical token subsets via batch-internal surprisal quantiles, selectively reweights their effective advantages, and incorporates a target-entropy closed-loop gate for stable entropy regulation. Across model scales from 1.5B to 32B and three task families (Short CoT, Long CoT, and Multi-Turn Tool Use), STARE sustains stable RL training over thousands of steps while maintaining policy entropy within the target band. On AIME24 and AIME25, STARE outperforms DAPO and other competitive baselines by 4%-8% in average accuracy, with reflection tokens and response length growing in tandem, indicating sustained exploration-exploitation balance that further unlocks RL training potential.Code is available at https://github.com/hp-luo/STARE.

22.
arXiv (CS.LG) 2026-06-16

Communication-Efficient Distributed Training for Collaborative Flat Optima Recovery in Deep Learning

arXiv:2507.20424v3 Announce Type: replace Abstract: We study centralized distributed data parallel training of deep neural networks (DNNs), aiming to improve the trade-off between communication efficiency and model performance of the local gradient methods. To this end, we revisit the flat-minima hypothesis, which suggests that models with better generalization tend to lie in flatter regions of the loss landscape. We introduce a simple, yet effective, sharpness measure, Inverse Mean Valley, and demonstrate its strong correlation with the generalization gap of DNNs. We incorporate an efficient relaxation of this measure into the distributed training objective as a lightweight regularizer that encourages workers to collaboratively seek wide minima. The regularizer exerts a pushing force that counteracts the consensus step pulling the workers together, giving rise to the Distributed Pull-Push Force (DPPF) algorithm. Empirically, we show that DPPF outperforms other communication-efficient approaches and achieves better generalization performance than local gradient methods and synchronous gradient averaging, while maintaining communication efficiency. In addition, our loss landscape visualizations confirm the ability of DPPF to locate flatter minima. On the theoretical side, we show that DPPF guides workers to span flat valleys, with the final valley width governed by the interplay between push and pull strengths, and that its pull-push dynamics is self-stabilizing. We further provide generalization guarantees linked to the valley width and prove convergence in the non-convex setting.

23.
arXiv (CS.CL) 2026-06-16

P3B3: A Multi-Turn Conversational Benchmark for Measuring European and Brazilian Portuguese Variety Bias in LLMs

As Large Language Models (LLMs) become embedded in everyday communication, capturing regional linguistic variation is essential for reliable and equitable language use. In Portuguese, European (pt-PT) and Brazilian (pt-BR) varieties remain unevenly represented, with pt-BR dominating in data quantity, while LLM preference for Portuguese variants remains underexplored. To address this gap, we introduce P3B3, an expert-curated language variety agnostic benchmark of conversational prompts, along with an evaluation framework for measuring variety bias and controllability. Experiments on several models show that most LLMs exhibit a strong bias toward pt-BR, with variation in controllability across models. These results highlight the need for more balanced multilingual representation across language varieties.

24.
arXiv (CS.CV) 2026-06-18

Physics in 2-Steps: Locking Motion Priors Before Visual Refinement Erases Them

Image-to-Video diffusion models leverage input images to generate visually stunning content, yet frequently produce motion that violates physical laws. We reveal a surprising finding: a 2-step generation often exhibits better physical consistency than a 50-step output from the same model. Through spectral analysis, we trace this to phase erosion during denoising; the phase degrades significantly (dropping by $\approx 18\%$ from step 2 to step 50), whereas the magnitude remains relatively stable. Building on this insight, we propose PhaseLock, a training-free framework that preserves the valid motion priors from few-step inference throughout the denoising trajectory. Rather than relying on full-step inference for physical consistency, PhaseLock extracts a motion prior from just 2 steps and enforces it onto high-fidelity generation via Latent Delta Guidance. Our approach effectively mitigates phase degradation, improving physical consistency by an average of 6.2 points across diverse models while largely maintaining visual fidelity, with negligible overhead ($1.06\times$ time, $1.02\times$ memory) and reduced reliance on expensive external guidance methods ($\sim5\times$ time). Project Page: https://dnwjddl.github.io/phaselock

25.
arXiv (CS.AI) 2026-06-24

Legal Reasoning Is Not Lawyering: Rethinking Legal Benchmarks for Pro Se Access to Justice

arXiv:2606.23716v1 Announce Type: cross Abstract: Legal AI benchmark research frequently invokes the assumption that large language models can improve access to justice, including for people who cannot access lawyers in order to understand and exercise their legal rights. We argue that current benchmarks are not equipped to support this assumption because they evaluate legal reasoning over inputs that have already been preprocessed by legal experts, which measures the upper bound of model performance. Access to justice depends on a lower bound: how models perform when inputs come from pro se litigants, whose prompts may contain noisy narratives, buried facts, omissions, folk-legal assumptions, and surface-level errors. These degradations are comparable to conditions under which LLMs are known to degrade in the general machine learning literature, including long-context sensitivity, underspecification, hallucination, and typographical perturbations. We connect evidence from pro se literature with this body of machine learning research and present a small perturbation experiment on LEXam, a legal benchmark, to illustrate the gap between these two bounds. If model development continues to focus on benchmarks that measure only the upper bound, this gap may remain hidden or even widen. We conclude by calling for legal benchmarks that directly measure robustness under pro se-like inputs so that access-to-justice claims about legal AI can become empirically testable.