Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.CL) 2026-06-19

Think Again or Think Longer? Selective Verification for Budget-Aware Reasoning

Test-time reasoning is increasingly used as a serving-time control knob, but extra reasoning is not uniformly valuable: it can repair failed attempts, waste compute on already-correct answers, or introduce harmful answer changes. We study this as a deployment allocation problem rather than a new-verifier problem. We introduce \sevra, Selective Verification for Reasoning Allocation, a serving-layer controller that decides whether to preserve a frozen solver's initial answer or invoke active verification. Using a frozen Qwen3-4B solver, we log intervention outcomes and train recoverability-aware gates from serving-visible attempt state. On \mathfive, selective verification reaches 76.3\% accuracy, compared with 75.5\% for always verifying, while reducing post-generation tokens by 26.8\% and harmful flips from 2.2\% to 1.0\%. However, an 8,192-token initial solve reaches 76.0\% accuracy with 28\% fewer total model tokens, showing that selective recovery is useful but not the best tested cost frontier. In frozen transfer to \gsm, the selective policy verifies only 3.0\% of examples, improves accuracy from 93.4\% to 94.5\%, and reduces verification tokens by 91.2\% relative to always verifying; again, a longer initial solve matches its accuracy with fewer realized tokens. On CommonsenseQA, always-on verification hurts, while Self-Consistency@5 improves accuracy at about five times the realized token cost. The resulting deployment rule is: tune the initial budget first, then use selective recovery when explicit checks, bounded retries, auditability, or regression-risk control matter.

02.
arXiv (CS.CV) 2026-06-16

Akasha 2: Hamiltonian State Space Duality and Visual-Language Joint Embedding Predictive Architectur

Authors:

We present Akasha 2, a state-of-the-art multimodal architecture that integrates Hamiltonian State Space Duality (H-SSD) with Visual-Language Joint Embedding Predictive Architecture (VL-JEPA). The system leverages the Mamba-3 Selective State Space Model (SSM) augmented by a Sparse Mixture of Hamiltonian Experts (SMoE-HE) that enforces latent physical conservation laws through symplectic integration. For visual synthesis, we introduce Hamiltonian Flow Matching (HFM) and persistent 3D Gaussian Splatting (3DGS), enabling ultra-low latency (

03.
medRxiv (Medicine) 2026-06-24

Beyond Nodal Status: Interactions Between Molecular Subtype, Tumor Burden, and Survival in 12,225 Patients with Breast Cancer

Background Lymph node status and molecular subtype are among the most established prognostic factors in breast cancer. However, the extent to which their prognostic effects vary across different tumor size categories and clinical subgroups remains incompletely understood. We investigated the interplay between nodal status, molecular subtype, and tumor size in a large real world breast cancer cohort and developed a prognostic nomogram for individualized survival prediction. Methods A total of 12,225 women with invasive breast cancer from the Shiraz Breast Cancer Registry were analyzed. Patients were stratified according to tumor size, lymph node status, and molecular subtype. Overall survival (OS) and disease free survival (DFS) were evaluated using Kaplan Meier analyses and subgroup comparisons. Logistic regression was performed to identify predictors of lymph node involvement, while Cox regression was used to determine independent prognostic factors. A nomogram was subsequently developed and internally validated for prediction of 3-year and 5-year OS. Results Of 12,225 patients, 41.7% had lymph node positive disease. Across nearly all tumor size categories and molecular subtypes, nodal involvement was associated with significantly worse OS and DFS. Notably, the survival disadvantage associated with nodal positivity was more pronounced among patients with larger tumors and among those with HER2 positive and triple negative breast cancer (TNBC). Although TNBC demonstrated the lowest rate of lymph node involvement among molecular subtypes (adjusted OR 0.54, 95% CI 0.46-0.63), it appeared to show one of the largest survival gaps between node positive and node negative disease. In the overall cohort, survival outcomes generally ranked from best to worst as Luminal A, Luminal B, HER2 positive, and TNBC. However, survival differences among molecular subtypes were not consistently observed across all tumor size and nodal status subgroups. When significant differences were present, Luminal A and Luminal B tumors consistently showed superior outcomes compared with HER2 positive and TNBC tumors. Multivariable analysis identified lymph node status, tumor size, molecular subtype, lymphovascular invasion, tumor necrosis, type of surgery, radiotherapy, hormone therapy, and adjuvant chemotherapy as independent prognostic factors. A nomogram integrating clinicopathological and treatment variables demonstrated good predictive performance, with time dependent AUCs of 0.749 and 0.751 for 3 year and 5 year OS, respectively, and showed good calibration. Conclusions The prognostic impact of lymph node status is not uniform across breast cancer subgroups and appears particularly pronounced in larger tumors and biologically aggressive subtypes. Despite a lower likelihood of nodal involvement, TNBC showed substantial outcome deterioration when nodal metastasis was present. These findings highlight the importance of jointly considering nodal status, molecular subtype, and tumor burden in prognostic assessment.

04.
arXiv (CS.CV) 2026-06-25

Enhancing Brain MRI Anomaly Detection and Reasoning with ROI Rethink and Synthetic Data

Medical vision-language models typically generate diagnoses through single-pass inference without indicating which image regions support their conclusions. This lack of spatial grounding limits clinical utility: outputs cannot be audited, and models may hallucinate findings on normal scans. We present BrReMark (Brain Rethink via ROI Marking), a framework that introduces explicit region marking into brain MRI diagnosis. The model first generates hypotheses about potential abnormalities and grounds them through explicit bounding box marking, then verifies conclusions by re-examining the marked evidence. Training combines supervised fine-tuning on structured reasoning trajectories with reinforcement learning using a composite reward over localization accuracy and diagnostic reasoning. Furthermore, we integrate a domain randomization-based pathology synthesis augmentation strategy to improve the model's generalizability to out-of-distribution (OOD) data. On internal benchmark, BrReMark improves mAP50 from 0.74% to 37.54% compared to the base model, while achieving 21.57% Clinical F1 and 45.26% diagnostic accuracy. On NOVA OOD benchmark, it also achieves competitive overall performance with a 45.7% reduction in false positives compared to the state-of-the-art, indicating reduced hallucination on rare pathologies. These findings suggest that explicit hypothesis-verification grounding is a practical path toward trustworthy open-ended brain MRI diagnosis across both in-distribution and OOD settings.

05.
medRxiv (Medicine) 2026-06-18

Intra-arterial recombinant human TNK tissue-type plasminogen activator (rhTNK-tPA) thrombolysis for acute medium vessel occlusion (MeVO-TNK): Study rationale and design

Background The optimal management of acute ischemic stroke caused by medium vessel occlusion (MeVO) remains uncertain. Recent randomized trials have failed to demonstrate a clear benefit of endovascular therapy in this population, whereas intra-arterial thrombolysis (IAT) has emerged as a biologically plausible alternative. However, prospective evidence supporting IAT in MeVO is lacking, and the optimal dosing strategy for stand-alone IAT remains undefined. Aim To preliminarily evaluate the efficacy and safety of intra-arterial tenecteplase (IA-TNK) plus standard medical therapy (SMT) compared with SMT alone in patients with acute MeVO stroke, and to explore a stepwise IA-TNK dosing strategy. Design The MeVO-TNK trial is a multicenter, prospective, randomized, open-label, blinded-endpoint (PROBE), exploratory phase II study. A total of 60 participants with imaging-confirmed MeVO will be randomized 1:1 to receive either IA-TNK plus SMT or SMT alone. Participants presenting beyond 6 hours from symptom onset must demonstrate salvageable penumbral tissue on advanced imaging. Those assigned to the intervention group will receive up to two intra-arterial boluses of tenecteplase (0.0625 mg/kg per bolus), with the second bolus administered based on angiographic assessment of reperfusion and safety. Outcomes The primary efficacy outcome is final infarct volume measured at 72{+/-}24 hours after randomization. Secondary efficacy outcomes include the proportions of patients achieving modified Rankin Scale (mRS) scores of 0-1, 0-2 and 0-3 at 90 days, a shift analysis of the mRS distribution at 90 days, early neurological deterioration, and National Institutes of Health Stroke Scale score at 7 days or discharge. The primary safety outcome is symptomatic intracranial hemorrhage within 24 hours. Conclusions This trial will provide preliminary evidence on the biological efficacy, reperfusion potential and safety of stand-alone IA-TNK for acute MeVO stroke, helping to address an important evidence gap and inform the design of future confirmatory studies.

06.
arXiv (CS.CL) 2026-06-16

Neuron Level Analysis of Large Language Model in Legal Domain Reasoning

We presented a neuron-level analysis of legal-domain reasoning in LLMs, comparing it with other applied domain tasks across seven open-weight models. Using neuron attribution scores to rank and suppress influential neurons, we confirmed that suppressing the identified neurons collapses accuracy on the target task, whereas suppressing the same number of random neurons does not. We further found a small subset of neurons influential across all seven tasks; once these are removed, suppressing the remaining neurons degrades only the task they were identified from, revealing genuinely task-specific neurons in every model studied. Within the legal domain, the three benchmarks exhibit relatively high neuron overlap and tend to be affected jointly, suggesting of legal components neurons that span jurisdictions. The distribution of identified neurons in our experiments suggests that the hypothesis that influential neurons are concentrated in middle MLP layers may depend on the input format and content, rather than being a universal phenomenon.

07.
arXiv (CS.AI) 2026-06-24

Graph Alignment for Benchmarking Graph Neural Networks and Learning Positional Encodings

arXiv:2505.13087v2 Announce Type: replace-cross Abstract: We propose a novel benchmarking methodology for graph neural networks (GNNs) based on the graph alignment problem, a combinatorial optimization task that generalizes graph isomorphism by aligning two unlabeled graphs to maximize overlapping edges. We frame this problem as a self-supervised learning task and present several methods to generate graph alignment datasets using synthetic random graphs and real-world graph datasets from multiple domains. For a given graph dataset, we generate a family of graph alignment datasets with increasing difficulty, allowing us to rank the performance of various architectures. Our experiments prove that there is an optimal task difficulty for having a statistically relevant ranking of different models and that, even on a structure-only task, anisotropic models perform better compared to isotropic ones. To further prove that our synthetic task capture meaningful information, we show its effectiveness for self-supervised GNN pre-training: the learned node embeddings can be leveraged as positional encodings by transformers for graph regression or can be used to reconstruct the full structure of the graph with $98\%$ accuracy. To support reproducibility and further research, we provide an open-source Python package to generate graph alignment datasets and benchmark new GNN architectures. The source code is available at https://github.com/adrien-lagesse/graph-alignment-benchmark.

08.
arXiv (CS.CV) 2026-06-11

A2SG:Adaptive and Asymmetric Surrogate Gradients for Training Deep Spiking Neural Networks

Training deep spiking neural networks (SNNs) remains challenging due to sharp loss landscapes and temporal inconsistency caused by surrogate gradients. To address these challenges, we propose a unified framework: adaptive and asymmetric surrogate gradients A2SG. The adaptive gradients adjust an effective window for spatio-temporal adaptation, reducing spatial gradient variation and maintaining directional consistency of gradients over time. The asymmetric gradients reflect neuronal dynamics by assigning larger gradients to neurons with higher membrane potentials, and we prove that they yield lower variation than symmetric surrogates. Our analysis further establishes a direct connection between local gradient variation and the curvature of the loss landscape, providing a principled explanation for how A2SG promotes convergence to flatter minima and improves generalization. We conduct extensive experiments on diverse models, including CNN-based and Transformer-based SNNs, across various tasks such as image classification using both static and neuromorphic datasets, as well as segmentation. The results demonstrate that A2SG consistently improves accuracy and energy efficiency, establishing it as a general and reliable solution for training deep SNNs. Our code is available at https://github.com/KIST-NCL/A2SG.git.

09.
arXiv (CS.CV) 2026-06-11

Reason, Then Re-reason: Cross-view Revisiting Improves Spatial Reasoning

Spatial reasoning from egocentric videos is inherently challenging because the observable evidence is constrained by the camera trajectory. Existing methods rely on single-turn inference, forcing models to resolve geometric ambiguity through semantic priors rather than verifiable evidence. We argue that spatial reasoning should be revisitable: conclusions formed under limited evidence should remain open to revision when complementary viewpoints become available. Building on this insight, we propose Reason, then Re-reason (ReRe), a training-free, inference-time framework with two phases: in the Reason Phase, an MLLM forms a spatial hypothesis from the original video; in the Re-reason Phase, it verifies or revises the hypothesis by observing a synthesized novel-view video. To enable effective cross-view revisiting, we design a Geometry-to-Video pipeline that renders strategically complementary novel views from predicted 3D geometry. These views feature an elevated, oblique perspective with scene-spanning coverage, while preserving the MLLM's native video interface without architectural modifications. Extensive evaluations on VSI-Bench and STI-Bench demonstrate that ReRe substantially boosts open-source MLLMs to rival proprietary state-of-the-art performance. Project page: https://zhenjiemao.github.io/ReRe/

10.
arXiv (CS.CV) 2026-06-18

Reasoning as Intersection: Consensus-Frame Alignment for Visual Focus in Video-MLLMs

Reinforcement learning has improved the reasoning ability of large language models, but applying outcome-only rewards to video multimodal large language models (Video-MLLMs) provides limited guidance on which visual evidence should support the answer. Inspired by multisensory integration, where consistent cues can enhance the salience and reliability of perceptual estimates, we introduce Consensus Frame GRPO (CF-GRPO), a temporal-annotation-free process-level reward framework for evidence-aware video reasoning. CF-GRPO constructs a consensus frame prior from intrinsic video cues, including temporal coverage, scene-transition cues, and query-conditioned visual relevance. It then computes a model-side frame-use score from visual and response representations and optimizes their agreement through the Consensus Frame Reward (CFR). With salience-aware sparse aggregation and distribution sharpening, CFR provides a high-contrast reward signal without requiring human temporal annotations. Experiments show that VideoCFR achieves competitive performance across complex video reasoning benchmarks and improves several metrics over representative Video-MLLM and RL baselines, while the consensus prior provides an interpretable view of the evidence frames emphasized during training. The implementation is available at https://github.com/1Pansy/VideoCFR.

11.
arXiv (quant-ph) 2026-06-24

Ultra-Low-Rate Information Reconciliation: Repetition Coding or Dedicated Codes?

arXiv:2606.23726v1 Announce Type: new Abstract: We compare repetition-based ultra-low-rate information reconciliation with dedicated ultra-low-rate codes for CV-QKD. Repetition coding offers a favorable performance-complexity trade-off, incurring only a moderate error-rate penalty while reducing decoding complexity by $2\times$, making it attractive for implementation-constrained systems.

12.
arXiv (quant-ph) 2026-06-24

Certifying Quantum Optimization and Circuit Cutting by Using Quantum-Classical Moment Duality

Authors:

arXiv:2606.23727v1 Announce Type: new Abstract: We establish a direct quantum-classical duality based on the degree-$2$ Sum-of-Squares (SoS) semidefinite programming cone: the matrix of two-qubit Pauli-$Z$ correlation functions obtained from any quantum state $\rho$ is automatically a feasible point of the classical Goemans-Williamson (GW) relaxation. This observation provides a universal ``safety net'' for quantum optimization algorithms: applying GW random hyperplane rounding to the quantum-driven moment matrix yields a certified expected cut value $\mathbb{E}[\mathrm{Cut}] \ge \alpha_{\mathrm{GW}}\langle\mathcal{H}\rangle_\rho$, valid for every state produced by variational algorithms such as QAOA or the Variational Quantum Power Method (VQPM), regardless of convergence quality. We further show that the same moment matrix reveals the tensor-product structure of the underlying unitary circuit, enabling a polynomial-time, correlation-based circuit cutting procedure with rigorous error bounds. The framework is validated numerically on Max-Cut instances for variational quantum algorithms and on random states for circuit cutting, demonstrating that the cheap two-point correlation data are sufficient to locate near-optimal bipartitions and that the theoretical error bounds hold in practice.

13.
arXiv (CS.AI) 2026-06-12

Multi-Modal Agents for Power Distribution Defect Detection: An Evaluation of Foundation Models

Authors:

arXiv:2606.12969v1 Announce Type: new Abstract: The power distribution network is critical to reliable electricity delivery, yet traditional inspection methods face limitations in semantic understanding, generalization, and closed-loop automation. To address these challenges, this paper proposes a Multi-Modal Agent framework specifically for power distribution defect detection. Central to this study is the systematic evaluation of multimodal foundation models as unified cognitive engines. We rigorously assess their integrated performance across three critical capabilities: (1) Perception, where the model must accurately identify equipment and generate expert-level descriptions of defects; (2) Reasoning, where the model interprets visual findings to diagnose causes, assess severity, and plan maintenance strategies based on domain knowledge; and (3) Tool Usage, where the model acts as an autonomous operator to execute actions – such as querying knowledge bases or generating work orders – to achieve closed-loop maintenance. To support this evaluation, a domain-specific evaluation dataset and a comprehensive benchmark are developed. Experimental results demonstrate the strengths and limitations of current foundation models in these three dimensions, providing empirical evidence for deploying autonomous agents in high-stakes industrial environments.

14.
arXiv (CS.LG) 2026-06-18

Risk Stratification for ICU Delirium using Pervasive Ambient Sensing Information

arXiv:2606.19292v1 Announce Type: new Abstract: Delirium is a common and serious complication in the Intensive Care Unit (ICU), associated with increased morbidity, prolonged hospital stays, and higher healthcare costs. Despite its prevalence, early prediction and prevention remain challenging. Environmental factors such as ambient sound and light may influence the onset of delirium, yet they are often overlooked in risk assessments. In this study, we examined whether light intensity and sound pressure levels can independently predict delirium across multiple prediction horizons. We evaluated four efficient sequential neural network models on data collected from 9 ICUs across 309 patients to predict delirium for 10 prediction-window sizes. We reported feature importance and direction of influence using Shapley Additive Explanations analysis. The convolutional model achieved the strongest discrimination, with AUC = 0.80 on sound data and on combined data. Sound features were the dominant predictors overall. Integrating sound with light improved short-term ($

15.
medRxiv (Medicine) 2026-06-23

Default Handling of the Non-Assessable Verbal Glasgow Coma Scale Misclassifies Illness Severity in Mechanically Ventilated Patients: A Retrospective Analysis

Background: The Glasgow Coma Scale (GCS) is a universal neurologic severity score in the intensive care unit and is incorporated into APACHE, SOFA, mortality prediction models, ICU benchmarking, and quality metrics. In mechanically ventilated patients, however, the verbal component cannot be assessed. Common conventions, including assigning a normal total GCS of 15 or excluding patients with missing verbal scores, may misclassify the sickest patients as neurologically normal or remove them from analysis. Objective: To quantify non-assessable verbal GCS examinations after acute brain injury and determine how different handling conventions affect severity scoring and mortality-model performance across two independent critical care databases. Materials and Methods: We conducted a retrospective cohort study of adults with acute brain injury during their first ICU stay in MIMIC-IV, with replication in eICU-CRD. A verbal examination was considered non-assessable when documented as No Response-ETT. We measured the burden and determinants of non-assessability, compared the MIMIC-IV derived GCS convention with a component-aware GCS, and evaluated mortality-model handling strategies. Results: Among 14,230 patients, 45.2% had a non-assessable verbal examination, and 47.5% of ventilated patients had no assessable verbal score in the first 24 hours. Non-assessability was strongly associated with mechanical ventilation and mortality. The MIMIC-IV derived GCS assigned a score of 15 to 42.9% of patients and placed 11.6% in the lowest severity category despite eye and motor findings consistent with GCS [≤]9. Complete-case handling excluded 28.5% of patients, who accounted for 50.2% of deaths. Similar distortions were observed in eICU-CRD/APACHE across 171 hospitals. Discussion: Default-to-normal scoring can make severely ill intubated patients appear neurologically normal, while complete-case analysis removes the highest-risk patients. Conclusion: Non-assessable verbal GCS in mechanically ventilated patients should be explicitly flagged and reported in ICU severity scores, risk-adjusted mortality models, and benchmarking systems.

16.
arXiv (math.PR) 2026-06-15

Longest weakly increasing subsequences of discrete random walks on the integers with heavy tailed distribution of increments

arXiv:2603.29047v2 Announce Type: replace-cross Abstract: We investigate the behavior of the length of the longest weakly increasing subsequences (weak LIS) of $n$-step random walks with nonzero integer increments $k = \pm 1, \pm 2, \dots$ given by a symmetric heavy tailed mass distribution proportional to $|k|^{-1-\alpha}$ for several values of the real parameter $\alpha > 0$ together with that of the simple random walk ($k=\pm 1$), to which the $n$-step heavy tailed walks reduce when $\alpha$ grows large enough that step jumps beyond $\pm 1$ become essentially absent on the scale of $n$. By means of exploratory fits, weighted nonlinear least squares, and nested-model comparisons, we found that the sample average length $\langle{L_{n}}\rangle$ scales like $\langle{L_{n}}\rangle \sim \sqrt{n}\log{n}$ when the distribution of increments has finite variance ($\alpha > 2$) and $\langle{L_{n}}\rangle \sim n^{\theta}$ with a varying exponent $\theta > 0.5$ when the variance is infinite ($\alpha \leq 2$). Distributional diagnostics indicate that the bulk of the $L_{n}$ distribution is very well-approximated by a lognormal model, though systematic deviations are observed in the tails. Our results corroborate and expand upon previous results for the LIS of other types of heavy-tailed random walks and raise a conjecture as to whether the distribution of $L_{n}$ is given, or can be effectively described, by a lognormal distribution.

17.
arXiv (CS.LG) 2026-06-11

Neural-Parameterized Cellular Automata for Wildfire Spread

arXiv:2606.11676v1 Announce Type: cross Abstract: Traditional wildfire models rely on rigid, low-dimensional parameters and static fuel maps, frequently underpredicting fire spread. To address this weakness, we introduce a hybrid deep-learning parameterized Probabilistic Cellular Automata (CA) framework implemented in JAX. Our approach employs a Multi-Scale Convolutional Neural Network to dynamically generate spatially varying parameters that govern fire-spread probability, wind alignment, and slope influence. This hybrid design captures complex, nonlinear environmental interactions while preserving the physical interpretability of the underlying three-state CA. The JAX implementation enables hardware acceleration and gradient-based parameter calibration. Evaluated on six large-scale wildfires in the western United States, the model maintains IoU > 0.6 over 72-hour forecast horizons after a 10-day data assimilation window during which the model is fitted incrementally to observed perimeters; the resulting forecast is a conditional projection of fire growth under the suppression regime already ncoded in those observations.

18.
arXiv (CS.CV) 2026-06-16

G2IA: Geometry-Guided Instance-Aware Retrieval and Refinement for Cross-Modal Place Recognition

Cross-modal place recognition (CMPR) enables camera-only robots to localize against pre-built LiDAR maps in autonomous navigation scenarios. This image-to-point-cloud setting is challenged by two coupled ambiguities: the modality gap between perspective RGB appearance and sparse metric geometry, and perceptual aliasing among urban places with similar roads, facades, intersections, and object arrangements. Instead of treating CMPR as a single global descriptor matching problem, we argue that reliable retrieval requires both geometry-aware representation alignment and fine-grained candidate verification. In this paper, we propose G2IA, a geometry-guided instance-aware framework for image-to-point-cloud place recognition. In the retrieval stage, visual geometry priors from VGGT and instance features are integrated to construct place descriptors that are more compatible with LiDAR-derived map representations. In the refinement stage, the retrieved candidates are re-ranked by explicitly verifying whether local instance shapes and their relative spatial layouts are consistent across modalities. Experiments on public benchmarks demonstrate that G2IA consistently improves image-to-point-cloud place recognition under different localization thresholds, and exhibits strong cross-dataset generalization.

19.
arXiv (CS.AI) 2026-06-11

Robust Privacy: Inference-Stage Privacy through Certified Robustness

arXiv:2601.17360v2 Announce Type: replace-cross Abstract: An adversary observing a model's released prediction can infer sensitive attributes of the queried input, or even reconstruct representatives of the model's training data. The inference interface thus acts as a side channel for privacy leakage. We introduce Robust Privacy (RP), an inference-stage privacy notion inspired by certified robustness: if a model's prediction is provably invariant within a radius-R neighborhood around an input x with confidence at least $1-\alpha$, then x enjoys $(R,\alpha)$-Robust Privacy, under which we prove that any adversary observing the released prediction has at most $\alpha/2$ advantage in distinguishing x from any input within distance R of x. Building on RP, we formalize Robust Attribute Privacy (RAP), an attribute-level privacy notion that characterizes the set of sensitive-attribute values that remain compatible with a released prediction. On a classification task, RP increases the median length of the RAP-compatible inference interval from 23.50 to 29.96, reducing attribute-inference precision. Model inversion attacks, often treated as a training-stage threat, in fact rely on fine-grained signals leaked through the inference interface; RP masks these signals at the inference stage, reducing attack success rate (ASR) from 73% to 4% on a black-box inversion attack. This direct targeting of the leakage channel enables RP to dominate DP-SGD and randomized response in the privacy-utility tradeoff space: RP retains 98.4% accuracy at 21% ASR, whereas DP-SGD must drop accuracy to 61.7% to reach a comparable ASR. Across both experiments, increasing the smoothing sample size N strengthens privacy and improves utility together. Finally, we examine model distillation as a scope boundary and show that RP mitigates attribute-level and instance-level inference-stage privacy leakage, but not function-level extraction through model distillation.

20.
arXiv (CS.CL) 2026-06-11

Verifiable Environments Are LEGO Bricks: Recursive Composition for Reasoning Generalization

Reinforcement Learning (RL) with verifiable environments has emerged as a powerful approach for enhancing the reasoning capabilities of Large Language Models (LLMs). While prior research demonstrates that scaling environment quantity improves RL performance, existing manual or individual construction methods suffer from linear scaling limits, thereby hindering scalable reasoning generalization. This paper introduces RACES (Recursive Automated Composition for Environment Scaling), a framework that conceptualizes verifiable environments as composable building blocks that can be recursively assembled. The key insight is that when the codomain (output type) of one environment matches the domain (input type) of another, they can be automatically fused into a new verifiable environment, enabling recursive composition. RACES is implemented with 300 individual environments and defines a set of composition operators (\textsc{SEQUENTIAL}, \textsc{PARALLEL}, \textsc{SORT}, and \textsc{SELECT}) that induce diverse reasoning patterns. Extensive experiments show that RL training on these composite environments consistently enhances reasoning generalization. Specifically, RACES improves DeepSeek-R1-Distill-Qwen-14B by an average of 3.1 points (from 48.2 to 51.3) and boosts Qwen3-14B performance from 58.8 to 61.1 on six benchmarks, which are unseen during the construction of training environments. Moreover, RACES achieves performance comparable to training on 300 individual environments using only 50 base environments, demonstrating significant efficiency in environment utilization.

21.
arXiv (CS.CL) 2026-06-15

Learning to Hear Hesitation: Continual Learning for Disfluency-Aware ASR

Despite advances in large-scale Automatic Speech Recognition (ASR), disfluent speech remains challenging, as state-of-the-art systems are often optimized to omit disfluencies, leading to information loss and hallucinations. Prior work has focused on verbatim transcription and the integration of disfluency markers, but adapting models on limited datasets can lead to catastrophic forgetting of general-domain knowledge. We address this gap by leveraging continual learning (CL) with explicit disfluency tokens. We first introduce these tokens into a pretrained ASR model to establish stable token mechanisms, and then continue training on additional datasets with varying disfluency distributions. Through a detailed analysis of model dynamics during training, we identify a trade-off between marker learning and ASR performance, and a consistent cross-attention head mechanism shared across CL methods.

22.
arXiv (CS.CL) 2026-06-16

A Mechanistic Understanding of Pronoun Fidelity in LLMs

Faithful and robust pronoun use is important for fair and coherent generations, yet large language models largely fail when multiple referents use different pronouns. To study the interplay of reasoning, repetition, and bias in this task, prior work relies exclusively on behavioural approaches, which may not reflect a model's internal workings. Therefore, we provide a mechanistic, model-internal perspective on pronoun fidelity, testing whether three mechanisms – group entity binding (G), recency bias (R), and stereotypical bias (S) – are causally implemented across several SOTA language models. Using Boundless Distributed Alignment Search, we find all three coexist as causal subspaces distributed across network depth. No single mechanism fully explains model behaviour, but a combination of the three consistently accounts for 91-99.5%. An attention head analysis further reveals two competing copying routes; group binding and stereotype share a localized concept-level route that retrieves a bound occupation-pronoun unit, while recency uses a distributed token-level route that repeats surface forms. In sum, pronoun fidelity arises from competition between simultaneously active causal subspaces.

24.
arXiv (CS.CV) 2026-06-18

Physics in 2-Steps: Locking Motion Priors Before Visual Refinement Erases Them

Image-to-Video diffusion models leverage input images to generate visually stunning content, yet frequently produce motion that violates physical laws. We reveal a surprising finding: a 2-step generation often exhibits better physical consistency than a 50-step output from the same model. Through spectral analysis, we trace this to phase erosion during denoising; the phase degrades significantly (dropping by $\approx 18\%$ from step 2 to step 50), whereas the magnitude remains relatively stable. Building on this insight, we propose PhaseLock, a training-free framework that preserves the valid motion priors from few-step inference throughout the denoising trajectory. Rather than relying on full-step inference for physical consistency, PhaseLock extracts a motion prior from just 2 steps and enforces it onto high-fidelity generation via Latent Delta Guidance. Our approach effectively mitigates phase degradation, improving physical consistency by an average of 6.2 points across diverse models while largely maintaining visual fidelity, with negligible overhead ($1.06\times$ time, $1.02\times$ memory) and reduced reliance on expensive external guidance methods ($\sim5\times$ time). Project Page: https://dnwjddl.github.io/phaselock

25.
medRxiv (Medicine) 2026-06-24

Cognitive and Neuroimaging Biomarker Intra-Individual Variability in Alzheimer's Disease

Background Greater cognitive intra-individual variability (IIV) reflects increased heterogeneous performance across cognitive domains and has been linked to a higher risk of Alzheimer's disease (AD). However, it remains unclear whether cognitive IIV is linked to heterogeneous dispersion of regional AD pathology. Hence, we aimed to examine the association between cognitive IIV and AD neuroimaging biomarker IIV. Methods This study included participants with normal cognition (CN) and mild cognitive impairment (MCI) from the Alzheimer's Disease Neuroimaging Initiative. Cognitive IIV was computed as the within-person standard deviation of five domain-specific neuropsychological test z-scores. Four neuroimaging biomarker IIV metrics were similarly derived using regional amyloid-{beta} (n = 1,021), tau (n = 719), cortical thickness (n = 2,148), and combined amyloid-tau-neurodegeneration (ATN, n = 258). Associations between cognitive IIV and each biomarker IIV were evaluated using linear regression models, adjusted for relevant covariates. Results Higher cognitive IIV was associated with greater biomarker IIV across amyloid-{beta} ({beta} = 0.039, SE = 0.014, p = .006), tau ({beta} = 0.196, SE = 0.033, p < .001), cortical thinning ({beta} = 0.036, SE = 0.008, p < .001), and ATN ({beta} = 0.176, SE = 0.043, p < .001). Interaction analyses revealed that the associations of cognitive IIV with tau IIV, cortical thickness IIV, and ATN IIV were stronger in MCI than CN individuals. Significant interactions between cognitive IIV and biomarker positivity status showed that the effect with amyloid-{beta} IIV was attenuated in A- ({beta} = 0.004, SE = 0.014, p = .78) but that the effect with tau IIV remained robust even in T- individuals ({beta} = 0.088, SE = 0.022, p < .001). Conclusion Elevated cognitive IIV is associated with greater heterogeneity in cortical dispersion of AD-related pathology, particularly in prodromal AD and in the presence of abnormal pathology. As a novel measure that captures variation in topographical scattering of AD pathological burden across the cortex, AD biomarker IIV may offer research and clinical utility beyond evaluating absolute biomarker load or thresholds.