Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.LG) 2026-06-16

Sharp analysis of linear ensemble sampling

arXiv:2602.08026v2 Announce Type: replace Abstract: We analyse linear ensemble sampling (ES) with standard Gaussian perturbations in stochastic linear bandits. We show that for ensemble size $m=\Theta(d\log n)$, ES attains $\tilde O(d^{3/2}\sqrt n)$ high-probability regret, closing the gap to the Thompson sampling benchmark while keeping computation comparable. The proof brings a new perspective on randomized exploration in linear bandits by reducing the analysis to a time-uniform exceedance problem for $m$ independent Brownian motions. This continuous-time lens appears particularly natural here: it yields an exact representation of the relevant discrete-time processes, and we do not know another route to a sharp ES bound.

02.
PLOS Computational Biology 2026-06-18

Ten simple rules for turning your qualifying exam into an NIH-style fellowship proposal: A guide for graduate students

by Courtney Peña-Lima, Cameron S. Bader, Brendan K. Ball, Troy C. Dildine, Mekhala V. Dissanayake, Iris van ‘t Erve, Albina Ibrayeva, Amy Nippert, M.K. Quinn, Chelse Spinner, Samuel Thompson, Antonio Tomasso, Crystal M. Botham Qualifying exams, often referred to as “quals” or candidacy exams, are an important milestone in doctoral programs. Although the style of quals varies greatly by program and institution, it is usually a proposal that requires students to develop research ideas as well as their scientific writing skills. Many quals are modeled after funding mechanisms that graduate students can apply to and on a topic that the student will pursue in their dissertation. This paper offers graduate students a step-by-step guide on how to turn their quals into a fellowship-style research proposal, using National Institutes of Health (NIH) mechanisms as a benchmark, as this is the norm within US research institutions. This paper will be most useful for students who have completed or are in the process of completing proposal-based qualifying exams, usually in the second year of a doctoral program.

03.
arXiv (CS.AI) 2026-06-11

Planning under Distribution Shifts with Causal POMDPs

arXiv:2602.23545v2 Announce Type: replace Abstract: In the real world, planning is often challenged by distribution shifts. As such, a model of the environment obtained under one set of conditions may no longer remain valid as the distribution of states or the environment dynamics change, which in turn causes previously learned strategies to fail. In this work, we propose a theoretical framework for planning under partial observability using Partially Observable Markov Decision Processes (POMDPs) formulated using causal knowledge. By representing shifts in the environment as interventions on this causal POMDP, the framework enables evaluating plans under hypothesized changes and actively identifying which components of the environment have been altered. We show how to maintain and update a belief over both the latent state and the underlying domain, and we prove that the value function remains piecewise linear and convex (PWLC) in this augmented belief space. Preservation of PWLC under distribution shifts has the advantage of maintaining the tractability of planning via $\alpha$-vector-based POMDP methods.

04.
arXiv (quant-ph) 2026-06-19

Majorana bound states in a hybrid Kitaev ladder with long-range pairing

arXiv:2606.19963v1 Announce Type: new Abstract: We investigate an inter-leg coupled hybrid Kitaev ladder composed of two parallel superconducting chains with distinct pairing interactions. The upper chain of the ladder hosts conventional $p$-wave pairing, while the lower chain exhibits long-range pairing that decays algebraically with distance. We demonstrate that the mutual influence of long-range pairing exponent, chemical potential, and inter-leg coupling strength gives rise to a rich topological phase diagram characterized by multiple Majorana zero modes and massive Dirac modes. In particular, we show that the inter-leg coupling renormalizes the effective energy scales, leading to a systematic shift of the topological phase boundaries and enabling controlled tuning of the Majorana modes. Furthermore, we identify a transition from a two Majorana zero mode phase to a phase encapsulating four Majorana zero modes, as the long-range pairing exponent is varied. This transition is accompanied by a crossover regime in which Majorana zero modes coexist with massive Dirac modes, reflecting hybridization between edge and bulk excitations. This ladder thus provides a minimal and attractive platform for realizing the impact of a long-range pairing on topological phases. Our results highlight the potential of long-range hybrid systems for engineering tunable topological states relevant for quantum information applications.

05.
medRxiv (Medicine) 2026-06-11

The impact of pre-stroke statin use on baseline corrected infarct volume and collateral perfusion

Stroke is a leading cause of disability and mortality worldwide, with ischaemic stroke the most prevalent type. Statins, used for cholesterol management, have demonstrated benefits in reducing stroke risk and improving outcomes in preclinical studies. However, the impact of pre-stroke statin use on stroke outcomes remain inconsistent. In this study, we aim to evaluate whether pre-stroke statin use is associated with greater volume of salvaged tissue and improved cerebral collateral perfusion. A retrospective analysis was conducted using data from 281 patients presenting with acute ischemic stroke to the John Hunter Hospital between May 2015 and May 2020. Patients were grouped based on pre-stroke statin use, and clinical variables, including infarct volume and collateral perfusion, were assessed. The primary outcome was salvage volume derived from baseline perfusion lesion volume minus infarct volume at follow-up. Collateral perfusion was measured by the hypoperfusion volume defined by delay time (DT)>6 seconds divided by the hypoperfusion volume defined by DT >2 seconds. Patients on statins at admission were significantly older and had more comorbidities. No significant association was found between pre-stroke statin use and salvage volume or collateral perfusion after adjusting for covariates. Larger initial infarct core was a significant predictor of salvage volume due to larger salvageable tissue volume at baseline. These findings indicate that pre-morbid statin use is not associated with larger salvage volume or improved cerebral collateral perfusion.

06.
arXiv (CS.LG) 2026-06-17

Conditional Attribution for Root Cause Analysis in Time-Series Anomaly Detection

arXiv:2604.17616v3 Announce Type: replace Abstract: Root cause analysis (RCA) for time-series anomaly detection is critical for the reliable operation of complex real-world systems. Existing explanation methods often rely on unrealistic feature perturbations and ignore temporal and cross-feature dependencies, leading to unreliable attributions. We propose a conditional attribution framework that explains anomalies relative to contextually similar normal system states. Instead of using marginal or randomly sampled baselines, our method retrieves representative normal instances conditioned on the anomalous observation, enabling dependency-preserving and operationally meaningful explanations. To support high-dimensional time-series data, contextual retrieval is performed in learned low-dimensional representations using both variational autoencoder latent spaces and UMAP manifold embeddings. By grounding the retrieval process in the system's learned manifold, this strategy avoids out-of-distribution artifacts and ensures attribution fidelity while maintaining computational efficiency. We further introduce confidence-aware and temporal evaluation metrics for assessing explanation reliability and responsiveness. Experiments on the SWaT and MSDS benchmarks demonstrate that the proposed approach consistently improves root-cause identification accuracy, temporal localization, and robustness across multiple anomaly detection models. These results highlight the practical utility of conditional attribution for explainable anomaly diagnosis in complex time-series systems. Code and models are available at: https://github.com/dfki-av/Conditional-Attribution-for-Root-Cause-Analysis-in-Time-Series-Anomaly-Detection.

07.
arXiv (CS.LG) 2026-06-12

Quantizing Time-Series Models As Dynamical Systems: Trajectory-Based Quantization Sensitivity Score

arXiv:2606.13300v1 Announce Type: new Abstract: We introduce the Trajectory-based Quantization Sensitivity Score (TQS), a metric that reframes post-training quantization (PTQ) through the lens of dynamical-systems stability. By modeling the network's rollout as a discrete-time dynamical system, TQS characterizes how quantization-induced errors propagate and amplify over the rollout horizon. Unlike conventional PTQ methods, where sensitivity analysis is often coupled to the quantization procedure, TQS enables a priori sensitivity estimation decoupled from quantizer selection and bit-width assignment. This separation allows for quantization budget planning even for black-box or compiled networks with fused operators. Building on this, we present TQS-PTQ, a flexible mixed-precision framework that requires no calibration data or costly second-order approximations. Our experiments show that a dynamical-systems perspective provides a robust, high-performing pathway for low-precision deployment in resource-constrained settings.

08.
arXiv (CS.AI) 2026-06-16

Harnessing cortical geometry, wiring, and function as inductive biases for recurrent neural networks

arXiv:2606.14975v1 Announce Type: cross Abstract: How the wiring and functional organization of cortex shape recurrent computation remains a central question in both neuroscience and machine learning. Here, we leverage data released through the Machine Intelligence from Cortical Networks (MICrONS) program–a functional connectomics resource spanning multiple areas of mouse visual cortex, in which dense calcium imaging is co-registered with high-resolution electron microscopy reconstruction from the same animal–to build biologically grounded recurrent neural networks. Using neuronal spatial coordinates, anatomical connectivity, and function-derived relationships from nearly 12,000 coregistered excitatory neurons, we initialize recurrent weights and impose communication-aware spatial constraints during learning. Across three cognitive decision-making tasks, networks constrained by cortical structure and function consistently outperform baseline and partially constrained models. Functional weight initialization provides the largest gain, while real spatial embedding yields robust additional improvements across conditions. These biologically grounded networks also develop low-entropy, modular, and small-world organization, and retain strong performance even when recurrence is restricted to positive weights. Together, our results show that the machinery of cortex–its geometry, wiring, and functional structure–can be harnessed as a powerful inductive basis for building recurrent networks that learn more effectively while converging toward key organizational principles of biological computation.

09.
arXiv (CS.AI) 2026-06-15

Efficient Temporal Modeling for Mobile Sleep Staging via Lightweight Random Attention

arXiv:2606.13694v1 Announce Type: cross Abstract: Mobile sleep staging serves as a foundational infrastructure for in-home sleep monitoring and closed-loop modulation. But existing sequential models such as RNNs and Transformers are computationally expensive for mobile deployment. In this paper, we propose Random Attention (RA), a lightweight temporal modeling module based on fixed random projections, which replaces learnable sequence modeling with similarity-based aggregation. RA introduces little additional parameters beyond the epoch encoder while enabling effective temporal smoothing. We further provide a theoretical interpretation via the Random Attention Prior Kernel (RAPK), which decomposes RA into a global smoothing term and a feature similarity term, offering an interpretable view of temporal sleep structure. Experiments on Sleep-EDF-20 and Sleep-EDF-78 show that RA consistently improves epoch-wise baselines by 1-3\% in accuracy and F1 score, while achieving competitive performance compared with LSTM, GRU, and Transformer models. RA also demonstrates strong generalization across different backbone encoders and improved robustness over conventional temporal smoothing methods. These results indicate that efficient sleep staging can be achieved through lightweight similarity-based temporal aggregation, making RA suitable for real-time wearable applications.

10.
arXiv (CS.LG) 2026-06-11

NARRAS: Edge-Triggered Distributed Inference for CSI-Based Localization in Vehicular IoT Networks

arXiv:2606.11914v1 Announce Type: cross Abstract: CSI-based localization with spatially distributed antenna arrays exposes a basic resource trade-off. Each array can provide a rich view of the channel, but forwarding observations from all arrays to a fusion center is wasteful when only a few carry useful information, and the shared uplink supports only a limited number of simultaneous transmissions. We let each array decide locally whether its current observation is worth reporting, subject to a budget on the average number of active transmitters. We refer to this abstraction as Edge-Triggered Distributed Inference (ETDI). It captures a broader class of task-oriented communication problems where resource-constrained devices share an access channel for a common inference task. We instantiate ETDI for CSI-based localization, a common scenario in vehicular IoT networks. Spatially distributed remote antenna arrays (RAAs) encode local channel state information (CSI) from user equipment (UE) transmissions into latent features, and the fusion center estimates the UE position from the subset of reported features. We propose NARRAS, a decentralized reporting policy in which each RAA combines a recurrent summary of its recent observations with a memory of the last latent it transmitted. Training controls an explicit activity budget through differentiable activity penalties and validation-calibrated deterministic thresholds, and uses channel-chart regularization to shape the latent geometry. Experiments show that, at comparable uplink activity, NARRAS improves localization accuracy over learned and heuristic sparse-reporting strategies, while dense full-report models remain useful budget-free references. In low-activity regimes, chart regularization further reduces high-percentile localization errors, suggesting that geometry-aware latent representations are more robust under sparse reporting.

11.
arXiv (CS.CL) 2026-06-16

Follow the Latent Roadmap: Navigating Revocable Decoding for Diffusion LLMs with Anchor Tokens

Diffusion Large Language Models (dLLMs) offer a promising avenue for parallel generation but face a trade-off between decoding speed and quality. While revocable decoding strategies attempt to mitigate errors by verifying and remasking tokens, they typically operate within a mixed-quality context. This leads to two critical failures: Error Propagation, where new tokens absorb toxic information from erroneous context, and Local Error Reinforcement, where errors mutually reinforce each other to evade detection. To alleviate these challenges, we propose ASRD (Anchor Supervised Revocable Decoding), a training-free framework that operates within the embedding space. ASRD explicitly decouples the decoding context into trusted Anchor Tokens, which are identified via temporal consistency, and uncertain candidates. Leveraging a dynamic Anchor Tokens Cache, we introduce two complementary mechanisms: (1) Anchor-Guided Generation, which injects entropy-weighted anchor signals into masked positions to implicitly rectify attention toward the reliable global skeleton; and (2) Anchor-Perturbed Verification, which applies orthogonal perturbations to uncertain candidate tokens, destabilizing and remasking errors driven by fragile local consensus. Extensive experiments on math and coding benchmarks demonstrate that ASRD outperforms recent remasking baselines, achieving accuracy improvements of up to 6.4\% while accelerating inference throughput by up to 7.2$\times$.

12.
arXiv (CS.CL) 2026-06-12

EvoBrowseComp: Benchmarking Search Agents on Evolving Knowledge

Search Agents – large language models augmented with search tools – have intensified the need for future-proof evaluation benchmarks. Existing benchmarks such as BrowseComp rely on static knowledge, making them vulnerable to test-set contamination and parametric memorization. Consequently, models can achieve high scores through fact recall rather than genuine retrieval, obscuring true browsing competence via reasoning shortcuts. In this paper, we introduce EvoBrowseComp, an evolving benchmark of 400 English and 400 Chinese contamination-free complex questions synthesized via live-web traversal. To collect these questions, we design a three-agent collaborative framework: (1) a QA synthesis agent that retrieves fresh knowledge from the live web to synthesize QA pairs; (2) an information filtering agent that filters retrieved knowledge in terms of credibility and popularity to block parametric shortcuts; and (3) a high-level guidance agent that formalizes questions into reasoning graphs to reduce logical redundancy and shortcuts in synthesized QA pairs. Because the framework supports fully automated synthesis, EvoBrowseComp can be regularly updated to prevent data contamination and maintain temporal freshness. Extensive experiments confirm its great difficulty, requiring broad horizontal search. It establishes a scalable paradigm for auto-updatable, high-difficulty benchmarking that keeps pace with both evolving world knowledge and advancing agent capabilities.

13.
arXiv (CS.LG) 2026-06-19

A High-Resolution Landscape Dataset for Concept-Based XAI With Application to Species Distribution Models

arXiv:2604.13240v2 Announce Type: replace-cross Abstract: Mapping the spatial distribution of species is essential for conservation policy and invasive species management. Species distribution models (SDMs) are the primary tools for this task, serving two purposes: achieving robust predictive performance while providing ecological insights into the driving factors of distribution. However, the increasing complexity of deep learning SDMs has made extracting these insights more challenging. To reconcile these objectives, we propose the first implementation of concept-based Explainable AI (XAI) for SDMs. We leverage the Robust TCAV (Testing with Concept Activation Vectors) methodology to quantify the influence of landscape concepts on model predictions. To enable this, we provide a new open-access landscape concept dataset derived from high-resolution multispectral and LiDAR drone imagery. It includes 653 patches across 15 distinct landscape concepts and 1,450 random reference patches, designed to suit a wide range of species. We demonstrate this approach through a case study of two aquatic insects, Plecoptera and Trichoptera, using two Convolutional Neural Networks and one Vision Transformer. Results show that concept-based XAI helps validate SDMs against expert knowledge while uncovering novel associations that generate new ecological hypotheses. Robust TCAV also provides landscape-level information, useful for policy-making and land management. Code and datasets are publicly available.

14.
arXiv (CS.LG) 2026-06-11

AsFT: Anchoring Safety During LLM Fine-Tuning Within Narrow Safety Basin

arXiv:2506.08473v4 Announce Type: replace Abstract: Fine-tuning large language models (LLMs) improves performance but introduces critical safety vulnerabilities: even minimal harmful data can severely compromise safety measures. We observe that perturbations orthogonal to the alignment direction - defined by weight differences between aligned (safe) and unaligned models - rapidly compromise model safety. In contrast, updates along the alignment direction largely preserve it, revealing the parameter space as a "narrow safety basin". To address this, we propose AsFT (Anchoring Safety in Fine-Tuning) to maintain safety by explicitly constraining update directions during fine-tuning. By penalizing updates orthogonal to the alignment direction, AsFT effectively constrains the model within the "narrow safety basin," thus preserving its inherent safety. Extensive experiments on multiple datasets and models show that AsFT reduces harmful behaviors by up to 7.60%, improves task performance by 3.44%, and consistently outperforms existing methods across multiple tasks.

15.
arXiv (CS.AI) 2026-06-18

A Variational Framework for LLM Generator-Regulator Games

作者:

arXiv:2606.18424v1 Announce Type: cross Abstract: This paper develops a variational framework for regulated language generation. Starting from autoregressive token sampling, we derive the induced distribution over complete messages and relate it to an entropy-regularized Gibbs law. Regulation is modeled as an optimal discriminator whose convex-dual value is an f-divergence, and the generator-regulator interaction is formulated as a saddle-point problem. The framework applies to moderation, censorship, AI deception detection, compliance auditing, phishing defense, and manipulation control, where regulation concerns a distribution over possible messages rather than a single output. The equilibrium clarifies the tradeoff among utility, entropy, regulatory alignment, and finite-length detectability. Two finite-vocabulary case studies, censorship filtering and phishing defense, illustrate how the theory can be evaluated through utility, entropy, divergence, receiver-side scores, and detection probability.

16.
medRxiv (Medicine) 2026-06-15

Identifying the risk profile of anemia subtypes and hemodynamic obstetric complications in relation to peripartum cardiomyopathy

Background: Peripartum cardiomyopathy (PPCM) is a leading cause of maternal mortality worldwide, with worse outcomes associated with African Ancestry and delayed presentation. However, the mechanisms underlying PPCM are incompletely understood. Objective: Use a large, nationwide cohort to explore associations between PPCM and underexplored perinatal risk factors and complications of childbirth. Methods: Public hospital discharge data were obtained from eleven U.S. states between 2003-2019. Delivery hospitalizations, patient characteristics and obstetric complications were identified using ICD-9 and -10 CM codes. Only cases with unique patient identifiers enabling readmission analysis were included. The primary outcome was incident PPCM coded between 30 days antepartum and 150 days postpartum. Results: Of 7,424,916 delivering patients, 5,488 patients were diagnosed with PPCM. Patients with PPCM had higher rates of anemia, anemia of chronic disease (ACD), iron deficiency anemia (IDA), sickle cell disease (SCD), sickle cell trait (SCT), red blood cell (RBC) transfusion, and postpartum hemorrhage (PPH) (p

17.
arXiv (CS.AI) 2026-06-16

Phys-JEPA: Physics-Informed Latent World Models for Multivariate Time-Series Forecasting

arXiv:2606.16076v1 Announce Type: cross Abstract: Multivariate forecasting in physical systems requires models that predict coupled temporal variables while preserving meaningful state evolution. Deep forecasters can fit temporal correlations, and physics-informed models can regularize predictions with scientific constraints, but these directions are often connected only at the decoded-output level. As a result, the hidden predictive state that generates future trajectories may remain statistically useful but physically unstructured. We introduce Phys-JEPA, a physics-informed joint-embedding predictive architecture for multivariate time-series forecasting. Phys-JEPA learns a latent world model in which predictive states are decomposed into physical and residual components, and physical consistency is imposed directly on latent states and latent transitions rather than only on decoded forecasts. This formulation uses known physical variables to organize the representation space while retaining residual capacity for unresolved dynamics. On Jena Climate 2009–2016, Phys-JEPA reduces aggregate MSE from 0.12482 to 0.12273 and temperature MSE from 0.01892 to 0.01831 at H=24. On Traffic, full Phys-JEPA improves aggregate MSE over the supervised baseline across all tested horizons, reducing H=192 MSE from 0.800784 to 0.773873. On Electricity, the best variant depends on horizon: static latent consistency is strongest at H=24 and H=48, while full Phys-JEPA gives the best aggregate and target-variable MSE at H=192. These initial results suggest that moving physics-informed learning from output space to latent predictive state space is a promising direction for interpretable temporal world models.

19.
arXiv (CS.AI) 2026-06-19

Robust $Q$-learning for mean-field control under Wasserstein uncertainty in common noise

arXiv:2606.20356v1 Announce Type: cross Abstract: In this article, we present a robust $Q$-learning algorithm for discrete-time mean-field control problems under Wasserstein uncertainty in the common noise law. The algorithm combines a quantization-and-projection scheme with a Wasserstein dual reformulation on the common-noise space. We establish its convergence together with finite-time iteration bounds for both synchronous and asynchronous learning schemes. Numerical experiments on systemic risk and epidemic models compare the asynchronous implementation with an idealized Bellman iteration, illustrate the robustness-performance tradeoff under common-noise misspecification, and report the observed convergence behavior of the asynchronous $Q$-learning algorithm.

20.
arXiv (CS.AI) 2026-06-17

Fixed-Point Reasoners: Stable and Adaptive Deep Looped Transformers

arXiv:2606.18206v1 Announce Type: new Abstract: Looped architectures provide an inductive bias toward learning step-by-step procedures for tasks that require compositional reasoning. The number of effective layers reached by looping determines the quality of the solution these models find. Like deep architectures, looped architectures are prone to a signal propagation problem induced by depth as the halting decision is postponed. In this paper, we address this signal propagation issue using pre-norm layers and residual scaling. Building on these architectural modifications, we propose FPRM, a Transformer-based Fixed-Point Reasoning Model that uses fixed-point convergence as an end-to-end halting mechanism in a looped architecture. We show that fixed-point halting allows FPRM to adapt its compute to task difficulty. FPRM is effective on common reasoning benchmarks, namely Sudoku, Maze, state-tracking, and ARC-AGI.

21.
arXiv (quant-ph) 2026-06-11

Dissociative recombination and ion-pair formation in $\mathrm{HeH^+}$ isotopologues: A time-dependent wave-packet study including rotational coupling

arXiv:2606.11352v1 Announce Type: cross Abstract: We present a comprehensive theoretical investigation of dissociative recombination (DR) and resonant ion-pair (RIP) formation in $\mathrm{HeH^+}$ isotopologues using time-dependent wave-packet propagation methods. Nuclear dynamics are treated on a set of 23 coupled electronic states, including $^2\Sigma$, $^2\Pi$, and $^2\Delta$ symmetries, in both adiabatic and strictly diabatic representations, with rotational couplings explicitly included. Reaction cross sections are computed over collision energies ranging from 0 to 50 eV. The results reveal that inclusion of a large manifold of resonant states and rotational couplings significantly enhances the DR cross section relative to earlier theoretical studies. In the diabatic representation, $^2\Sigma$ states dominate the recombination dynamics, while in the adiabatic representation, $^2\Pi$ and $^2\Delta$ states contribute significantly at low collision energies. For RIP formation, two different diabatization schemes yield systematically larger cross sections than previous models, highlighting the sensitivity of ion-pair production to electronic coupling structure. Isotopic effects are examined, showing a clear inverse dependence of cross section magnitude on reduced mass. The present results underscore the importance of multi-state coupling and nonadiabatic effects in accurately describing electron-molecule collision processes in primordial and astrophysical plasmas.

22.
medRxiv (Medicine) 2026-06-18

Human Intuition vs. Computational Precision: Neurologists, Feature-based Models, and Deep Learning for Stroke Prognosis

Background: Prognostication in large vessel occlusion (LVO) stroke remains challenging. Although several prognostic models exist, their comparison to clinician performance, human-model interaction, and specific sources of human bias remain poorly understood. Methods: Using pre-treatment clinical and CT data from the MR CLEAN trial (n=500), six neurologists predicted three-month modified Rankin Scale (mRS) scores for 40 patients, both unaided and assisted by a validated feature-based model (MR PREDICTS). Human performance was benchmarked against MR PREDICTS and a multimodal, interpretable deep learning (DL) approach using raw imaging data. We explicitly assessed neurologists? ability to estimate model-required imaging features and identified systematic human biases. Models were additionally validated in a larger MR CLEAN trial cohort (n=404). Results: For predicting the full mRS distribution, standalone models achieved good ordinal agreement (MR PREDICTS quadratic weighted kappa (QWK) 0.51 [0.24 to 0.70]; DL model 0.49 [0.25 to 0.67]), significantly outperforming unaided neurologists (QWK 0.27 [0.10, 0.42]). Neurologists showed systematic overoptimism, predicting lower mRS scores than observed. Furthermore, there was poor accuracy in extracting imaging features. Raters? ASPECTS predictions deviated by 3.4 points from the confirmed scores, and collateral score accuracy was 44.6%. However, for predicting binary mRS (0-2 vs. 3-6), accuracy was comparable between unaided neurologists (64.17% [55.42% to 72.92%]) and models (MR PREDICTS 67.50% [52.50% to 82.50%]; DL model 63.16% [47.37% to 78.95%]). Model-assistance modestly improved and harmonized neurologists? predictions (QWK 0.41 [0.22 to 0.55]; binary accuracy 68.75% [58.33% to 78.34%]. Model performance remained robust in the larger cohort. Conclusions: Multimodal prognostic models outperform clinicians in predicting the full range of mRS outcomes, while human error in imaging assessment and systematic optimism bias are primary drivers of prognostic inaccuracy. End-to-end DL models eliminate human-input variability and hold strong potential as an automated second opinion to support prognostication and decision-making in acute LVO stroke.

23.
arXiv (CS.AI) 2026-06-16

Prediction Bottlenecks Don't Discover Causal Structure (But Here's What They Actually Do)

arXiv:2605.09169v2 Announce Type: replace-cross Abstract: A Mamba state-space model trained only for next-step prediction appears to recover Granger-causal structure through a simple readout $S = |W_{out} W_{in}|$, with early experiments suggesting the phenomenon generalized across architectures and benefited from interventional data at $p < 10^{-5}$. We package the protocol used to test that claim – standardized synthetic generators (VAR/Lorenz/CauseMe-style), three intervention semantics ($do(X=c)$, soft-noise, random-forcing), edge-provenance cards on three real datasets, and size-matched control arms – as a reusable falsification benchmark, and walk the claim through it in five stages. The method-level claim does not survive: (i) a plain linear bottleneck does as well or better; (ii) tuned Lasso beats the bottleneck on synthetic CauseMe-style benchmarks, and on Lorenz-96 (the only real benchmark with unambiguous ground truth) classical PCMCI and Granger lead a tight cluster in which the bottleneck trails; (iii) the headline intervention advantage is roughly 60% a sample-size confound, and the residual disappears under standard $do(X=c)$ interventions, surviving only under a non-standard random-forcing scheme; (iv) even that residual reproduces, with a larger effect, in classical bivariate Granger – the effect is method-agnostic. What survives is a narrow characterization result; the benchmark is the lasting artifact, and each stage above is one of its control arms.

24.
arXiv (CS.LG) 2026-06-15

When Language Representations Interact: Separability and Cross-Lingual Effects in LLMs

arXiv:2606.14347v1 Announce Type: new Abstract: Large language models exhibit strong multilingual capabilities, however, their internal representations are difficult to interpret. Understanding these interactions is important for ensuring reliable behavior in multilingual systems. Recent work has shown that causal-geometric structure can explain how certain concepts are encoded as approximately linear and separable directions, but whether this framework extends to multilingual models, where language identity is correlated and hierarchical, is underexplored. We apply causal-geometric analysis to multilingual LLMs, studying 28 bilingual contrasts across three models, allowing us to analyze when languages behave as approximately independent factors and when structured dependencies persist. We find evidence that language concepts admit stable linear representations that are largely separable under a covariance-adjusted (causal) inner product, with structured deviations reflecting linguistic similarity. Moreover, languages within the same family (such as Germanic or Romance) exhibit a simplex-like geometric structure, suggesting hierarchical organization. These results extend causal-geometric interpretability to multilingual settings and provide insight into how separability and similarity may exist in multilingual LLM representations, motivating interpretability analyses that diagnose when and how structured dependencies between concepts can be anticipated. This has implications for trustworthy deployment, as residual structure between languages may lead to unintended cross-lingual effects when models are monitored or intervened upon.

25.
medRxiv (Medicine) 2026-06-22

Effect of Lowering the Drink-Driving Blood Alcohol Limit in Scotland on Road Traffic Crashes: a Synthetic Difference-in-Differences Study

Objective: To evaluate the road safety impact arising from Scotlands 2014 reduction in the legal blood alcohol concentration (BAC) limit for drivers, and to assess whether the effect of the reform varied across different spatial contexts. Design: A quasi-experimental statistical longitudinal study using a Synthetic Difference-in-Differences (SDID) approach. Setting: Small-area panel data for Great Britain, with areas (Middle-layer Super Output Areas, MSOAs, in England and Wales and Intermediate Zones, IZs, in Scotland) classed into control and treatment groups according to whether they were exposed to Scotlands BAC reform. The control and treatment groups comprise 7088 spatial units in England and Wales and 852 spatial units in Scotland, respectively, observed over the period 2008-2019. Participants: The study primarily analyses police-reported road traffic collision data from the UK Department for Transports STATS19 system. Data were analysed at the MSOA/IZ level. This is a secondary dataset, and we therefore did not involve patients or the public in formulating the research question, determining outcome measures, or designing and conducting the study. Main Outcome Measures: The main outcome measures were log-transformed rates of total road traffic crashes, and (weekend) night-time crashes (22:00-04:00) per 100,000 population. The latter is used as a proxy measure for drunk driving. Results: Our results indicate that the reduction in the legal BAC limit led to statistically significant declines in road traffic crash rates. Aggregate estimates suggest reductions of 12.0% (95% confidence interval (CI): [-13.7%, -10.3%]) in total crashes, 15.6% (95% CI: [-20.7%, -10.2%]) in night-time crashes, and 12.4% (95% CI: [-16.7%, -7.9%]) in weekend night-time crashes. We also find substantial heterogeneity in treatment effects across spatial contexts. Effects were strongest in rural and less densely populated areas, where reductions exceeded 16% (95% CI: [-18.7%, -13.9%]) for total crashes and reached up to 29.6% (95% CI: [-35.8%, -22.8%]) for night-time and 21.4% (95% CI: [-28.3%, -13.9%]) for weekend night-time crashes. Moderate but statistically significant effects were also observed in dense urban areas, whereas effects in suburban and transitional areas were smaller and not statistically significant. Conclusions: Our analysis suggests that lowering the legal BAC limit in Scotland led to meaningful reductions in road traffic crashes, particularly during higher-risk periods and in rural areas. The findings further suggest that the effectiveness of BAC regulation may vary across local contexts, highlighting the importance of accounting for spatial heterogeneity when evaluating road safety policies.