Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CV) 2026-06-17

SCC-Loc: A Unified Semantic Cascade Consensus Framework for UAV Thermal Geo-Localization

Cross-modal Thermal Geo-localization (TG) provides a robust, all-weather solution for Unmanned Aerial Vehicles (UAVs) in Global Navigation Satellite System (GNSS)-denied environments. However, profound thermal-visible modality gaps introduce severe feature ambiguity, systematically corrupting conventional coarse-to-fine registration. To dismantle this bottleneck, we propose SCC-Loc, a unified Semantic-Cascade-Consensus localization framework. By sharing a single DINOv2 backbone across global retrieval and MINIMA$_{RoMa}$ matching, it minimizes memory footprint and achieves zero-shot, highly accurate absolute position estimation. Specifically, we tackle modality ambiguity by introducing three cohesive components. First, we design the Semantic-Guided Viewport Alignment (SGVA) module to adaptively optimize satellite crop regions, effectively correcting initial spatial deviations. Second, we develop the Cascaded Spatial-Adaptive Texture-Structure Filtering (C-SATSF) mechanism to explicitly enforce geometric consistency, thereby eradicating dense cross-modal outliers. Finally, we propose the Consensus-Driven Reliability-Aware Position Selection (CD-RAPS) strategy to derive the optimal solution through a synergy of physically constrained pose optimization. To address data scarcity, we construct Thermal-UAV, a comprehensive dataset providing 11,890 diverse thermal queries referenced against a large-scale satellite ortho-photo and corresponding spatially aligned Digital Surface Model (DSM). Extensive experiments demonstrate that SCC-Loc establishes a new state-of-the-art, suppressing the mean localization error to 9.37 m and providing a 7.6-fold accuracy improvement within a strict 5-m threshold over the strongest baseline. Code and dataset are available at https://github.com/FloralHercules/SCC-Loc.

02.
arXiv (CS.LG) 2026-06-18

JourneyFormer: Encoding Airbnb Guest Journey with Sequence Modeling

arXiv:2606.19108v1 Announce Type: new Abstract: Sequence modeling has become increasingly popular in recommendation and ranking algorithms, owing to its capacity to model users' historical behaviors and infer user intentions. Despite its theoretical simplicity, the practical deployment of a sequence model in production is non-trivial due to complexity of the sequence and sparse labels. For example, in Airbnb, guest sequences are often long, exploratory and complex, and we focus on booking labels, which are sparse. As such, we are often required to make various design decisions regarding data and modeling to strike a balance between effectiveness and scalability. This work delved into these production challenges and deployed JourneyFormer, a sequence modeling solution for search ranking at Airbnb. We detail crucial design considerations, covering aspects such as guest event selection, ID embeddings, model architecture, and label attribution. Additionally, we describe several tailored strategies to accelerate model training and inference. JourneyFormer has been successfully deployed within Airbnb's production, where its effectiveness and impact have been evidenced not only by improved offline ranking metrics but also by significant gains in key business metrics through online A/B testing across 2 production surfaces.

03.
arXiv (CS.CL) 2026-06-17

TACOMORE: Exploring a replicable prompting protocol for LLM-assisted corpus analysis

As corpus linguistics continues to scale, researchers are facing a growing methodological bottleneck: while computational tools can easily count billions of words, the qualitative interpretation of these data remains a slow and labor-intensive human task. Large Language Models (LLMs) offer a promising way to automate this process, yet their integration into the field is often hindered by concerns over black-box unpredictability and a lack of replicability. This study introduces TACOMORE, a structured prompting framework designed to transform ad-hoc AI interactions into a standardized linguistic protocol. Built upon four foundational principles (Task, Context, Model, and Replicability), the framework guides LLMs to move beyond generic probability prediction to anchoring their reasoning in the specific co-occurrence patterns of a target corpus. We applied this framework to three core corpus tasks, i.e., the analysis of keywords, collocates, and concordances, using an open corpus of COVID-19 research abstracts. After testing three LLMs, we found that while structured prompting improves accuracy and replicability, inherent limitations regarding hallucination persist. This research offers a critical lens into the role of LLMs in corpus linguistics, highlighting their potential as complementary tools while emphasizing the irreplaceable role of human validation.

04.
arXiv (CS.AI) 2026-06-16

Multi-Grade Deep Learning for Partial Differential Equations with Applications to the Burgers Equation

arXiv:2309.07401v2 Announce Type: replace-cross Abstract: Deep neural networks (DNNs) show great promise for solving partial differential equations (PDEs), but their deep architectures introduce complex, large-scale, non-convex optimization challenges. Nonlinear PDEs, like the viscous Burgers' equation, compound these difficulties due to steep gradients and shock-like solutions. To address this, we propose a two-stage multi-grade deep learning (TS-MGDL) method. In the first stage, shallow networks are trained progressively grade by grade to fit the target function from low- to high-frequency components; previously learned grades are frozen, and each new residual block is trained solely to minimize the remaining approximation error. The second stage unfreezes and retrains selected layers using the first-stage network as initialization, achieving an interpretable, stable hierarchical refinement while mitigating optimization complexity. Furthermore, we theoretically prove that each grade and stage in TS-MGDL monotonically reduces the loss function under an appropriate optimization strategy. Numerical experiments on 1D, 2D, and 3D viscous Burgers' equations demonstrate that TS-MGDL significantly outperforms single-grade learning (SGL), reducing predictive errors by up to a factor of 60.

05.
arXiv (quant-ph) 2026-06-15

Trap-Quenched Matter-Wave Optics for Dual Species Lensing

arXiv:2606.14577v1 Announce Type: cross Abstract: Dual-species atom interferometry in space promises precise tests of the Universality of Free Fall (UFF), with a sensitivity that grows quadratically with the extended interrogation time accessible in weightlessness. These tests demand exquisite control over the expansion energies of both condensed sources as well as over their differential center-of-mass dynamics. We propose a trap-quenched collimation technique featuring in-trap excitations of collective modes compatible with state-of-the-art atom-chip setups. Using NASA's Cold Atom Laboratory aboard the International Space Station, we demonstrate it on a single-species $^{87}$Rb condensate. By controlling the center-of-mass release dynamics we observe free expansion times up to 700 ms and measure a two-dimensional expansion energy of $k_B \cdot 78\pm 9 \;\mathrm{pK}$ in the imaging plane. A detailed model of the magnetically-induced dynamics indicates that this corresponds to a two-dimensional expansion energy of about $k_B \cdot 15^{+12}_{-5}\; \mathrm{pK}$ along two of the condensate's eigenaxes. Finally, we theoretically study this trap-quenched collimation scheme for a $^{41}$K-$^{87}$Rb mixture, predicting a simultaneous collimation that meets the expansion energy requirements for a state-of-the-art UFF test at the $10^{-15}$ accuracy level.

06.
arXiv (CS.CL) 2026-06-19

Pruning via Causal Attribution Preserves Reasoning Performance in Large Language Models

Large language models (LLMs) excel at multi-step reasoning but incur substantial inference cost. We introduce Causal Attribution Pruning (CAP), a training-free method that identifies critical attention heads by measuring their causal impact on reasoning tasks and uses these head-level scores to guide fine-grained weight pruning. For each attention head, CAP estimates the expected performance degradation when the head is masked during forward passes on a small calibration set of reasoning problems. These causal scores are then converted into weight-level importance values for the corresponding projection matrices. Unlike magnitude-only or activation-based criteria, CAP's interventional measurement directly captures each head's functional contribution, yielding relative accuracy gains of up to 61% over Wanda on ARC-Challenge at 20% sparsity. We evaluate CAP on GSM8K, StrategyQA, and ARC-Challenge using Llama-3-8B-Instruct and Mistral-7B-Instruct at 10%, 20%, and 50% sparsity. At moderate sparsity (10-20%), CAP improves over Wanda in most model-benchmark configurations. with especially large gains on ARC-Challenge for Llama-3. Our results suggest that attention-head-level causal attribution can better preserve reasoning performance on downstream benchmarks than correlational pruning criteria at equivalent sparsity, while remaining limited by coarse MLP attribution at 50% sparsity.

07.
arXiv (CS.CL) 2026-06-16

Simplifying the Modeling of Arbitrary Conditionals in Natural Language

Causal Transformers model sequences through an autoregressive factorization of the joint distribution, which enables efficient left-to-right decoding and conditional likelihood computation. However, they cannot tractably sample from or evaluate arbitrary conditionals – e.g., a block of text conditioned on past and future tokens. Recent work aims to solve this problem through novel architectures, but they often lead to sub-optimal modeling of such conditionals and degraded generations. We propose Arbitrary Conditionals GPT (AC-GPT) which introduces a simple modification to standard causal Transformers to enable evaluating and sampling from arbitrary conditionals – including past, future, and mixed contexts – within a single forward pass. Unlike prior approaches, our method preserves the standard left-to-right ordering and next-token prediction objective essential for both strong performance and efficient training on natural language. Crucially, this compatibility allows existing LLMs to be fine-tuned for arbitrary conditioning. Our empirical results indicate that our method outperforms baselines on modeling arbitrary conditionals, without degrading standard left-to-right performance.

08.
arXiv (CS.LG) 2026-06-16

Tight Bounds for Logistic Regression with Large Stepsize Gradient Descent in Low Dimension

arXiv:2602.12471v2 Announce Type: replace Abstract: We consider the optimization problem of minimizing the logistic loss with gradient descent to train a linear model for binary classification with separable data. With a budget of $T$ iterations, it was recently shown that an accelerated $1/T^2$ rate is possible by choosing a large stepsize $\eta = \Theta(\gamma^2 T)$ (where $\gamma$ is the dataset's margin) despite the resulting non-monotonicity of the loss. In this paper, we provide a tighter analysis of gradient descent for this problem when the data is two-dimensional: we show that GD with a sufficiently large learning rate $\eta$ finds a point with loss smaller than $\mathcal{O}(1/(\eta \gamma^2 T))$, as long as $T \geq \Omega(n/\gamma + 1/\gamma^2)$, where $n$ is the dataset size. Our improved rate comes from a tighter bound on the time $\tau$ that it takes for GD to transition from unstable (non-monotonic loss) to stable (monotonic loss), via a fine-grained analysis of the oscillatory dynamics of GD in the subspace orthogonal to the max-margin classifier. We also provide a lower bound of $\tau$ matching our upper bound up to logarithmic factors, showing that our analysis is tight.

09.
arXiv (quant-ph) 2026-06-19

Matrix-product state skeletons in Onsager-integrable quantum chains

arXiv:2511.07212v2 Announce Type: replace Abstract: Matrix-product state (MPS) skeletons are connected networks of Hamiltonians with exact MPS ground states that underlie a phase diagram. Such skeletons have previously been found in classes of free-fermion models. For the translation-invariant BDI and AIII free-fermion classes, it has been shown that the underlying skeleton is dense, giving an analytic approach to MPS approximation of ground states anywhere in the class. In this paper, we partially expose the skeleton in certain interacting spin chains: the $N$-state Onsager-integrable chiral clock families. We construct MPS that form a dense MPS skeleton in the gapped regions surrounding a sequence of fixed-point Hamiltonians (the generators of the Onsager algebra). Outside these gapped regions, these MPS remain eigenstates, but no longer give the many-body ground state. Rather, they are ground states in particular sectors of the spectrum. Our methods also allow us to find further MPS eigenstates; these correspond to low-lying excited states within the aforementioned gapped regions. This set of MPS excited states goes beyond the previous analysis of ground states on the $N=2$ free-fermion MPS skeleton. As an application of our results, we find a closed form for the disorder parameter in a family of interacting models. Finally, we remark that many of our results use only the Onsager algebra and are not specific to the chiral clock model representation.

10.
arXiv (CS.AI) 2026-06-19

ScaleWoB: Guiding GUI Agents with Coding Agents via Large-Scale Environmental Synthesis

arXiv:2605.25160v2 Announce Type: replace Abstract: GUI agents powered by large language models are advancing rapidly, creating urgent needs for evaluation and training based on realistic environments. However, directly doing so in real-world environments introduces some challenges that cannot be overlooked. Real-world environments are complex and uncontrollable, making it difficult to construct verifiable rewards and to save or reset states. Existing works prioritize reproducibility but are often limited to open-source apps or file-operation tasks for reliable reward building, leaving a persistent gap from real-world usage. Furthermore, relying on virtual machines or docker images demand high resource requirements and suffer from slow response speeds, which limit the efficiency. We present \sys, a framework that could produce high-fidelity synthesized interactive environments for GUI agents across platforms with verifiable rewards. These environments behave as backend-free webpages accessible via URL, requiring near-zero setup and low resource cost, making the approach suitable for both large-scale evaluation and downstream agent training. We support multiple GUI platforms including mobile, desktop, and automotive/in-vehicle interfaces based on the same pipeline, covering 100+ environments and 1000+ verifiable tasks. Among them, 120 challenging tasks across 63 simulated mobile applications are released as a fully synthesized mobile GUI agent benchmark. Experiment results on five state-of-the-art mobile GUI agents reveal substantial headroom – the average success rate is only 27.92\%, dropping to 17.82\% on long-horizon subset – while humans reach 92.08\%. A comparison against real-world sample tasks shows that assessments made in our synthetic environments generalize to real apps. The project website is at https://scalewob.github.io.

11.
arXiv (CS.LG) 2026-06-16

Learning Hybrid Biophysical Neuron Models with Neural ODEs

arXiv:2606.16693v1 Announce Type: cross Abstract: Biophysical neuron models link measurements of neural activity to underlying cellular mechanisms. Yet, a central challenge is that the kinetics of many ion channels are poorly characterized, and practical simplifications – omitting channels or reducing morphological detail – introduce systematic gaps between model and biology. Bridging these gaps requires approaches that can flexibly discover unmodeled dynamics while preserving mechanistic interpretability. Here, we introduce a hybrid modeling framework that embeds neural ordinary differential equations into conductance-based biophysical models to capture unknown currents or mis-specified channel kinetics. By parameterizing the neural ODE in terms of voltage-dependent steady-state and time-constant functions, we recover interpretable gating dynamics directly from voltage recordings without assuming a functional form. We show that the hybrid model fits the gating kinetics of 2400 ion channel models and recovers unknown gating dynamics from single current-clamp recordings, generalizing to out-of-distribution stimulus regimes under realistic inputs and parameter misspecification. We also use our method to reduce a multicompartment model of a cortical neuron into a single-compartment hybrid model with a learned axial current, yielding up to an order of magnitude lower computational cost. Together, our results establish a plug-and-play framework for selectively replacing unknown components of conductance-based models with neural ODEs while preserving their mechanistic structure.

12.
arXiv (CS.AI) 2026-06-16

Artificial Intelligence Index Report 2026

arXiv:2606.15708v1 Announce Type: new Abstract: Welcome to the ninth edition of the AI Index report. As AI continues to advance rapidly, the question becomes whether the systems built around it can keep up. Governance frameworks, evaluation methods, education systems, and the data infrastructure needed to track AI's impact are struggling to match the pace of the technology itself. That gap between what AI can do and how prepared we are to manage it runs through every chapter of this year's report. New in this edition, the report tracks how AI is being tested more ambitiously across reasoning, safety, and real-world task execution, and why those measurements are increasingly difficult to rely on. It also features new estimates of generative AI's economic value alongside emerging evidence of its labor market effects, an analytical framework on AI sovereignty, and a science chapter developed in collaboration with Schmidt Sciences. For the first time, the report features standalone chapters on AI in science and AI in medicine, reflecting AI's growing impact across these two domains.

13.
arXiv (CS.AI) 2026-06-18

SkillRevise: Improving LLM-Authored Agent Skills via Trace-Conditioned Skill Revision

arXiv:2606.01139v3 Announce Type: replace Abstract: Agent skills are procedural artifacts that enable LLM agents to execute workflows, verify constraints, and recover from failures. Existing self-evolving methods refine skills using accumulated trajectories. However, they struggle in cold-start settings, where only an initial, imperfect skill is available. Consequently, skill construction defaults to expert authoring or one-shot LLM generation. Expert-authored skills are costly and may not align with how LLM agents actually execute tasks, while one-shot generated skills can be syntactically well formed yet behaviorally weak. To bridge this gap, we propose SkillRevise, an execution-grounded framework designed to iteratively refine these initial skills. SkillRevise diagnoses skill defects from execution evidence, retrieves relevant repair principles from a general memory, and applies execution-anchored edits. By re-executing candidates, it retains the first verifier-passing skill within the revision budget and falls back to empirical utility only when no candidate succeeds. Evaluated across three benchmarks and five LLMs, SkillRevise substantially outperforms one-shot baselines, improving the base agent's success rate on SkillsBench from 36.05% to 61.63%. Furthermore, the revised skills transfer across both executors and task environments, suggesting that SkillRevise captures reusable procedural knowledge beyond any single executor.

14.
arXiv (CS.LG) 2026-06-18

Seed-Guided Semi-Supervised Clustering by A-Contrario Anomaly Detection

arXiv:2606.18833v1 Announce Type: new Abstract: This paper introduces a semi-supervised clustering framework grounded in the statistical duality between grouping principles and anomaly detection. We address the challenge of robust cluster definition in noisy environments – a task where partitioning algorithms often over-assign outliers and density-based methods remain sensitive to heuristic global parameters. Drawing on a-contrario statistical reasoning and Gestalt proximity principles, we define a cluster as a maximal subset of data points containing no anomalies relative to a null hypothesis of uniform randomness. Central to this approach is the Perception algorithm, which utilises a principled expectation-based threshold ($\mathbb{E} < 1$) to identify outliers without manual parameter tuning. By treating clustering as the dual of anomaly detection, we employ an iterative ``clustering-by-exclusion'' mechanism. The algorithm is seed-guided, leveraging minimal user-provided labels to initialise robust cluster medians and form initial groups, which are subsequently expanded by admitting non-anomalous points. This approach naturally isolates fringe points, isolated noise, and emerging unknown clusters. We evaluate the method on synthetic and real-world benchmarks, including image and text datasets represented through raw, linear-reduced, and neighbourhood-preserving embeddings. Results demonstrate that with as few as 10–30 seeds per cluster, the proposed method achieves competitive and often very strong performance under a practical low-tuning benchmarking protocol, while maintaining linear scalability with respect to both observations and dimensionality for a fixed number of seeded clusters and iterations.

15.
arXiv (quant-ph) 2026-06-15

No classical particle limit for massless quanta

arXiv:2606.14632v1 Announce Type: new Abstract: We investigate whether relativistic massless classical particles may emerge as the classical limit of massless quanta. To address this question independently of any specific dynamics, environment, or pointer basis, we develop an axiomatic and purely kinematical framework for the coarse-graining approach. In this formulation, a candidate classical phase space is taken as the outcome space of a POVM subject only to minimal classicality and covariance under the relevant spacetime symmetry group. Applying this framework to the Poincaré group, we prove a no-go theorem for massless particles: the covariance requirement is incompatible with the operational conditions for classicality. The theorem leaves open field-like limits of massless quanta, for example the emergence of electromagnetic or gravitational fields, while ruling out classical massless particles, such as classical photons or gravitons.

16.
medRxiv (Medicine) 2026-06-18

Evaluating Deep-Learning Based Quantification of Breast Arterial Calcification on Mammography for Cardiovascular Risk Assessment

Purpose: To develop and evaluate a deep learning model for automated quantification of breast arterial calcification (BAC) on screening mammography and to assess whether AI-derived BAC burden predicts major adverse cardiovascular events (MACE) in women. Methods: In this retrospective study, 202,006 women who underwent screening mammography without history of MACE were included. A BAC segmentation model was trained on an expert-annotated dataset using a multi-task U-Net with a ResNet-18 encoder to detect and segment BAC. BAC burden was quantified as area (mm{superscript 2}) from model-generated masks using DICOM pixel spacing and categorized by tertiles into low, intermediate, and high. The PREVENT score and incident MACE were identified from electronic health records. Cox proportional hazards models were developed to evaluate AI-derived BAC burden and PREVENT score alone, and combined models for 5 - and 10-year cardiovascular risk prediction. Results: Among 202,006 women (mean age 54.8{+/-}11.7 years), 23.1% had AI-detected BAC, and 7,701 (3.8%) developed incident MACE during a median follow - up of 7.5 years. On the geographically held-out test set, the BAC model achieved an AUROC of 0.97, Dice score of 0.6678, and Pearson correlation of 0.961 between AI-derived and manually annotated BAC burden. BAC burden increased with age and was higher among women who developed MACE. Five - year MACE incidence increased across BAC categories from 1.5% in women without BAC to 6.9% in those with high BAC burden. BAC burden alone showed modest prediction of MACE, with 5-year and 10-year AUROCs of 0.661 and 0.650, respectively, while PREVENT achieved AUROCs of 0.781 and 0.771. Adding BAC to PREVENT produced minimal improvement in discrimination. Conclusion: Deep learning-based BAC quantification from routine mammography is feasible, accurate, and associated with future cardiovascular risk. Although BAC added little to PREVENT for overall discrimination, it may serve as a scalable opportunistic imaging biomarker to identify women at elevated cardiovascular risk and support preventive care.

17.
arXiv (CS.CV) 2026-06-11

A Comprehensive Ecosystem for Open-Domain Customized Video Generation

Recent progress in video generation has shown impressive visual synthesis capabilities. However, open-domain customized video generation remains limited by the lack of large-scale, annotated datasets capturing diverse identity-specific attributes. To address this, we introduce PexelsCustom-1M, the first publicly available million-scale dataset for identity-preserving video generation, containing one million curated triplets across 8,000+ categories. Leveraging this, we propose CustoMDiT, a parameter-efficient framework that adapts a pretrained multimodal Diffusion Transformer into a customized video generator with only 8% additional learnable parameters. Our method surpasses prior state-of-the-art. However, benchmarks such as DreamBooth cover only 100 classes, which is insufficient for real-world applications. To overcome this, we construct OpenCustom, a new benchmark with 1,000+ categories, created via cross-dataset knowledge fusion from ImageNet and MS-COCO. Extensive experiments confirm the advantages of both our dataset and model. We will open-source the entire ecosystem–including dataset, pipeline, benchmark, and implementations–to support further research.

18.
arXiv (CS.CV) 2026-06-16

SUP-MCRL: Subject-aware Unified Pseudo-feature Coded Multimodal Contrastive Representation Learning for EEG Visual Decoding

Non-invasive brain-computer interfaces suffer severe fidelity degradation in neural visual decoding when generalizing to natural visual experiences. Conventional multimodal contrastive representation learning solely optimizes geometric distance alignment, neglecting semantic consistency and subject selectivity, causing spurious zero-shot alignment. We propose SUP-MCRL, a unified framework integrating three collaborative mechanisms: (1) Semantic-entity Aware Visual Encoder (SAVE), learning spatial attention to extract semantic content without pre-trained saliency models; (2 Unified EEG Enhancer (UEE), employing multi-scale atrous convolutions and inter-band attention for adaptive cross-subject robustness; and (3) Prototype-based Progressive Augmenter (PPA), maintaining an EMA-updated pseudo-feature pool to prevent representation collapse. Zero-shot experiments on THINGS-EEG achieve 66.0%/91.9% (Top-1/Top-5) intra-subject and 24.0%/52.9% LOSO accuracy, surpassing state-of-the-art methods. Code is available at https://github.com/NZWANG/SUP-MCRL.

19.
arXiv (CS.LG) 2026-06-19

When Calibration Fails the Vulnerable Hospital: Federated Conformal Risk Control via Risk-Curve Shrinkage

arXiv:2606.20115v1 Announce Type: new Abstract: Conformal risk control (CRC) provides distribution-free guarantees on segmentation quality by calibrating a prediction-set threshold on held-out data. In federated deployments, the standard approach pools calibration scores across sites into a single threshold. We provide the first quantification, on real multi-institutional brain tumor data (FeTS-2022, 1,251 subjects, 20 institutions), showing that this naive pooled CRC protects the average hospital but violates coverage at 40% of individual institutions, with the worst site exceeding the target false-negative rate by 7.8 percentage points. The naive alternative, per-site local CRC, largely restores coverage but inflates prediction sets by 83x, rendering them clinically useless. We propose a shrinkage-based federated CRC protocol: each site transmits only its empirical risk curve (G scalars) to a server, which computes a shrinkage-regularized threshold per site. A single hyperparameter n0 smoothly trades worst-case coverage for prediction-set efficiency; leave-one-site-out sensitivity analysis identifies n0=19, achieving 2.7/20 violations at 2.0x stretch. We further show that direct Lagrangian optimization of coverage budgets fails, concentrating risk on vulnerable hospitals, and that the finite-sample correction term is essential: removing it triples violations. The marginal CRC guarantee is preserved by construction under the stated site-mixture assumption; per-site coverage is validated across four targets with three seeds. No patient-level images, masks, or per-volume scores leave any site.

20.
arXiv (CS.AI) 2026-06-11

Towards Responsibly Non-Compliant Machines

arXiv:2606.12147v1 Announce Type: new Abstract: We consider the problem of engineering autonomous intelligent agents that are capable to responsibly not comply with user requests. We argue that machine non-compliance comes in many different forms, and sketch the issues we should pursue on the road of accomplishing responsibly non-compliant intelligent machines. We anchor responsible non-compliance in justifications for task refusal, pathways to override the non-compliance, as well as careful tracking of security risks and liability transfers.

21.
medRxiv (Medicine) 2026-06-23

Timing of S. aureus-related mortality in a large randomized clinical trial: Implications for future study design

Background: Longer follow-up periods in clinical trials for S. aureus bacteremia (SAB) may capture unrelated deaths, adding random noise that risks biasing trial results towards the null. Objective: To evaluate the timing and infection-relatedness of deaths within a large SAB clinical trial platform. Design: Blinded duplicate adjudication of trial deaths using a modified 7-point Likert-Scale. A third reviewer settled disagreements. Setting: 37 Canadian hospitals participating in the S. aureus Network Adaptive Platform (SNAP) Trial. Participants: 1515 adult patients recruited to SNAP between February 2022 and May 2026. Measurements: Timing and relatedness of 90-day deaths categorized as at least possibly SAB-related not likely to be SAB-related. Optimal follow-up cut-off was determined using Youden's index and graphically. Results: 247 deaths occurred; 97 (39.3%) were adjudicated as at least possibly SAB-related and 150 (60.7%) as not likely related. For probably/definitely related deaths, interrater agreement was 85.0% (Gwet's AC 0.73, substantial); for at least possibly related, it was 77.3% (Gwet's AC 0.55, moderate). Median survival was significantly shorter for SAB-related deaths (12 vs. 30.5 days; difference: 19 days earlier, 95% CI: 12-26, p

22.
arXiv (CS.LG) 2026-06-15

Decompose Sparsely Where You Should, Absorb Densely Where You Should No

arXiv:2606.14040v1 Announce Type: new Abstract: Sparse autoencoders (SAEs) are typically trained to reconstruct the entire residual stream through a sparse dictionary, implicitly assuming that all activation content is amenable to sparse, monosemantic decomposition. We question this assumption and hypothesize that activations contain a low-rank, dense component that is computationally important to the model yet inherently unsuitable for sparse representation, which serves as a major source of the persistent dense latents widely observed in trained SAEs. To test this, we add a small rank-$r$ linear bottleneck in parallel with standard SAEs (BatchTopK and Matryoshka), allowing dense structure to be absorbed before sparse reconstruction. On Gemma-2-2B layer 12, a rank-24 bottleneck reduces dense latent count by up to 84\% while improving sparse probing and targeted probe perturbation on both architectures at matched sparsity. The absorbed component is (i) structurally identifiable as the top principal components and outlier dimensions; (ii) causally necessary, with removing it raising next-token cross-entropy by 7.5$\times$, far exceeding the 2.8$\times$ from removing the geometrically near-identical top-24 PCA directions; and (iii) redundantly encoded by sparse dictionaries, with ablating 787 maximally aligned sparse features raising cross-entropy by only 2.9$\times$ and ablating 2,048 topic-aligned features leaving MMLU topic classification virtually unchanged, whereas removing the scaffold drops it from 98.7\% to chance. Together, our findings identify a compact, semantically informative and causally important component of residual stream activations (which we term a computational scaffold) that standard sparse dictionaries represent inefficiently, suggesting that the scope of sparsity-based interpretability methods warrants careful re-examination.

23.
arXiv (CS.LG) 2026-06-12

Graph Reinforcement Learning for Calibration-Aware Quantum Circuit Routing

arXiv:2606.12816v1 Announce Type: cross Abstract: Quantum circuit routing is a key step in compiling programs for noisy intermediate-scale quantum processors. Routes that appear efficient by standard overhead metrics can still lose fidelity when they pass through poorly calibrated couplers. We study a calibration-aware graph reinforcement-learning router that uses same-day IBM Heron r2 calibration data to choose hardware-edge SWAPs. We train the policy with proximal policy optimization and evaluate it with exact simulated fidelity across nine Munich Quantum Toolkit (MQT) Bench circuits and three calibration snapshots. Across these evaluations, pooled mean exact fidelity is $0.727$, compared with $0.440$ for SABRE-best20 and $0.481$ for target-aware SABRE. Fidelity gains come with higher routed two-qubit counts and are concentrated in the 5q and 8q circuit families; under the fixed tree action graph, all 10q families favor SABRE-best20. Overall, our results show that calibration-aware learned routing can improve fidelity beyond gate-count-driven compilation.

24.
arXiv (CS.LG) 2026-06-17

Beyond IGO-Flow: Toward Convergence Analysis of IGO in Continuous Spaces

arXiv:2606.17523v1 Announce Type: cross Abstract: Information-Geometric Optimization (IGO) provides a unified framework for black-box optimization by interpreting the adaptation of a search distribution as a natural gradient update. Despite its conceptual importance, the convergence theory of IGO remains limited: most existing results concern continuous-time idealizations such as the IGO flow, rather than discrete-time updates with non-infinitesimal learning rates. In this paper, we study discrete-time IGO in continuous spaces, formulated as natural gradient updates in the expectation-parameter coordinates of an exponential family. In particular, we analyze IGO over the multivariate Gaussian family on strongly convex quadratic objective functions. Our analysis covers a setting that simultaneously incorporates full covariance adaptation, a fixed positive learning rate, and quantile-based weights. In this setting, we prove that the covariance matrix converges to the zero matrix. We further show that the mean vector converges to the global optimum, provided that the condition number of the appropriately scaled covariance matrix is bounded at sufficiently frequent iterations. These results advance the convergence theory of IGO and help bridge the gap between the mathematical theory of IGO and practical covariance-adaptive search methods such as CMA-ES.

25.
arXiv (CS.AI) 2026-06-12

Strategic Decision Support for AI Agents

arXiv:2606.12587v1 Announce Type: new Abstract: Traditionally, decision support studies how humans use machine learning models to make better decisions. In modern agentic systems, this division of roles is increasingly reversed: AI agents act on behalf of users, while humans and tools becomes support mechanisms around them. This role reversal brings reliability concerns to the forefront, since agentic errors can be consequential and agent behavior must remain aligned with human goals and constraints. Departing from the classical view of decision support, we revisit its two basic principles, the cost–value tradeoff of seeking support and the role of uncertainty quantification, in a setting where AI agents are the central actors. We propose a framework for strategic decision support for AI agents through an optimization problem that minimizes support usage subject to controlling a counterfactual missed-support error: the probability that the agent acts alone on instances where support would have materially improved its output. At the population level, we show that the optimal policy is a threshold rule on the value of support. Building on this structure, we develop an online algorithm that adaptively thresholds such a score and uses randomized exploration to control missed-support error without distributional assumptions. We further introduce a calibration-on-the-fly method that reduces unnecessary support calls online. We instantiate this framework across diverse scenarios, including information gathering, human–AI collaboration, and tool use, showing how each can be modeled through the same strategic decision-support lens. Experiments across these settings show that our method reliably controls the target error while substantially reducing support usage in practice.