Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.AI) 2026-06-17

LineageMark: Multi-user White-box Watermarking for Contribution Tracing in Model Derivation Chains

arXiv:2606.17123v1 Announce Type: cross Abstract: In open large language model (LLM) ecosystems, models are frequently adapted across multiple domains and applications, forming multi-stage derivation chains. Consequently, tracking and verifying historical contributions is essential for model provenance and intellectual property protection. However, existing watermarking methods are mainly designed for single-user, one-time embeddings, often fail under repeated model derivation and incremental updates. To address this problem, we propose LineageMark, a multi-user white-box watermarking framework for model derivation chains. The framework encodes watermarks in model parameters using a projection-based approach. Stable carriers are first selected to reduce sensitivity to model changes, each watermark bit is then represented as a projection statistic over these carriers. Additional watermark insertions introduce only bounded perturbations in the projection space, and margin constraints are used to maintain signal integrity. We evaluate the effectiveness of LineageMark in multi-stage model derivation chains. Experimental results show that LineageMark preserves contributor watermarks across multi-stage derivation and supports incremental multi-user watermark insertion. Furthermore, it exhibits robustness against perturbations such as re-watermarking, fine-tuning, quantization, and pruning.

02.
bioRxiv (Bioinfo) 2026-06-13

MoE-Bind: Guiding De Novo Protein Binder Generation with Sparse Experts

Authors:

De novo protein binder design has been dominated by structure-based pipelines that require known three-dimensional target conformations and consume substantial compute and generation time per design, limiting their throughput and accessibility for routine large-scale binder exploration. Sequence-only generative models promise a faster and lighter alternative, yet existing systems remain uniformly dense and frequently reintroduce structural computation at inference, undermining the core advantages they were intended to deliver. Across the broader language modelling community, transformers have meanwhile transitioned from fully dense designs to sparse Mixture-of-Experts architectures that decouple capacity from per-token compute, a shift that has yet to reach sequence-only protein binder generation. We present MoE-Bind, an autoregressive protein binder generator that, for the first time in this domain, combines Multi-head Latent Attention with a sparse Mixture-of-Experts feed-forward network and is evaluated under two independent structure predictors, Boltz-2 and AlphaFold2-Multimer. Despite activating less than half the per-token parameters of compute-matched dense baselines, MoE-Bind matches or exceeds them on full-length receptor-conditioned binder generation on a leakage-free Docking Benchmark 5.0 evaluation, transfers without peptide-specific training to short-peptide design, and reduces training and inference compute by a large margin. Routing analysis on generated binders reveals interpretable expert specialization at both the individual amino acid and biochemical group level, a structured expert-token alignment not previously reported for natural-language MoE models. These results show that sparse architectural design, rather than scale, can deliver fast, structure-free, and interpretable protein binder generation.

03.
arXiv (quant-ph) 2026-06-19

Sparse Configuration Interaction for the Electronic Schrödinger Equation Revisited: Complete Basis Set Limit Complexity and Quantum-Encoding Impact

arXiv:2606.20385v1 Announce Type: new Abstract: In this article we revisit regularity results for eigenfunctions in the discrete spectrum of the electronic Schrödinger equation and study their consequences for approximation complexity. In particular, for the convergence to the complete basis set limit, it can be shown that the curse of dimensionality in the leading algebraic exponent can be mitigated. That is, for general sparse grid constructions, the main term of the convergence rate with respect to the number of degrees of freedom is independent of the number of electrons. These insights indicate potential benefits for classical numerical solvers of the electronic Schrödinger equation and also for quantum-computing approaches through new qubit-efficient wavefunction encodings.

04.
arXiv (CS.CL) 2026-06-11

Cross-Layer Discrete Concept Discovery for Interpreting Language Models

Interpreting language models remains challenging due to the existence of residual stream, which linearly mixes and duplicates features across adjacent layers, causing single-layer analyses to miss this cross-layer structure. Cross-layer sparse autoencoders (SAEs) address layer mixing but operate in continuous space, where concepts split across many neurons without clear boundaries. We introduce Cross-Layer Vector Quantized-Variational Autoencoder (CLVQ-VAE), a novel framework which maps representations from a lower layer to a higher layer through a discrete vector-quantization bottleneck, collapsing duplicated residual-stream features into compact, interpretable concept vectors. Our approach combines top-k temperature-based sampling with exponential moving average (EMA) codebook updates, providing controlled exploration of the discrete latent space while maintaining codebook diversity. Across both encoder- and decoder-based models on ERASER-Movie, Jigsaw, and AGNews, CLVQ-VAE outperforms clustering, single-layer vector quantized-variational autoencoder (VQ-VAE), and sparse autoencoder (SAE) baselines across three evaluation axes: removing identified concepts drops model accuracy by up to 93%, LLM judges rank our concepts first in 66.7% of comparisons, and human annotators recover model predictions from our visualizations with 78% accuracy versus 54% for clustering.

05.
arXiv (CS.CV) 2026-06-15

Gaze Heads: How VLMs Look at What They Describe

How a vision-language model internally solves the task of describing an image is far from obvious. We find that the model develops a specific mechanism for this: a small set of attention heads in its language-model backbone, which we call gaze heads, whose attention tracks the image region the model is currently describing. We find them with a simple correlation score from a few forward passes, using comic strips as a controlled testbed where narrative order is laid out spatially. These gaze heads do not just track the image tokens being described: redirecting their attention to a chosen region forces the VLM to describe that region instead. A single attention-mask intervention on the top-100 gaze heads, fewer than 9% of all heads, steers the model's answer to any chosen comic panel at 83.1% accuracy, while the same intervention on random heads fails to redirect the answer, and intervening on all heads destroys generation. The same lever also extends to continuous control: switching the gaze target mid-generation makes the model wrap up its current panel description and move to the new one within a few tokens. Beyond comics, the same intervention redirects answers to chosen regions in natural COCO images. The mechanism further recurs across model sizes from 2B to 32B parameters and across other VLM architectures, although some frozen-encoder families show no comparable head set. More broadly, this shows that targeted edits identified through mechanistic analysis can serve as practical inference-time levers for steering multimodal model behavior, without any retraining. Our code, interactive demo, and datasets are available at https://gaze.baulab.info/

06.
arXiv (CS.AI) 2026-06-19

CRAX: Fast Safe Reinforcement Learning Benchmarking

arXiv:2606.20376v1 Announce Type: cross Abstract: Safety is a core concern for deploying reinforcement learning (RL) agents in real-world domains such as robotics and autonomous driving. While benchmarks have been central to progress in RL, existing safety benchmarks with high-fidelity 3D physics remain computationally slow, limiting large-scale experimentation and rapid prototyping. To address this gap, we propose CRAX (Constrained RL Accelerated with JAX). Built on top of the MuJoCo XLA (MJX) physics engine with realistic 3D dynamics, CRAX leverages vectorized operations and hardware acceleration, yielding up to ~100x speedups over comparable CPU-based safety benchmarks. The benchmark features six environment suites and three agent-specific tasks, each spanning three difficulty levels. Evaluating six popular safe RL methods shows that no single approach dominates across all tasks, and reveals the trade-offs between performance and safety. We find that curriculum learning across difficulty levels and safety transfer can improve performance over direct training in harder settings.

07.
arXiv (CS.LG) 2026-06-11

Phi-Actor-Critic: Steering General-Sum Games to Pareto-Efficient Correlated Equilibria

arXiv:2606.11284v1 Announce Type: cross Abstract: Real-world multi-agent systems, from traffic coordination to resource allocation, are often modeled as general-sum games where individual incentives conflict with collective welfare. In these settings, the central challenge is not merely finding an equilibrium, but selecting socially desirable outcomes among many suboptimal Nash equilibria. Standard deep multi-agent reinforcement learning (MARL) methods struggle with this problem, as value-decomposition approaches are constrained by monotonicity assumptions and policy-gradient methods often converge to stable but socially inefficient equilibria. To address this limitation, we propose $\Phi$-Actor-Critic ($\Phi$-AC), a framework that leverages swap regret minimization to steer learning toward high-welfare correlated equilibria (CE). To make counterfactual regret estimation tractable in deep MARL, $\Phi$-AC employs a centralized attention critic that predicts vector-valued regrets in a single forward pass, avoiding computationally expensive counterfactual simulations. We further introduce a Lagrangian-based equilibrium selection mechanism that optimizes social welfare while enforcing stability through regret constraints. Experiments on matrix games, Multi-Agent Particle Environments (MPE), and the Melting Pot Harvest scenario demonstrate that $\Phi$-AC learns efficient and stable coordination strategies across diverse mixed-motive settings while maintaining high collective return and competitive fairness.

08.
arXiv (CS.LG) 2026-06-18

Diffusion-Proof: Recipe for Formal Theorem Proving Beyond Auto-Regressive Generation

arXiv:2606.19315v1 Announce Type: new Abstract: Enhancing the formal math reasoning capabilities of Large Language Models (LLMs) has become a key focus in both mathematical and computer science communities in recent years. While significant progress has been made in using state-of-the-art Auto-Regressive (AR) LLMs for formal theorem proving, these models suffer from inherent limitations. Their next-token prediction generation methods may yield suboptimal performance due to the challenges of long-range coherence and the compounding of errors over long sequences. Recent advancements in diffusion LLMs (dLLMs), which generate text through iterative denoising of a multi-token block, offer a promising alternative. However, the application of dLLMs to formal mathematics, where maintaining long-range coherence is critical, remains largely understudied. To address the challenges above, we propose **Diffusion-Proof**, to the best of our knowledge, the first framework to train and apply dLLMs for formal theorem proving. Our frameworks contain training and inference methods for two models. The first one is *dLLM-Prover-7B*, which performs whole-proof writing with long-range coherent tactic usage. The second one is *dLLM-Corrector-7B*, which is a novel large block diffusion-based correction model. It leverages the in-filling capabilities of dLLMs to perform local proof correction using bi-directional information. Extensive experiments demonstrate that **Diffusion-Proof** relatively significantly outperforms the AR LLM baseline trained under the same dataset. **Diffusion-Proof** achieves an absolute improvement of **1.61%** on ProofNet-Test and **6.14%** on MiniF2F-Test benchmarks compare to the baseline. Notably, **Diffusion-Proof** successfully resolves one IMO problem that more advanced thinking model DeepSeek-Prover-V2-7B could not solve, showcasing the unique advantage of dLLMs in formal theorem proving.

09.
bioRxiv (Bioinfo) 2026-06-14

TopoMIL: Topology Improves Multiple Instance Learning in Diagnostic Microscopic Images

Microscopic images of cells and tissues are central to disease diagnosis. In computational pathology, multiple instance learning (MIL) has emerged as a key paradigm for analyzing numerous images within a single patient sample. While the representative distribution of cells in a sample is important for diagnosis, existing MIL frameworks largely overlook it. We introduce TopoMIL, a framework that extracts the representative topological structure of the sample and integrates it into the MIL classifier. Three topological representations are assessed, each with distinct advantages and computational costs. We evaluate TopoMIL on four histopathology and cytomorphology datasets, each presenting unique challenges. Integrating the sample's topological information into MIL enhances classification across average, max, attention-based, and transformer pooling, yielding AUCROC gains of 3.3%, 4.2%, 5.9%, and 0.5%, respectively, with moderate computational cost. Our work underscores the potential of TopoMIL as a scalable extension to existing morphology-based models in computational pathology.

10.
arXiv (CS.AI) 2026-06-16

RecourseBench: A Modular Framework for Reproducible Algorithmic Recourse Evaluation

arXiv:2606.16113v1 Announce Type: new Abstract: Algorithmic recourse methods provide counterfactual explanations that inform individuals of the actions required to overturn an unfavorable model decision. Despite rapid methodological progress, principled comparison remains elusive; existing frameworks are often difficult to extend and lack both interoperability and systematic verification that integrated methods faithfully reproduce their originally reported results. We introduce RecourseBench, a unified evaluation framework built around three commitments namely, modularity, reproducibility, and interactivity. The framework decomposes the pipeline into five fully decoupled layers – Data, Preprocessing, Model, Recourse Method, and Evaluation – governed by abstract interfaces and a dynamic registry. To address the reproducibility gap in prior benchmarks, we introduce a four-tier classification system in which every integrated method is validated by an automated test suite against its originally reported results. We further provide an interactive web interface for flexible, configuration-driven comparison across methods, datasets, and model architectures. Our framework currently integrates 28 state-of-the-art recourse methods and, to our knowledge, constitutes the first recourse benchmark to explicitly enforce method-level reproducibility through automated, quantitative testing.

11.
arXiv (CS.CV) 2026-06-17

FUSER: Feed-Forward MUltiview 3D Registration Transformer and SE(3)$^N$ Diffusion Refinement

Registration of multiview point clouds conventionally relies on extensive pairwise matching to build a pose graph for global synchronization, which is computationally expensive and inherently ill-posed without holistic geometric constraints. This paper proposes FUSER, the first feed-forward multiview registration transformer that jointly processes all scans in a unified, compact latent space to directly predict global poses without any pairwise estimation. To maintain tractability, FUSER encodes each scan into low-resolution superpoint features via a sparse 3D CNN that preserves absolute translation cues, and performs efficient intra- and inter-scan reasoning through a Geometric Alternating Attention module. Particularly, we transfer 2D attention priors from off-the-shelf foundation models to enhance 3D feature interaction and geometric consistency. Building upon FUSER, we further introduce FUSER-DF, an SE(3)$^N$ diffusion refinement framework to correct FUSER's estimates via denoising in the joint SE(3)$^N$ space. FUSER acts as a surrogate multiview registration model to construct the denoiser, and a prior-conditioned SE(3)$^N$ variational lower bound is derived for denoising supervision. Extensive experiments on 3DMatch, ScanNet and ArkitScenes demonstrate that our approach achieves the superior registration accuracy and outstanding computational efficiency.

12.
arXiv (CS.AI) 2026-06-19

Navigating Unreliable Parametric and Contextual Knowledge: Explicit Knowledge Conflict Resolution for LLM Inference

arXiv:2606.20245v1 Announce Type: new Abstract: Large language models (LLMs) have achieved strong performance across a wide range of language-based tasks by leveraging both extensive parametric knowledge and in-context learning ability, enabling them to incorporate external information provided in the input prompt. However, the integration of external knowledge can introduce conflicts, not only between the model's internal parametric knowledge and the external information, but also among multiple pieces of external contexts. Existing approaches typically assume that either the model or the provided context is reliable, overlooking the possibility that both sources may contain errors, and avoid conflicts by privileging one source over the other, rather than actively resolving inconsistencies. To address these limitations, we propose a novel framework MACR for LLM knowledge conflict resolution that moves beyond the conventional binary choice paradigm and incorporates an explicit conflict-resolution mechanism based on a multi-agent reasoning approach. Specifically, we first propose an adaptive knowledge assessment and retrieval approach that employs a modified semantic entropy measure to quantify an LLM's confidence in its answer to a given query. Based on this confidence estimation, MACR either externalizes the model's internal knowledge as textual representations or retrieves relevant external knowledge when internal knowledge is insufficient, generating basic contexts for subsequent reasoning. Then we introduce an inductive multi-agent reasoning framework with three specialized agents that, respectively, induce explicit rules, analyze potential conflicts, and resolve inconsistencies across all available contexts. Empirical results demonstrate that MACR significantly outperforms state-of-the-art baselines across benchmarks, while also providing interpretable resolutions of explicit conflicts.

13.
arXiv (quant-ph) 2026-06-16

Quantum simulation of the Liouville equation in classical mechanics with discontinuous potential via Schrödingerization

arXiv:2606.15066v1 Announce Type: new Abstract: We develop quantum simulation algorithms for the Liouville equation of classical mechanics with discontinuous potential. Such discontinuities represent potential barriers at which classical particles undergo energy preserving transmission or reflection, and the resulting interface conditions must be incorporated into the numerical flux. We combine Hamiltonian-preserving schemes by Jin and Wen in Commun. Math. Sci. 3(3), 285-315 (2005) with the Schrödingerization method, which embeds the resulting nonunitary semi-discrete dynamics into a unitary Schrödinger type system in one additional auxiliary variable [arXiv:2212.14703, arXiv:2212.13969]. For one-, two-, and $n$-dimensional problems with grid aligned interfaces, we construct sparse matrix representations of the transmission and reflection fluxes using step and hat functions, derive the corresponding Hamiltonians of the Schrödingerized systems, and analyze their sparse-access query complexity. In the sparse-access oracle model, the resulting algorithms have a polynomial dependence on the inverse accuracy and avoid the exponential dependence on the phase-space dimension suffered by classical grid based Hamiltonian-preserving schemes, up to the cost of implementing the oracles and the postselection overhead. We also describe the postselected recovery of the physical solution state and the quantum readout of macroscopic observables such as density and averaged velocity through overlap estimation. Numerical experiments based on classical simulation of the Schrödingerized dynamics validate the proposed formulation and illustrate the correct transmission/reflection behavior at potential barriers.

14.
arXiv (CS.AI) 2026-06-16

Autonomous End-to-End SOH Prediction Services for Battery Systems via Temporal-Contrastive Representation Learning

arXiv:2606.16434v1 Announce Type: cross Abstract: Accurate state of health (SOH) estimation is a critical diagnostic service for lithium-ion battery management. However, reliance on labor-intensive manual feature engineering and opaque black-box models hinders scalable industrial deployment. To address this, we introduce TC-SOH: a modular, plug-and-play service architecture for autonomous, end-to-end SOH prediction. TC-SOH employs a temporal-contrastive mechanism and a cross-window prediction pretext task to extract degradation-relevant representations directly from raw operational data. To improve transparency, we connect model efficacy with representation diagnostics: visualization, sensitivity analysis, redundancy analysis, bidirectional probing, future-SOH probing, and temporal shuffling show that learned features overlap with selected expert descriptors while retaining additional SOH-relevant variation, and that ordered temporal context improves subsequent-SOH prediction. Across four public datasets, TC-SOH outperforms the considered physics-informed and data-driven baselines, reducing MAPE by 1.91 times and RMSE by 2.13 times.

15.
arXiv (CS.AI) 2026-06-11

Characterizing Software Aging in GPU-Based LLM Serving Systems

arXiv:2606.11916v1 Announce Type: cross Abstract: This paper proposes an empirical methodology to study software aging in GPU-based LLM serving systems. Traditional aging studies focus on CPU-centric software with relatively regular workloads; LLM serving is different, spanning a Python host and a CUDA device, handling requests whose cost varies by orders of magnitude, and relying on rapidly evolving software stacks. We run a 216-hour campaign across six co-located deployments under identical stress conditions, monitor host, device, and client metrics in parallel, and apply a statistical pipeline that accounts for autocorrelation and multiple testing. Our results reveal statistically significant memory aging in all deployments, with leak rates strongly dependent on the serving runtime and deployment configuration. Beyond these findings, we provide a reproducible framework that opens a research direction at the intersection of the software aging and rejuvenation and LLM serving communities.

16.
arXiv (quant-ph) 2026-06-19

General circuit mapping algorithm for neutral atom quantum computers

arXiv:2606.20503v1 Announce Type: new Abstract: Neutral atom quantum computers (NAQC) are emerging as a promising, scalable quantum computing platform because of their long qubit coherence, flexible qubit arrangement, and multiqubit gate capabilities. However, circuit execution often requires physically moving qubits, making compilation a critical optimization challenge. We propose a circuit independent mathematical framework built on graph-theoretic combinatorial optimization that determines the minimal number of required qubit transfers. This model captures spatial constraints specific to NAQC platforms with zone-limited gate operations and multi-qubit gates. From this framework, we encode the qubit mapping problem as a nonlinear integer program and solve it using a genetic algorithm, enabling trade-offs between minimizing the total traveled distance and the number of parallel transfer operations. Compared to the state-of-the-art scalable compiler for zoned architectures, our approach consistently finds fewer transfers. Depending on the optimization focus, our method produces shorter traveled distances or fewer parallel transfer operations. This work provides both theoretical guaranties and a practical tool for efficient, architecture-aware quantum circuit compilation. As a result, practitioners can generate hardware-aware mappings that reduce movement-induced errors and better exploit atom transfer parallelism, directly improving execution efficiency on NAQC devices.

17.
arXiv (CS.AI) 2026-06-19

Can In-Context Learning Support Intrinsic Curiosity?

arXiv:2606.19476v1 Announce Type: cross Abstract: Effective machine learning depends not only on how we model data, but also on what data we choose to collect. While large sequence models have revolutionized data modeling, the problem of automated data selection, or "intrinsic curiosity", remains a significant challenge. Classic approaches incentivize exploration by rewarding an agent based on its "learning progress", which measures how much a newly acquired observation improves a world model's predictive ability. However, evaluating these rewards traditionally requires expensive inner loops of gradient descent updates within each trajectory, rendering them computationally impractical at scale. In this work, we investigate whether the emergent in-context learning (ICL) capabilities of sequence models can eliminate this bottleneck by serving as immediate, update-free world models. Specifically, we evaluate whether an exploration policy can be trained to maximize learning progress, using solely the prediction errors and counterfactual context manipulations of an in-context learner. We first prove that in general Markov decision processes, this is in fact impossible in an unbiased way: the resulting intrinsic rewards either suffer from nuisance terms that bias their estimation of true learning progress, or they cannot be implemented using an in-context learner's prediction errors. Conversely, we prove a positive result for a broad subclass of non-temporal settings, encompassing active learning and Bayesian Experimental Design: here, ICL-derived rewards successfully bound and asymptotically converge to the true learning progress. We corroborate our theory with controlled experiments across continuous and symbolic environments, demonstrating that our ICL-driven framework successfully trains curious data-collection policies that explore optimally.

18.
arXiv (quant-ph) 2026-06-15

Spin-orbit coupling by design in quantum state engineering of atomically defined quantum dots

arXiv:2606.14487v1 Announce Type: cross Abstract: Tuning spin-orbit coupling is essential in controlling both spin and charge in confined semiconductor nanostructures, yet it is rarely a truly controllable parameter. Here, we show control over the spin-orbit Hamiltonian in quantum dots and the resulting quantum states by tailoring the confinement potential with atomic-scale precision. Using scanning tunnelling microscopy and spectroscopy, we pattern individual Cs ions into designer quantum dot structures on the surface of indium antimonide, in which electrons from a two-dimensional electron gas are confined with chosen in-plane electric-field gradients. We then quantify the atomic level structure, both spatially resolving the orbital character of the electronic states and their magnetic-field evolution. We demonstrate that the level structure, including the induced zero-field splitting, can be tailored by the designed geometry of the local electric fields. These effects can be described using a Hamiltonian that allows consistent treatment of the confinement-induced spin-orbit coupling beyond the conventional Bychkov-Rashba description. This Hamiltonian is derived from a multiband k.p model and takes the energy dependence of the relevant physical parameters into account. Such precise control of spin-orbit coupling in semiconductor quantum dots is relevant to quantum and spintronic technologies.

19.
medRxiv (Medicine) 2026-06-17

Cost-effectiveness of measles rapid diagnostic tests for replacing or expanding laboratory testing in Ethiopia

Background: In low- and middle-income countries, laboratory testing to rapidly detect measles outbreaks is limited by infrastructure availability and high costs. This study estimates the potential impact and cost-effectiveness of measles rapid diagnostic tests (RDTs) if implemented nationally in Ethiopia to either replace or expand current testing. Methods: An agent-based model to simulate measles outbreaks was calibrated to Ethiopian measles surveillance data. Modelled outbreak outcomes were aggregated over a 10-year period. Scenarios included using RDTs to (1) replace laboratory testing; (2) replace epidemiological linkage; and (3) increase case detection, in addition to replacing laboratory testing and epidemiological linkage. Testing and outbreak response costs (in 2025 US$) were obtained from Ethiopian Public Health Institute from a government perspective. Total costs and disability-adjusted life years (DALYs) for each scenario were compared to baseline. Results: All scenarios were cost saving compared to baseline. Replacing laboratory testing with RDTs saved US$4.2M (3.2M-4.9M) over 10-years, but due to very low testing rates the benefits of eliminating laboratory testing delays were offset by missed cases from the lower RDT sensitivity, leading to similar outbreak detection times and DALYs. Replacing epidemiological linkage with RDTs had similar DALYs but increased the cost savings to US$9.7M. Using RDTs to double case detection reduced outbreak detection time from 113 to 80 days, averted 17,000 DALYs, and saved US$4.3M. Conclusions: In Ethiopia, use of measles RDTs could be cost saving, and if used to expand testing could prevent measles infections through faster outbreak detection and response.

20.
arXiv (CS.LG) 2026-06-19

The Representational Limit of Scalar Interactions: An Interventional Decomposition

arXiv:2606.19410v1 Announce Type: cross Abstract: Signed pairwise interaction scores fundamentally conflate uniqueness (U), redundancy (R), and synergy (S). We prove this on a minimal 3-way XOR structural causal model: faithful indices such as Shapley-Taylor return zero per pair, whereas projective indices such as Shapley Interaction spread the third-order effect into pair scalars that conflate the three mechanisms. We introduce Stochastic Hi-Fi, a post-hoc, retraining-free predictability decomposition that estimates per-feature U/R/S profiles by interventional masked inference. The estimator provides exact interventional semantics, finite-sample Monte Carlo bounds, strict variance reduction from coupled diamond sampling, and uniform finite-vocabulary convergence. Across tabular SCMs, Stochastic Hi-Fi recovers structure missed by scalar baselines (up to 411x larger interaction-magnitude recovery ratios). It also separates redundant and synergistic heads in the GPT-2 IOI circuit. On NIH ChestX-ray14, Stochastic Hi-Fi matches GradCAM on Pointing Game and improves substantially on Deletion AUC.

21.
arXiv (CS.CV) 2026-06-11

DrivingAgent: Design and Scheduling Agents for Autonomous Driving Systems

Many autonomous driving systems are increasingly incorporating foundation models to improve generalization and handle long-tail scenarios. However, this trend introduces two key challenges: (i) the manual and labor-intensive process of designing and integrating new models, and (ii) the lack of intelligent, dynamic scheduling mechanisms to meet strict real-time constraints. While Large Language Model (LLM)-based agents offer a promising avenue for automation, existing frameworks are ill-suited for autonomous driving. Specifically, they fail to distinguish between the fundamentally different requirements of system design and real-time scheduling, treat modules as opaque black boxes, and are not designed for continuous operation. To address these limitations, we propose DrivingAgent, a novel agent framework tailored to the dual challenges of autonomous driving system design and scheduling. In the design phase, DrivingAgent automates module development by interpreting system architecture, generating code, and validating modules via super-network training. In the scheduling phase, it employs a lightweight LLM trained with reinforcement learning to dynamically orchestrate system modules in real time, supported by a structured memory that integrates long-term storage with timestamped short-term context. Experimental results demonstrate that DrivingAgent achieves a superior speed–accuracy trade-off on both the nuScenes and Bench2Drive benchmarks.

22.
arXiv (CS.CV) 2026-06-12

Skeleton Sparsification and Densification Scale-Spaces

The Hamilton-Jacobi skeleton, also known as the medial axis, is a powerful shape descriptor that represents binary objects in terms of the centres of maximal inscribed discs. Despite its broad applicability, the medial axis suffers from sensitivity to noise: Minor boundary variations can lead to disproportionately large and undesirable expansions of the skeleton. Classical pruning methods mitigate this shortcoming by systematically removing extraneous skeletal branches. This sequential simplification of skeletons resembles the principle of sparsification scale-spaces that embed images into a family of reconstructions from increasingly sparse pixel representations. We combine both worlds by introducing skeletonisation scale-spaces: They leverage sparsification of the medial axis to achieve hierarchical simplification of shapes. Unlike conventional pruning, our framework inherently satisfies key scale-space properties such as hierarchical architecture, controllable simplification, and equivariance to geometric transformations. We provide a rigorous theoretical foundation in both continuous and discrete formulations and extend the concept further with densification. By growing the skeleton successively instead of shrinking it, we allow inverse progression from coarse to fine scales. Densification scale-spaces can even reach beyond the original skeleton to produce overcomplete shape representations with relevancy for practical applications. Through proof-of-concept experiments, we demonstrate the effectiveness of our framework for practical tasks including robust skeletonisation, shape compression, and stiffness enhancement for additive manufacturing.

23.
arXiv (CS.AI) 2026-06-16

Relational Structural Causal Models

arXiv:2606.14892v1 Announce Type: new Abstract: An artificial intelligence must have a model of its environment that is causal, supporting reasoning about interventions and counterfactuals, and also combinatorial, supporting generalization to unseen combinations of objects. In this work, we formally study when and how such a model can be learned. We develop relational structural causal models, extending structural causal models (Pearl 2009) to settings where objects and their relations vary. First, we show how answers to not only causal but also observational queries about unseen combinations of objects can not be identified without further assumptions. To enable such identification–including in the presence of unobserved confounding–we define relational causal graphs and derive symbolic identification criteria. Finally, we propose relational neural causal models, a provably correct approach that outperforms non-relational baselines on simulated traffic scenes with varying cars, signals, and pedestrians.

24.
arXiv (CS.CV) 2026-06-16

Active Reference Acquisition in Few-Shot Font Generation

Few-shot font generation aims to synthesize the remaining glyphs of a font given one or a few reference glyphs while preserving stylistic consistency, thereby supporting font designers in efficiently completing a typeface. Existing methods primarily focus on improving generation quality given a fixed reference set. However, when the current reference glyphs are insufficient to represent the target style, few-shot font generation may fail to produce satisfactory results. In practical scenarios, additional reference glyphs can often be obtained from the designer when necessary. Accordingly, we propose a new framework, Active Reference Acquisition in Few-Shot Font Generation, in which the model sequentially decides which character to acquire next as an additional reference. Furthermore, we propose a reference part-coverage-based acquisition function to efficiently query the designer. Motivated by the observation that font styles are well characterized by local structural parts, we represent each glyph using a histogram of local features and select query characters that maximize the expected part coverage of the reference set. By prioritizing characters that contain parts not yet covered by the current references, the proposed method progressively expands the diversity of visual parts in the reference set. As a result, generation quality is improved with fewer queries. Experiments on the Google Fonts dataset demonstrate that the proposed method achieves higher generation quality than random querying and reference-agnostic baselines. The code is available at https://github.com/matsuo-shinnosuke/ActiveRef-FontGen.

25.
arXiv (CS.AI) 2026-06-12

Fault Lines: Navigating Ethics and Responsible AI Where National Policy Meets Local Practice in Public Sector Transformation

arXiv:2606.13039v1 Announce Type: cross Abstract: The UK government has adopted a pro-AI stance to help transform public service delivery in the face of severe financial pressures, but the path to translate this vision into responsible AI practice remains ill-defined. While UK policy is often set at the national level, local authorities are responsible for most public service delivery, and the rapid advance of AI-first narratives in the public sector is exposing fault lines in knowledge and practice at this national-local interface. This paper examines how responsible AI is interpreted and implemented at the interface between the UK's central government and local authorities, taking the high-stakes area of Special Educational Needs and Disabilities (SEND) as a case study. We present a thematic analysis of 17 semi-structured interviews with policymakers, practitioners, and third-sector professionals to identify barriers and enabling conditions for responsible AI where national policy meets local practice. We identify five interconnected challenges facing local authorities: shadow usage of AI and data privacy risks, market-government asymmetry in AI provision, insufficient workforce readiness, a lack of standardised definitions and measurements, and gaps in human accountability. For each, participants proposed actionable steps, from strengthening data protection frameworks and rebalancing the market-government relationship to enhancing workforce capacity. Our examination of SEND brings these challenges into sharper focus, showing how high-stakes decisions affecting vulnerable children and families intensify tensions around accountability, fairness, and human oversight, exposing the limits of a principle-based regulatory approach. We argue that responsible public sector AI requires both national policy adjustments and structural reforms to institutional capacity, values, and governance mechanisms at the local level.