Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-15

When Sample Selection Bias Precipitates Model Collapse

arXiv:2606.13732v1 Announce Type: new Abstract: The proliferation of recursive training on synthetic data can alleviate data scarcity but risks model collapse, where repeated training erodes distributional tails and homogenizes outputs. Data selection is widely viewed as a remedy, yet its reliability depends critically on the reference distribution used by the verifier. We show that in low-resource verification regimes, where each verifier observes only a small, fragmented, and biased slice of the target manifold, selection itself becomes biased. This situation naturally arises in low-resource data silos such as healthcare consortia or proprietary financial institutions, where raw data cannot be pooled and local references are inherently incomplete. As a result, selection preferentially retains samples aligned with the local manifold while pruning globally relevant tail modes, turning from a safeguard against collapse into a mechanism that precipitates it. We theoretically prove that such siloed selection accelerates collapse and induces power-law diversity decay. As an initial mitigation, we construct Wasserstein proxy references from multiple silos without sharing raw data. Empirical results confirm that local-reference selection fails on skewed distributions, whereas collaborative proxy references mitigate diversity degradation, suggesting that recursive synthetic-data pipelines require particular caution when real-data coverage is fragmented or scarce.

02.
arXiv (CS.AI) 2026-06-24

BluTrain: A C++/CUDA Framework for AI Systems

arXiv:2606.24780v1 Announce Type: new Abstract: Progress in deep learning is, at scale, more a matter of systems engineering than of modelling: the behaviour of a model in training (its throughput, its memory footprint, and the numerical fidelity of the result) is determined less by the architecture itself than by how that architecture is expressed on the hardware. To achieve absolute control over this hardware expression while abstracting away systems complexity to make modelling seamless and eliminating the need for repetitive orchestration logic, BluTrain was architected from first principles as a robust, lightweight, and architecture-general training framework in standard C++ and the core CUDA programming model. Every layer is implemented natively: a typed tensor module with reverse-mode autograd, a linear-algebra library, a caching allocator, a multi-mode distributed-execution module, and an MLIR-based deep-learning compiler. In formal evaluations training a 124M-parameter GPT-2 baseline in FP32 on an 8-GPU 6000 Ada system, BluTrain outperforms industry-standard baselines in both throughput (sustaining an average of 407K tokens/s versus PyTorch's 395K tokens/s) and memory efficiency (achieving up to a 22% footprint reduction), while strictly preserving numerical fidelity and converging to a marginally lower final validation loss. With every layer explicitly open to native tuning, the performance ceiling is the framework's own to raise.

03.
arXiv (quant-ph) 2026-06-16

Achieving double-logarithmic precision dependence in optimization-based quantum unstructured search

arXiv:2603.26039v3 Announce Type: replace Abstract: Grover's algorithm is a fundamental quantum algorithm that achieves a quadratic speedup for unstructured search problems of size $N$. Recent studies have reformulated this task as a maximization problem on the unitary manifold and solved it via linearly convergent Riemannian gradient ascent (RGA) methods, resulting in a complexity of $O(\sqrt{N/M}\log (1/\varepsilon))$, where $M$ denotes the number of target items and $\varepsilon$ denotes the success probability error. In this work, we adopt the Riemannian modified Newton (RMN) method to solve the quantum search problem, under the assumption that the ratio $ M/N$ is known. We show that, in this setting, the Riemannian Newton direction is collinear with the Riemannian gradient in the sense that the Riemannian gradient is always an eigenvector of the corresponding Riemannian Hessian. This structure removes the overhead of Hessian inversion and allows the proposed RMN method to retain the local quadratic convergence in terms of the error $\varepsilon$. More precisely, we rigorously prove an overall complexity of $O(\sqrt{N/M}+\log\log(1/\varepsilon))$. Furthermore, our approach remains Grover-compatible, namely, it relies exclusively on the standard Grover diffusion and oracle operators to ensure algorithmic implementability, and its parameter update process can be efficiently precomputed on classical computers.

04.
medRxiv (Medicine) 2026-06-22

Associations of Chemical Exposures with Psychological Distress and Depression Diagnosis among Waste Pickers in Brasilia, Brazil: A Cross-Sectional Study

Introduction: Waste pickers face chemical exposures. We evaluated whether chemical exposure is associated with psychological distress and depression. Methods: A 2017 cross-sectional survey included 1,141 waste pickers working in the Estrutural open dump in Brasilia, Brazil. Participants self-reported occupational exposure to 11 chemical categories, 17 psychological distress symptoms, and depression diagnoses. Associations of chemical exposure with mean psychological distress scores and depression prevalence were assessed, adjusted for age, sex, marital status, and income. Results: Mean psychological distress score was higher among those exposed to any chemical (mean of 8.1 vs 6.1; adjusted mean difference [aMD]: 1.8 [0.9, 2.7]) and higher among those exposed to each of 11 chemical categories, for example, smoke (aMD: 1.2 [0.6, 1.7]), batteries (aMD: 1.5 [1.0, 1.9], and oils (aMD: 1.3 [0.9, 1.8]). Depression was more prevalent among those exposed to oils (16.6% vs 10.6%; adjusted prevalence difference [aPD]: 6.3% [95% CI: 2.3, 10.2]), cleaning products (aPD: 5.4% [1.2, 9.5]), medications (aPD: 4.7% [0.6, 8.8]), and aerosols (aPD: 5.3% [1.3, 9.3]) but, not smoke, batteries, greases, insecticides, solvents, paints, chemical containers, or any chemical. Conclusion: These associations highlight the need to consider policy level protections for waste pickers to reduce chemical exposure and guard against psychological distress. Further research is necessary to explore which specific chemicals, within broad chemical categories, are associated with psychological distress and depression.

05.
arXiv (quant-ph) 2026-06-11

Rolling Stock Planning Using the Quantum Approximate Optimization Algorithm

arXiv:2606.11383v1 Announce Type: new Abstract: Rolling stock planning is a complex optimization problem in railway management that involves assigning physical trains to scheduled trips while minimizing operational costs. In this work, we address a specific instance of this problem featuring 190 trips over two days, subject to constraints such as mandatory maintenance stops. We reformulate the problem as a Maximum-Weight Independent Set (MWIS) problem on a graph where nodes represent feasible train cycles. To handle the computational complexity of the large search space, we propose a hybrid divide-and-conquer algorithm. This approach iteratively selects subgraphs and solves the MWIS problem using various solvers, including exact classical methods and the Quantum Approximate Optimization Algorithm (QAOA). We evaluate the algorithm's performance by comparing these methods and analyzing the scaling with respect to subgraph size, with QAOA assessed through both classical simulation and execution on a quantum device (IQM Emerald). Our results indicate that increasing the subgraph size generally improves solution quality, demonstrating that the hybrid framework can effectively bridge the gap between polynomial-time approximate solvers and exponential-time exact methods.

06.
arXiv (CS.CL) 2026-06-18

Compact Geometric Representations of Hierarchies

Computing geometric representations of data is a cornerstone of modern machine learning, typically achieved by training dual encoders which map queries and documents into a shared embedding space. Recent work of You et al. [NeurIPS '25] has extended this approach to hierarchical retrieval, where relevance is determined by the ancestor-descendant relationships in a Directed Acyclic Graph (DAG). While previous work has shown that valid embeddings exist when the number of descendants is small, these bounds degrade significantly for deep hierarchies, requiring dimensions as large as the total number of nodes. In this paper, we investigate compact reachability embeddings for more general graph classes and provide theoretical guarantees for representing hierarchies using embeddings whose dimension depends on structural graph parameters. We prove that for any directed tree, there exists a reachability embedding in constant dimension 3, independent of the tree's size or depth. We generalize this result to graphs characterized by treewidth $t$, constructing embeddings of dimension $O(t \log n)$, where $n$ is the number of nodes. Complementing these upper bounds, we provide matching or near-matching lower bounds, showing that dimension $\Omega(n)$ is necessary for general DAGs and $\Omega(t/\log(n/t))$ is required for graphs of treewidth $t$. We also obtain upper and lower bounds parameterized by the number of cross-edges in the DAG. We additionally show that our embeddings can be constructed on real world datasets, and that they give much smaller dimensions in high recall regimes compared to prior embeddings with theoretical guarantees.

07.
Nature (Science) 2026-06-17

Lethal plague outbreaks in Lake Baikal hunter-gatherers 5,500 years ago

Plague is among the most devastating diseases in human history1. However, early strains of the plague-causing bacterium Yersinia pestis lacked virulence factors that are required for the bubonic form until around 3,800 years ago2,3. Consequently, the morbidity and mortality of early plague strains remain unclear. Here we describe early plague strains that are associated with two phases of outbreaks among mid-Holocene hunter-gatherers near Lake Baikal in southeast Siberia, beginning from about 5,500 years ago. These outbreaks occur across four hunter-gatherer cemeteries, with a 39% detection rate for plague infection. By reconstructing kinship pedigrees, we show that small familial groups were affected, consistent with human-to-human spread of disease, and that the first outbreak occurred within a single generation. The infections appear to have resulted in acute mortality, especially among children (aged 8 to 11 years). We further note functional differences, including in the ypm superantigen locus, which is also present in present day Yersinia pseudotuberculosis. The new strains diverge ancestrally to known Y. pestis and constrain the timing of its emergence, indicating that this happened before approximately 5,700 years ago. These findings show that plague outbreaks happened earlier than previously thought and were indeed lethal. We contend that the occurrence of outbreaks among mid-Holocene hunter-gatherer communities well outside the sphere of Late Neolithic Europe challenges the notion that higher population densities and lifestyle changes during the Neolithic agricultural transition were prerequisites for plague epidemics. Analyses of ancient DNA from hunter-gatherers near Lake Baikal in southeast Siberia around 5,500 years ago indicate that highly virulent Yersinia pestis emerged earlier than previously estimated, far from the next known cases of infection in Late Neolithic Europe.

08.
arXiv (CS.AI) 2026-06-25

BrainAgent: A Large Language Model-Driven Multi-Agent Framework for Autonomous Brain Signal Understanding

arXiv:2606.25400v1 Announce Type: new Abstract: Brain-Computer Interfaces (BCIs) and brain signal understanding are pivotal for clinical health and next-generation interactions. Despite this significance, its widespread adoption in real-world scenarios remains restricted, primarily because current analytical paradigms lack sufficient agentic intelligence. First, existing methodologies impose prohibitive technical barriers, requiring extensive specialized expertise. Second, they remain inherently static and task-specific, failing to execute the complex, long-horizon workflows essential for real-world deployment. To accelerate the democratization of brain signal understanding, we draw inspiration from Large Language Models (LLMs) to introduce BrainAgent, an LLM-driven multi-agent framework designed to ground abstract natural language intent into rigorous, executable, and end-to-end processing pipelines. BrainAgent employs a hierarchical architecture where a central supervisor orchestrates specialized sub-agents for adaptive task decomposition and execution. Furthermore, we establish a comprehensive, systematic benchmark for evaluating agentic systems in brain signal analysis. Empirical results demonstrate that BrainAgent effectively automates complex workflows with superior reliability, marking a paradigm shift toward democratized brain signal understanding.

09.
arXiv (CS.LG) 2026-06-16

A Penalty Approach for Differentiation Through Black-Box Quadratic Programming Solvers

arXiv:2602.14154v3 Announce Type: replace Abstract: Differentiating through the solution of a quadratic program (QP) is a central problem in differentiable optimization. Most existing approaches differentiate through the Karush–Kuhn–Tucker (KKT) system, but their computational cost and numerical robustness can degrade at scale. To address these limitations, we propose dXPP, a penalty-based differentiation framework that decouples QP solving from differentiation. In the solving step (forward pass), dXPP is solver-agnostic and can leverage any black-box QP solver. In the differentiation step (backward pass), we map the solution to a smooth approximate penalty problem and implicitly differentiate through it, requiring only the solution of a much smaller linear system in the primal variables. This approach bypasses the difficulties inherent in explicit KKT differentiation and significantly improves computational efficiency and robustness. We evaluate dXPP on various tasks, including randomly generated QPs, large-scale sparse projection problems, and a real-world multi-period portfolio optimization task. Empirical results demonstrate that dXPP is competitive with KKT-based differentiation methods and achieves substantial speedups on large-scale problems. Our implementation is open source and available at https://github.com/mmmmmmlinghu/dXPP.

10.
arXiv (CS.CL) 2026-06-16

Generative causal testing to bridge data-driven models and scientific theories in language neuroscience

Representations from large language models are highly effective at predicting BOLD fMRI responses to language stimuli. However, these representations are largely opaque: it is unclear what features of the language stimulus drive the response in each brain area. We present generative causal testing (GCT), a framework for generating concise explanations of language selectivity in the brain from predictive models and then testing those explanations in follow-up experiments using LLM-generated stimuli.This approach is successful at explaining selectivity both in individual voxels and cortical regions of interest (ROIs), including newly identified microROIs in prefrontal cortex. We show that explanatory accuracy is closely related to the predictive power and stability of the underlying predictive models. Finally, we show that GCT can dissect fine-grained differences between brain areas with similar functional selectivity. These results demonstrate that LLMs can be used to bridge the widening gap between data-driven models and formal scientific theories.

11.
arXiv (CS.CV) 2026-06-25

REViT: Roto-reflection Equivariant Convolutional Vision Transformer

In this paper, we propose a discrete roto-reflection group equivariant vision transformer with convolutional attention. Roto-reflection equivariant networks preserve the rotational, flip and positional symmetry in feature maps, making them useful for tasks where orientation of the inputs is relevant to the model outputs. In image classification and object detection, most of the studies on roto-reflection equivariant models have focused on using convolutional neural networks rather than vision transformers. In this paper, we examine the challenges involved in achieving equivariance in vision transformers, and we propose a simpler way to implement a discretized roto-reflection group equivariant vision transformer. The experimental results demonstrate that our approach outperforms the existing approaches for developing discrete roto-reflection group equivariant neural networks for image classification.

12.
arXiv (CS.AI) 2026-06-25

EmotionAI: A Privacy-Preserving Computational Intelligence Pipeline for Speech-Emotion-Grounded Conversational Analysis

arXiv:2606.24941v1 Announce Type: cross Abstract: Reviewing recorded interviews for affective cues such as composure, hesitation and agitation is slow and subjective, and cloud services that could automate it require sensitive audio to leave the device. EmotionAI is a fully local Computational Intelligence (CI) pipeline that couples Speech Emotion Recognition (SER) with generative reasoning. Speaker diarisation, Whisper Automatic Speech Recognition (ASR) and a wav2vec2 emotion classifier produce per-segment affective evidence, which is then passed to an adversarial three-model local Large Language Model (LLM) panel for timestamp-grounded and citation-constrained question answering. Zero-shot evaluation on the RAVDESS four-class English subset (n = 672) exposes cross-corpus fragility rather than classifier superiority: the deployed classifier scores 48.8% accuracy, above random (24.9%) and majority (28.6%) baselines but below an in-domain MFCC + logistic-regression comparator (71.0%). The complete pipeline runs in a mean 157 s on CPU (real-time factor approximately 1.33) with zero external calls. The contribution is not state-of-the-art SER but an auditable, privacy-preserving integration of imperfect affective evidence into grounded conversational analysis, together with an honest empirical account of where cross-corpus transfer and human-centred validation still fall short.

13.
arXiv (CS.AI) 2026-06-16

AL-GNN: Privacy-Preserving and Replay-Free Continual Graph Learning via Analytic Learning

arXiv:2512.18295v2 Announce Type: replace-cross Abstract: Continual graph learning (CGL) aims to enable graph neural networks to incrementally learn from a stream of graph structured data without forgetting previously acquired knowledge. Existing methods particularly those based on experience replay typically store and revisit past graph data to mitigate catastrophic forgetting. However, these approaches pose significant limitations, including privacy concerns, inefficiency. In this work, we propose AL GNN, a novel framework for continual graph learning that eliminates the need for backpropagation and replay buffers. Instead, AL GNN leverages principles from analytic learning theory to formulate learning as a recursive least squares optimization process. It maintains and updates model knowledge analytically through closed form classifier updates and a regularized feature autocorrelation matrix. This design enables efficient one pass training for each task, and inherently preserves data privacy by avoiding historical sample storage. Extensive experiments on multiple dynamic graph classification benchmarks demonstrate that AL GNN achieves competitive or superior performance compared to existing methods. For instance, it improves average performance by 10% on CoraFull and reduces forgetting by over 30% on Reddit, while also reducing training time by nearly 50% due to its backpropagation free design.

14.
arXiv (CS.AI) 2026-06-16

Intelligence Is Not the Bottleneck: Validating an LLM First-Pass Manuscript Score Against Peer-Review Outcomes

arXiv:2606.15887v1 Announce Type: cross Abstract: Large language model (LLM) systems are increasingly proposed to assist peer review, yet most evaluations judge the prose of machine-generated review text, not the validity of the numeric score a system assigns. We validate AIPR, which reads a submitted manuscript and emits five 0-100 quality dimensions and a weighted overall score, against the public decision outcomes of a major machine learning venue. AIPR grades by prompting alone, with no fine-tuning on reviews or decisions. Across 300 ICLR submissions with public decision tiers and reviewer ratings, graded under a frozen pipeline with hypotheses pre-registered before any score met any outcome, the overall score separates rejected from accepted submissions (AUROC 0.82, 95% CI 0.78-0.87), rises monotonically across tiers, and tracks the mean reviewer rating. The signal is strongest where we claim it: the lowest-scoring fifth is rejected far above the base rate, with oral papers absent. The validity comes mostly from the model: a one-paragraph prompt on the same model discriminates almost as well as the full pipeline (the small gap favours the pipeline but does not meet the pre-declared criterion, p = 0.09). What the engineering adds is reliability and a grounded review: AIPR's score barely moves across repeated runs (0.7 vs. 2.8 points within-paper SD) where the bare prompt swings, and the same pass returns a rubric-structured, evidence-grounded review rather than a bare number, with the human keeping the decision.

15.
arXiv (CS.LG) 2026-06-24

Stochastic Dimension Implicit Functional Projections for Global Integral Conservation in High-Dimensional PINNs

arXiv:2603.29237v3 Announce Type: replace Abstract: Enforcing prescribed global integral constraints in mesh-free neural PDE solvers is challenging in high-dimensional domains. Existing projection methods for spatial integrals are often tied to fixed grids or uniform quadrature, which can conflict with randomly sampled physics-informed neural networks (PINNs) and scale poorly with dimension. High-order differential operators also increase reverse-mode automatic differentiation memory costs. We propose Stochastic Dimension Implicit Functional Projection (SDIFP), a quadrature-level framework for enforcing prescribed first and second spatial moments. SDIFP replaces tensor-product nodal projection by a global affine correction of the neural-network output, with two scalar coefficients determined from a weighted quadrature rule. Under positive target variance and nonzero empirical raw variance, this correction is the nearest-point projection, in the weighted quadrature norm, onto the empirical two-moment constraint set. Thus, the prescribed moments are exact for the selected quadrature rule, while continuum errors are quadrature errors of the corrected field. For decomposable high-dimensional linear operators, SDIFP combines affine moment correction with stochastic operator-subset sampling. With independent residual and derivative sampling and conditionally unbiased coefficient-gradient estimation, the resulting estimator is unbiased for the specified quadrature-based residual objective; the shared-subset fast mode is biased in general. SDIFP avoids tensor-product quadrature for moment enforcement, separates forward quadrature evaluation from the reverse-mode graph, and retains pointwise inference efficiency once the affine coefficients are fixed or precomputed.

16.
arXiv (CS.LG) 2026-06-18

Bridging Data Gaps in Structural Fragility Modeling through Transfer Learning: Methodology and Case Studies

arXiv:2606.18567v1 Announce Type: cross Abstract: This paper presents a methodology-centered transfer learning framework for fragility adaptation under domain shift, class imbalance, and scarce target labels while preserving engineering interpretability and supporting decision-making under uncertainty. Four transfer learning strategies (instance-based, parameter-based, hierarchical Bayesian, and multi-source) are demonstrated through three complementary case studies: (i) instance-based transfer learning via importance weighting, demonstrated on coastal bridge fragility using Hurricane Katrina observations; (ii) parameter-based transfer learning together with hierarchical Bayesian transfer learning, enabling partial pooling across strata and posterior uncertainty quantification, demonstrated on residential building fragility using Hurricane Ian observations; and (iii) multi-source transfer learning that fuses multiple analytical fragility models with learned source weights and regularized target-domain adaptation, demonstrated on seismic bridge fragility using observations from the 2001 Nisqually earthquake. Across these case studies, direct transfer of source models (i.e. using existing state-of-the-art models) fails under domain shift and severe class imbalance, while targeted adaptation substantially improves failure detection and predictive stability in low-data regimes. These findings highlight the need for systematic guidance on diagnostics, strategy selection, and uncertainty reporting when developing and adapting fragility models.

17.
arXiv (quant-ph) 2026-06-19

Local controllability of heralded quantum linear optics

arXiv:2606.19470v1 Announce Type: new Abstract: Photonic linear optical networks provide a versatile platform for quantum information processing and quantum state engineering. However, the set of states that can be generated using passive linear optics alone is fundamentally constrained by bosonic symmetries. Heralding, based on conditional measurements on auxiliary modes, is a widely used technique to overcome these limitations and effectively enlarge the set of accessible states. Despite the widespread use of heralding, it is often unclear how specific ancillary resources impact the overall reachability of the target space. In this work, we investigate the local controllability of photonic states in linear optical networks by analyzing the rank of the Jacobian of the output state with respect to the underlying unitary circuit, which provides a quantitative measure of the dimension of the accessible tangent space at a given configuration. Our analysis ranges from passive linear optics to heralded linear optics, where auxiliary resources and conditional measurements are included. Within this framework, we quantify how different resources enlarge the locally accessible state space beyond that of passive linear optics and determine the resources required for the Jacobian rank to reach its maximal value, thereby achieving full local controllability. As maximal local rank is a necessary condition for global reachability, our framework offers a systematic tool to assess and compare the accessible state space of measurement-based photonic architectures, and to establish practical criteria for the resources needed in high-dimensional quantum state engineering.

18.
arXiv (CS.CV) 2026-06-16

Learning Fine-Grained Correspondence with Cross-Perspective Perception for Open-Vocabulary 6D Object Pose Estimation

Open-vocabulary 6D object pose estimation empowers robots to manipulate arbitrary unseen objects guided solely by natural language. However, a critical limitation of existing approaches is their reliance on unconstrained global matching strategies. In open-world scenarios, trying to match anchor features against the entire query image space introduces excessive ambiguity, as target features are easily confused with background distractors. To resolve this, we propose Fine-grained Correspondence Pose Estimation (FiCoP), a framework that transitions from noise-prone global matching to spatially-constrained patch-level correspondence. To systematically eliminate background interference, FiCoP first employs an object-centric disentanglement step to isolate the target from macro-level environmental noise. Building upon this localized region, our core methodological innovations are twofold. Firstly, a Cross-Perspective Global Perception (CPGP) module is proposed to fuse dual-view features, establishing structural consensus through explicit context reasoning and text-guided semantic injection. Secondly, we design a Patch Correlation Predictor (PCP) that leverages a patch-to-patch correlation matrix as a structural prior. This generates a precise block-wise association map, acting as a spatial filter to enforce fine-grained, noise-resilient matching. Experiments on the REAL275 and Toyota-Light datasets demonstrate that FiCoP improves Average Recall by 8.0% and 6.1%, respectively, compared to the state-of-the-art method, highlighting its capability to deliver robust and generalized perception for robotic agents operating in complex, unconstrained open-world environments. The source code will be made publicly available at https://github.com/zjjqinyu/FiCoP.

19.
arXiv (CS.AI) 2026-06-24

Can Language Model Agents be Helpful Circuit Explainers in Mechanistic Interpretability?

arXiv:2606.24026v1 Announce Type: new Abstract: Mechanistic interpretability has made substantial progress in automatically localizing circuits, but explaining what localized components do remains labor-intensive and difficult to standardize. In this work, we study whether language model (LM) agents can assist with this explanation problem once a circuit has already been identified. We introduce AgenticInterpBench, a benchmark for circuit explanation built from 84 semi-synthetic transformer circuits with 163 component-level annotations. We propose HyVE (Hypothesize, Validate, Explain), an agentic explainer that analyzes each component through an iterative loop of observation, hypothesis generation, and causal validation, eventually producing a component-level explanation and a circuit-level task description. Across four LM backbones, HyVE recovers useful component- and task-level explanations, but no backbone is uniformly best. Our analysis shows that strong backbones usually form observation-grounded hypotheses, while failures more often arise later in the validation loop, through incomplete validation plans, code execution errors, or unresolved hypotheses. A case study on an arithmetic circuit in Llama-3-8B shows that the same formulation can extend beyond semi-synthetic benchmarks to naturally trained models. Overall, LM agents are promising circuit explainers, but reliable validation remains the key obstacle.

20.
arXiv (CS.LG) 2026-06-19

Weibull Weight-Scale Parameter Evolution under AdamW Training Dynamics

作者:

arXiv:2606.19367v1 Announce Type: new Abstract: Building on a two-parameter Weibull framework for diagnosing transformer weight distributions, we study why the Weibull weight-scale parameter $\lambda$ grows, overshoots, and then relaxes during AdamW training. We derive a leading-order three-force decomposition of the squared weight norm from the AdamW update: an alignment force measuring the correlation between weights and the adaptive update direction, an injection force from adaptive step magnitude, and a decay force from decoupled weight decay. On self-trained Pythia-70M models with ground-truth optimizer moments, alignment dominates the rise phase, contributing 88-94% of the absolute force budget across four random seeds and remaining robust to super-weight removal. Near saturation, alignment and decay approach balance, explaining the transition from weight-scale growth to relaxation. These force dynamics directly govern the squared-norm component underlying $\lambda(t)$; the remaining RMS-to-Weibull reconstruction offset is measurable and decomposes into bridge and integration components, totaling approximately 5-6% in densely sampled regions. To extend the analysis to real models where optimizer moments are unavailable, we introduce a spline displacement method that recovers the alignment force from sparse checkpoints with approximately 92-94% accuracy, about twice the naive two-point baseline. We further observe that the peak value of $\lambda(t)$ varies with training-data coherence in our experiments, suggesting a data-dependent component of weight-scale growth that we leave to a controlled follow-up study. Code and data are available at https://github.com/tiexinding/NPM-Weibull-public.

21.
arXiv (CS.CV) 2026-06-16

RLPR: Radar-to-LiDAR Place Recognition via Two-Stage Asymmetric Cross-Modal Alignment for Autonomous Driving

All-weather autonomy is critical for autonomous driving, which necessitates reliable localization across diverse scenarios. While LiDAR place recognition is widely deployed for this task, its performance degrades in adverse weather. Conversely, radar-based methods, though weather-resilient, are hindered by the general unavailability of radar maps. To bridge this gap, radar-to-LiDAR place recognition, which localizes radar scans within existing LiDAR maps, has garnered increasing interest. However, extracting discriminative and generalizable features shared between modalities remains challenging, compounded by the scarcity of large-scale paired training data and the signal heterogeneity across radar types. In this work, we propose RLPR, a robust radar-to-LiDAR place recognition framework compatible with single-chip, scanning, and 4D radars. We first design a dual-stream network to extract structural features that abstract away from sensor-specific signal properties (e.g., Doppler or RCS). Subsequently, motivated by our task-specific asymmetry observation between radar and LiDAR, we introduce a two-stage asymmetric cross-modal alignment (TACMA) strategy, which leverages the pre-trained radar branch as a discriminative anchor to guide the alignment process. Experiments on four datasets demonstrate that RLPR achieves state-of-the-art recognition accuracy with strong zero-shot generalization capabilities.

22.
arXiv (CS.AI) 2026-06-12

PI-Hunter: Automated Red-Teaming for Exposing and Localizing Prompt Injections

arXiv:2606.12737v1 Announce Type: cross Abstract: Large Language Models (LLMs) are rapidly evolving into agentic systems that interact with external tools and environments, introducing new security risks such as indirect prompt injection attacks through untrusted external sources. Existing defenses mainly focus on blocking malicious content at inference time, and current red-teaming methods primarily optimize attack success. As a result, developers have limited visibility into how latent prompt injections emerge and propagate through agents. We propose PI-Hunter, an automated agentic auditing framework for proactive vulnerability exposure in LLM agents. PI-Hunter constructs realistic source-aware test cases and iteratively evolves them through feedback-driven exploration to induce agents to retrieve and reveal latent malicious instructions embedded within external environments. Extensive experiments across multiple benchmarks, agent architectures, attacks, and defenses demonstrate that PI-Hunter substantially improves vulnerability exposure and attack-surface coverage over strong automated red-teaming baselines, while remaining effective under existing prompt injection defenses.

23.
arXiv (CS.AI) 2026-06-16

Posterior Twins: Distributional Behavioral Simulation for Enterprise Decisions

作者:

arXiv:2606.16415v1 Announce Type: new Abstract: Enterprise behavioral simulation requires more than producing a plausible response. Many decisions depend on the shape of a population under a proposed action: which segments accept, defect, hesitate, or move into risk-sensitive states. This paper introduces Posterior Twins, a memory-grounded digital-twin approach that represents likely behavior as an updated distribution under a specific decision context. We evaluate a family of Twinning Labs behavioral-model operating points on a 226-example held-out behavioral-response benchmark and report both modal accuracy and Wasserstein-1 distance. The results show that modal accuracy and distributional fidelity identify different operating regimes. TL-Twin Alpha achieves the lowest observed Wasserstein-1 distance in the reported result set ($W_1 = 1.16$), while TL-Twin Delta and TL-Twin Gamma provide balanced operating points near the modal-accuracy frontier. The paper frames these results as a systems result: governed memory, behavioral model routing, scenario orchestration, distributional aggregation, and auditability are necessary for turning simulated behavior into reusable enterprise decision evidence.

24.
arXiv (CS.AI) 2026-06-17

Moving Out: Physically-grounded Human-AI Collaboration

arXiv:2507.18623v4 Announce Type: replace-cross Abstract: The ability to adapt to physical actions and constraints in an environment is crucial for embodied agents (e.g., robots) to effectively collaborate with humans. Such physically grounded human-AI collaboration must account for the increased complexity of the continuous state-action space and constrained dynamics caused by physical constraints. However, most existing collaboration benchmarks are discrete or do not consider physical attributes and constraints. To address this, we introduce Moving Out, a human-AI collaboration benchmark that resembles a wide range of collaboration modes affected by physical attributes and constraints, such as moving heavy items together and coordinating actions to move an item around a corner. Moving Out consists of two challenges and human-human interaction data to comprehensively evaluate models' abilities to adapt to diverse human behaviors and unseen physical attributes. To give embodied agents the capability to collaborate with humans under physical attributes and constraints, we propose a novel method, BASS (Behavior Augmentation, Simulation, and Selection), to enhance the diversity of agents and their understanding of the outcome of actions. We systematically compare BASS and state-of-the-art models in AI-AI and human-AI experiments, showing that BASS can effectively collaborate with both unseen AI and humans. The project page is available at https://live-robotics-uva.github.io/movingout_ai/.

25.
arXiv (CS.AI) 2026-06-17

SkillChain-Gym: A Benchmark for Reskilling-Aware Production-Inventory Control under Disruptions

arXiv:2606.17266v1 Announce Type: new Abstract: Production planning increasingly has to treat workforce capability as a decision variable: certifications lapse when skills are not maintained, new products require skills the current workforce does not hold, and reskilling competes for the same worker hours needed for production. Existing operations benchmarks usually treat labor as exogenous, while workforce-planning models with skills and learning are rarely released as reusable testbeds. We introduce SkillChain-Gym, a benchmark specification for reskilling-aware production-inventory control: a single-site environment with stylized worker skill-state dynamics, hard threshold certification, forgetting, and capacity-consuming training actions constrained by the same per-worker time budget as production. The benchmark includes seed-controlled disruption scenarios, three feasibility modes with projection diagnostics, deterministic replay, and metrics covering operations, resilience, capability growth, and training-access distribution. We evaluate production-only, reactive adaptive, water-filling adaptive, and static-insurance policies with budget variants over 60-shift horizons with paired statistical tests. The results are regime-dependent rather than a ranking. Training-capable policies dominate the production-only baseline, and maintenance training is necessary under forgetting even without disruptions. Among training-capable classes, adaptive training helps when bottlenecks are visible in the forecast, while a lean static cross-training plan, a deliberately favorable comparator whose structure encodes relevant skill contingencies, acts as strong insurance under surprise shocks and absenteeism. Capacity slack and the forgetting rate govern the boundary between these regimes. No policy class dominates across regimes, motivating forecast-driven controllers that decide when to buy skill insurance and when to react.