Academic Intelligence · Curated Daily

Explore the Frontiers of Global Academic Research

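AcademicHub aggregates real-time literature from top journals and preprint platforms. Customize your own research radar and use large language models to automatically generate literature analysis briefings across interdisciplinary fields.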

01.
arXiv (CS.LG) 2026-06-15

Machine-learned particle flow as a foundation model for collider physics

arXiv:2606.14373v1 Announce Type: cross Abstract: The workflow from particle collision to physics analysis passes through a series of reconstruction steps that are traditionally modular and disconnected, with no shared representation linking low-level detector data to high-level analysis tasks. We show that casting event reconstruction as a machine learning problem naturally produces such a shared representation. We repurpose a machine learning model trained for particle-flow reconstruction (MLPF) to perform three distinct analysis tasks: jet flavor identification, jet energy regression, and missing momentum regression. By appending the per-particle latent representations learned during reconstruction as additional input features, we substantially improve over baselines that use kinematic features alone. We further demonstrate that a single linear layer trained using only the latent representations achieves competitive performance against state-of-the-art baseline architectures, and outperforms the baseline for missing momentum regression with approximately 35 times fewer parameters. These results demonstrate that the latent representations learned during reconstruction encode essential physics information needed for downstream analysis, establishing MLPF as a foundation model and offering a concrete step toward an end-to-end pipeline from detector data to physics analysis.

02.
arXiv (CS.CV) 2026-06-19

FlowBender: Feedback-Aware Training for Self-Correcting Conditional Flows

Conditional diffusion and flow models routinely fail to satisfy the very constraints that define their task. For instance, a depth-conditioned model often produces images whose re-extracted depth disagrees with the input, even though the forward operator–the depth predictor defining the constraint–is available during both training and inference. Existing approaches generally fall into two categories: supervised models that treat the conditioning signal as a static cue and ignore alignment information at inference, and guidance-based methods that consult it through hand-tuned linear updates, typically trading fidelity to the condition against the plausibility of the generated sample. We argue that the fundamental gap in both paradigms is that the model is never trained to utilize its own alignment error. We introduce FlowBender, a closed-loop framework that treats this error as a first-class input, training the network to learn a correction policy conditioned on inference-time feedback. At each step, an unguided look-ahead pass estimates the clean signal, a task-specific deviation is computed via the forward operator, and a refinement pass consumes this signal to produce a corrected velocity. We propose several variants of FlowBender, including a gradient-based formulation for differentiable operators and a zero-order variant for non-differentiable settings such as JPEG compression. For efficient sampling, we introduce a prior-step shortcut that enables closed-loop correction at a minimal additional computational cost. Across image-to-image translation, restoration, and 3D mesh texturing, FlowBender consistently outperforms standard supervised baselines, alignment-loss-augmented training, and state-of-the-art inference-time guidance, improving fidelity and plausibility simultaneously rather than trading them against each other. Project page: https://flow-bender.github.io/

03.
arXiv (CS.CL) 2026-06-16

BALTO: Balanced Token-Level Policy Optimization for Hallucination Mitigation

Hallucinations remain a major obstacle to deploying large language models (LLMs) in knowledge-intensive settings, where generated responses must be faithfully grounded in provided evidence. Reinforcement learning (RL) is a promising direction for hallucination mitigation, but response-level faithfulness rewards suffer from a granularity mismatch: localized hallucinations can cause supported content to receive spurious penalties. Although recent work introduces fine-grained feedback such as claim-level verification and token-level rewards, unbalanced credit assignment can still induce length, verbosity, or optimization-noise biases. We propose BALTO, a Balanced Token-level Policy Optimization framework for hallucination mitigation. BALTO extracts checkable factual claims, verifies them against the reference context, and projects claim-level judgments to token-level labels. A balanced token-level credit assignment mechanism is introduced into the framework. This design redistributes probability mass from unsupported content toward faithful content, rather than suppressing the entire response. We systematically analyze the limitations of response-level rewards from a theoretical standpoint, and prove BALTO's advantages in training stability and optimization efficiency for hallucination mitigation. Experiments on ConFiQA, RAGTruth, and FinLLM-Eval show that BALTO achieves the highest faithfulness across all six model–benchmark settings and consistently outperforms existing post-training baselines in Q-Score, demonstrating a stronger faithfulness–informativeness trade-off.
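
The projection step described above (claim-level verdicts mapped onto token-level labels) can be sketched as follows. This is a minimal illustration, not BALTO's implementation: the `Claim` type, the `project_to_tokens` helper, the +1/0/-1 label scheme, and the rule that an unsupported verdict wins on overlapping spans are all assumptions, and the balanced credit assignment mechanism itself is not attempted because the abstract does not specify it.

```python
from dataclasses import dataclass

@dataclass
class Claim:
    start: int       # index of the first token covered by the claim
    end: int         # index one past the last covered token
    supported: bool  # hypothetical verifier verdict against the reference context

def project_to_tokens(num_tokens, claims):
    """Map claim verdicts to per-token labels: +1 supported, -1 unsupported, 0 neither."""
    labels = [0] * num_tokens
    for claim in claims:
        for i in range(claim.start, claim.end):
            if not claim.supported:
                labels[i] = -1          # assumption: unsupported wins on overlap
            elif labels[i] == 0:
                labels[i] = 1
    return labels

tokens = "Paris is the capital of France and has 90 million residents .".split()
claims = [
    Claim(start=0, end=6, supported=True),    # "Paris is the capital of France"
    Claim(start=7, end=11, supported=False),  # "has 90 million residents"
]
labels = project_to_tokens(len(tokens), claims)
```

Tokens outside any checkable claim (here the connective and the final period) keep a neutral label, so only claim-bearing tokens would carry a reward signal under this scheme.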

04.
arXiv (math.PR) 2026-06-16

Testing for a Hidden Geometry in Random Graphs

arXiv:2606.16715v1 Announce Type: cross Abstract: We study the problem of detecting a faint geometric signal hidden in an otherwise random graph. Formally, we consider a hypothesis testing problem in which, under the null, the observed graph is an Erdős–Rényi random graph $\mathcal{G}(n,q)$, while under the alternative a random geometric graph $\mathcal{G}(k,q,d)$ is planted on $k\le n$ vertices. The planted subgraph is generated from independent random points on the unit sphere $\mathbb{S}^{d-1}$, with edges determined by latent geometric proximity and calibrated to have edge density $q$. Our goal is to characterize the statistical and computational limits of detecting this hidden geometry. We derive sharp information-theoretic lower bounds that identify regimes where detection is impossible and provide algorithms that achieve these limits whenever detection is feasible. We further investigate the computational complexity of the problem and determine when efficient polynomial-time tests exist. The model exhibits an easy–hard–impossible phase transition: some regimes allow efficient detection, others permit detection only with computationally intractable procedures, and still others render detection impossible even with unlimited computational power. As evidence for the computational barrier, we prove that all low-degree polynomial algorithms fail throughout the conjecturally hard regime, demonstrating a sharp gap between statistical and computational feasibility.

05.
arXiv (CS.LG) 2026-06-24

Target-Aware Linear Regression Under Distribution Shift

arXiv:2606.22775v2 Announce Type: replace-cross Abstract: Distribution shift between training and deployment is a pervasive challenge for modern AI systems. In many cases, the target marginals of covariates and response are known or specified through population-level observations, boundary conditions, properties of simulator configurations, or alignment-time distributional constraints. Such knowledge may provide valuable side information for regression estimation. We study this problem in the multivariate linear regression setting with a stable conditional mean $E[Y\mid X]$ across source and target, and identify the hybrid-loss estimator, which jointly incorporates both target marginals, as a benchmark target-aware estimator. Its direct computation, however, requires solving a coupled nonlinear optimization that is expensive at scale. Our main contribution is to develop and evaluate two computationally tractable alternatives: a constrained moment-matching estimator and a two-stage estimator that augments ordinary least squares with a calibration step. For all three estimators, we derive and compare closed-form asymptotic mean squared errors, yielding conditions under which the tractable alternatives match or closely approximate the hybrid benchmark, and regimes in which they do not. Monte Carlo experiments across three controlled shift regimes validate the theoretical results, investigate the accuracy-runtime tradeoffs among the three estimators, and translate into guidance on estimator choice. In particular, the two-stage estimator nearly matches the hybrid benchmark in the high signal-to-noise regime at essentially no additional cost, providing theoretical grounding for empirical observations in nonlinear settings.

06.
arXiv (CS.CV) 2026-06-18

Quantification of Uncertainty with Adversarial Models in Medical Image Segmentation

Reliable pixel-level uncertainty quantification holds the potential to transform clinical workflows by enabling high-fidelity longitudinal monitoring and distinguishing true pathological changes from artifacts. Ideally, these models provide the stability required for critical treatment planning and surgical intervention. However, standard deep learning models often suffer from miscalibration, yielding overconfident predictions that mask underlying vulnerabilities at subtle pathological boundaries. To address this, we propose QUAM-SM, a post-hoc framework using targeted adversarial search to identify "adversarially fragile" pixels. By actively seeking perturbations that expose predictive instability, our method highlights regions where decisions are most vulnerable to being flipped. Importantly, the framework disentangles epistemic uncertainty from aleatoric uncertainty. Experiments on two public datasets with multiple expert annotations demonstrate that QUAM-SM outperforms both standard and recent uncertainty estimation approaches in terms of reliability and boundary sensitivity. Code is available at https://github.com/HanaJebril/quam_sm

07.
arXiv (math.PR) 2026-06-24

On domains of elliptic operators with distributional coefficients

arXiv:2509.24950v2 Announce Type: replace-cross Abstract: We show how one can use recently gained insights from the study of singular SPDEs, more particularly the study of singular operators via the theory of Paracontrolled Distributions, to construct domains for (singular) elliptic operators. Formally we consider \[ A (u) = (1 - \Delta) u + \nabla V \cdot \nabla u + \xi u + \operatorname{div} (\rho u), \] where $V \in \mathcal{C}^{\delta}$, $\xi \in \mathcal{C}^{- 2 + \delta}$, $\rho \in \mathcal{C}^{- 1 + \delta}$, $\operatorname{div} \rho = 0$, and which satisfy a structural assumption that is notably satisfied when $\xi$ is a sub-critical noise, see [MvZ22]. We also show that under this assumption, one can construct a continuous change of variables $\Theta$ which satisfies \[ A \Theta - (1 - \Delta) \in \mathcal{L} (H^{2 - \delta''} ; H^{\delta'}) \] which allows us to define $A$ rigorously and parametrise a domain. Moreover, for suitably regularised operators \[ A_{\varepsilon} (u) := (1 - \Delta) u + \nabla V_{\varepsilon} \cdot \nabla u + (\xi_{\varepsilon} + c_{\varepsilon}) \cdot u + \operatorname{div} (\rho_{\varepsilon} \cdot u), \] we show that for a strongly converging regularised change of variables $\Theta_{\varepsilon} \rightarrow \Theta$ we have \[ A_{\varepsilon} \Theta_{\varepsilon} \rightarrow A \Theta \quad \text{in } \mathcal{L} (H^2 ; L^2) \] which in particular implies norm resolvent convergence to a limiting closed operator. Finally, we give a class of examples and show how to apply these results to prove strong analytical local well-posedness for a singular Schrödinger equation formally given by \[ i \partial_t u + (1 - \Delta) u + \nabla V \cdot \nabla u + \xi \cdot u = - | u |^2 u \] for singular $V, \xi$ and that its solution is the limit of the classical solutions of a regularised equation.

08.
Nature (Science) 2026-06-18

Daily briefing: The brain builds a sentence neuron by neuron

Researchers have tracked the electrical activity of individual brain cells during conversation in real time. Plus, the history of GPS and a cross-species transplant that could reveal clues about the origin of animals.

09.
arXiv (CS.CV) 2026-06-19

Exploring Multi-Modal Large Language Models and Two-Stage Fine-Tuning for Fashion Image Retrieval

Composed image retrieval retrieves a target image using a composed query of a reference image and a modified text description. In the fashion domain, this task requires understanding subtle attribute variations such as color, pattern, and texture. However, existing approaches face limitations due to scarce annotated data and simplistic negative sampling. We propose a novel framework that integrates a multi-modal large language model (LLaVA) to generate attribute-aware triplets and introduces a two-stage fine-tuning strategy to enhance contrastive learning. We leverage pretrained vision-language models, such as CLIP-ViT/B32, to generate and concatenate sentence-level prompts with the relative caption and to scale the number of negatives using static representations. Experimental results demonstrate enhanced compositional reasoning and improved fine-grained retrieval behavior, underscoring the feasibility and potential of the proposed framework for fashion retrieval.

10.
arXiv (CS.AI) 2026-06-24

GUI vs. CLI: Execution Bottlenecks in Screen-Only and Skill-Mediated Computer-Use Agents

arXiv:2606.24551v1 Announce Type: new Abstract: Computer-use agents can execute software tasks through either graphical interfaces or programmatic command interfaces, but existing evaluations confound interaction modality with differences in tasks, initial states, verifiers, and permitted actions. We introduce a matched execution-layer benchmark of 440 desktop tasks across 18 applications and 12 workflow categories, where screen-only GUI agents and skill-mediated CLI agents receive identical goals, states, and final-state verifiers while being restricted to modality-native actions. In this controlled setting, the strongest GUI agent reaches a 59.1% full pass rate, outperforming the strongest original-skill CLI agent at 48.2%; however, verifier-guided skill augmentation raises CLI success to 69.3%, showing that much of the CLI deficit comes from incomplete skill coverage rather than model capability alone. These results suggest that GUI and CLI expose different execution bottlenecks: GUI agents are limited by reliable grounded interaction over long-horizon workflows, whereas CLI agents are limited by the coverage and scalability of their skill interfaces.

11.
arXiv (math.PR) 2026-06-12

Temporal Conductance and Bounds on the Voter Model for Dynamic Networks

arXiv:2606.13374v1 Announce Type: cross Abstract: The voter model is a classical stochastic process that models how opinions might spread through a network: at each step, every node lazily adopts the opinion of a random neighbour; eventually all nodes share the same opinion (consensus). Stronger connectivity should yield faster consensus. Berenbrink, Giakkoupis, Kermarrec, and Mallmann-Trenn (ICALP 2016) make this precise via the network's conductance: if the network has $m$ edges, minimum degree $d_{\min}$, and conductance at least $\phi$, then the voter model reaches consensus in expected $O(m/(d_{\min}\phi))$ steps. Their results extend to dynamic networks with fixed vertex degrees by considering the network's conductance at each time step. We introduce temporal conductance $\Phi$, a more general connectivity measure for dynamic networks. Unlike static conductance, which collapses to $0$ whenever some snapshot is disconnected, $\Phi$ captures connectivity through edges that appear at different times. We generalise the results of Berenbrink et al. from static conductance to temporal conductance, showing that the expected consensus time of the standard voter model is at most $O(m/(d_{\min}\Phi))$. Moreover, we prove that this bound is tight up to constant factors. We expect temporal conductance to be a useful primitive for analysing other dynamics on temporal networks, and potentially time-inhomogeneous Markov chains more generally.
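
The voter-model dynamics described above can be simulated in a few lines. The following is a minimal sketch on a static graph only, assuming synchronous updates and a laziness probability of 1/2 (the abstract fixes neither); the function name and the complete-graph example are illustrative, and temporal conductance is not computed here.

```python
import random

def lazy_voter_consensus_time(adj, opinions, rng, max_steps=100_000):
    """Run a synchronous lazy voter model; return (steps to consensus, winning opinion)."""
    opinions = list(opinions)
    for step in range(max_steps):
        if len(set(opinions)) == 1:
            return step, opinions[0]
        updated = opinions[:]
        for v, neighbours in enumerate(adj):
            # Lazy step (assumed probability 1/2): keep the current opinion.
            if rng.random() < 0.5:
                continue
            # Otherwise adopt the opinion of a uniformly random neighbour.
            updated[v] = opinions[rng.choice(neighbours)]
        opinions = updated
    raise RuntimeError("no consensus within max_steps")

# Complete graph on 8 nodes, every node starting with a distinct opinion.
n = 8
adj = [[u for u in range(n) if u != v] for v in range(n)]
steps, winner = lazy_voter_consensus_time(adj, range(n), random.Random(0))
```

To experiment with dynamic networks, one could pass a different adjacency list at each step; the static version is kept here because it is the setting of the quoted $O(m/(d_{\min}\phi))$ bound.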

12.
Science (Express) 2026-05-21

Observation of quantum vortex core fractionalization and skyrmion formation in a superconductor

Author: Unknown

Magnetic fields can penetrate a superconductor in the form of quantum vortices, which consist of a core singularity with circulating currents. London’s quantization implies that there is one core singularity per quantum of magnetic flux in single-component superconductors. Here, we report signatures of quantum vortex core fractionalization on the potassium-terminated surface of a multiband superconductor KFe₂As₂. The observed splitting of single integer-flux vortices into several fractional vortices results in a disparity between the numbers of flux quanta and vortex cores. These fractional vortices often arrange in chains, which calculations show are characterized by a ℂP² skyrmionic topological invariant; this constitutes a different type of topological defect: the chiral skyrmion. The disparate natures of integer and fractional vortices comprising skyrmions lead to distinct spectroscopic signatures.

13.
arXiv (CS.LG) 2026-06-12

Design Criteria for SGD Preconditioners: Local Conditioning, Noise Floors, and Basin Stability

arXiv:2511.19716v2 Announce Type: replace-cross Abstract: Stochastic Gradient Descent (SGD) often slows in the late stage of training due to anisotropic curvature and gradient noise. We analyze preconditioned SGD in the geometry induced by a symmetric positive definite matrix $\mathbf{M}$, deriving bounds in which both the convergence rate and the stochastic noise floor are governed by $\mathbf{M}$-dependent quantities: the rate through an effective condition number in the $\mathbf{M}$-metric, and the floor through the product of that condition number and the preconditioned noise level. For nonconvex objectives, we establish a preconditioner-dependent basin-stability guarantee: when smoothness and basin size are measured in the $\mathbf{M}$-norm, the probability that the iterates remain in a well-behaved local region admits an explicit lower bound. This perspective is particularly relevant in Scientific Machine Learning (SciML), where achieving small training loss under stochastic updates is closely tied to physical fidelity, numerical stability, and constraint satisfaction. The framework applies to both diagonal/adaptive and curvature-aware preconditioners and yields a simple design principle: choose $\mathbf{M}$ to improve local conditioning while attenuating noise. Experiments on a quadratic diagnostic and three SciML benchmarks validate the predicted rate-floor behavior.
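
The role of the effective condition number in the $\mathbf{M}$-metric can be illustrated on a toy quadratic. This is a minimal sketch, not the paper's experiment: the diagonal Hessian `H`, the imperfect diagonal preconditioner `M`, the Gaussian gradient noise, and the step size $1/L$ are all assumptions chosen for illustration.

```python
import numpy as np

def preconditioned_hessian(H, M):
    """Return M^{-1/2} H M^{-1/2} for a diagonal SPD preconditioner M."""
    m_inv_sqrt = 1.0 / np.sqrt(np.diag(M))
    return H * np.outer(m_inv_sqrt, m_inv_sqrt)

def effective_condition_number(H, M):
    eig = np.linalg.eigvalsh(preconditioned_hessian(H, M))  # ascending order
    return eig[-1] / eig[0]

def final_loss(H, M, steps=200, noise=0.01, seed=0):
    """Noisy preconditioned gradient steps on f(x) = 0.5 * x^T H x."""
    rng = np.random.default_rng(seed)
    m_inv = 1.0 / np.diag(M)
    # Step size 1/L, with L the largest eigenvalue of the preconditioned Hessian.
    step = 1.0 / np.linalg.eigvalsh(preconditioned_hessian(H, M))[-1]
    x = np.ones(H.shape[0])
    for _ in range(steps):
        grad = H @ x + noise * rng.standard_normal(x.shape)
        x = x - step * m_inv * grad
    return 0.5 * x @ H @ x

H = np.diag([1.0, 10.0, 100.0])  # anisotropic curvature
M = np.diag([1.0, 10.0, 50.0])   # assumed, deliberately imperfect preconditioner
I = np.eye(3)

kappa_plain = effective_condition_number(H, I)
kappa_pre = effective_condition_number(H, M)
loss_plain = final_loss(H, I)
loss_pre = final_loss(H, M)
```

With these choices the condition number drops from 100 to 2, and after the same number of steps the preconditioned run sits near its noise floor while the plain run is still limited by the slowest direction.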

14.
arXiv (quant-ph) 2026-06-15

Quantum codes and optimal pure quantum $(r,\delta)$-LRCs via the MP construction

arXiv:2606.14253v1 Announce Type: new Abstract: In this paper, we employ MP codes whose defining matrices are $\tau$-optimal defining ($\tau$-OD) matrices to construct new quantum codes and quantum $(r,\delta)$-LRCs. Specifically, we report the following results: We establish a unified $\tau$-monomial decomposition theorem for invertible self-adjoint matrices over finite fields of arbitrary characteristic, which generalizes the result in "Quantum codes using the $\tau$-OD MP construction" where the characteristic was required to be odd. Based on this theorem, we prove the existence of $\tau$-OD matrices over $\mathbb{F}_{q^2}$ for any characteristic and demonstrate that there exist several new infinite families of $\tau$-OD matrices over $\mathbb{F}_{q^2}$ of characteristic $2$. As an application of MP codes involving $\tau$-OD matrices, we construct several infinite families of quantum codes with flexible parameters. Within this framework, we present $222$ record-breaking quantum codes that surpass the best-known records maintained in Grassl's database. We propose two effective schemes for constructing optimal pure quantum $(r,\delta)$-LRCs via MP codes. Accordingly, we construct four new infinite families of optimal pure quantum $(r,\delta)$-LRCs with flexible parameters. Notably, we report an interesting phenomenon by exhibiting $30$ optimal pure quantum $(r,\delta)$-LRCs derived from our framework; that is, there exist quantum codes that are not only optimal pure quantum $(r,\delta)$-LRCs but also, according to Grassl's database, best-known, optimal, or record-breaking quantum codes. To the best of our knowledge, the new discovery that quantum codes are simultaneously optimal pure quantum $(r,\delta)$-LRCs and record-breaking quantum codes has not been previously reported in the literature.

15.
arXiv (quant-ph) 2026-06-24

Infinite-Level Hierarchy of Solvable Quantum Circuits

arXiv:2606.23803v1 Announce Type: new Abstract: Dual-unitary circuits have emerged as a paradigm of exactly solvable yet non-integrable quantum dynamics. Recently, a generalization of dual unitarity attempting to extend the phenomenology of exactly solvable circuits has been introduced through a hierarchy of conditions, with dual unitarity as the first level. However, beyond the second level the proposed generalized dual-unitary hierarchy ceases to be solvable in the whole spacetime. We present an infinite hierarchy of solvability conditions remedying this problem. These new conditions can be combined with the generalized dual-unitary hierarchy to obtain circuits for which correlation functions and entanglement dynamics can be analyzed exactly in the whole spacetime. We show that this novel hierarchy possesses non-trivial solutions at every level. Our results demonstrate that dual unitarity can be systematically extended while preserving solvability, opening up investigations of exactly solvable non-integrable systems with more general properties.

16.
arXiv (math.PR) 2026-06-16

Plateau Gaps of Poisson Correctors Encode Metastable Reaction Rates

arXiv:2606.14789v1 Announce Type: cross Abstract: Metastable reaction rates are commonly inferred from transition-state fluxes, mean first-passage times, or fitted kinetic models. We show that they are directly encoded in the plateau gap of an occupation-time Poisson corrector. For a centered basin-occupation observable, the Poisson corrector develops metastable plateaus in the reactant and product basins, and their separation determines the forward and backward transition rates. This construction requires only the generator, stationary measure, and metastable partition, and therefore does not rely on a predefined transition-state surface. In overdamped and underdamped double-well dynamics, the plateau-gap rate recovers the Kramers, Grote-Hynes, and Pollak-Grabert-Hänggi hierarchy. The same corrector-martingale decomposition yields a reactive-noise density, revealing where stochastic forcing contributes to transitions in configuration or phase space. Thus, reaction rates and their fluctuation sources emerge from a single corrector field.

17.
arXiv (CS.AI) 2026-06-12

The Challenges of Balancing AI Compliance and Technological Innovations in Critical Sectors: A Systematic Literature Review

arXiv:2606.12423v1 Announce Type: cross Abstract: The rapid integration of artificial intelligence (AI) into critical infrastructure, including healthcare, finance, energy, and defense, offers transformative benefits but also conflicts with evolving regulatory and governance frameworks. This paper presents a systematic literature review (SLR) to examine the challenges of balancing AI compliance and technological innovation across critical infrastructure sectors. The review follows established SLR guidelines to extract and synthesize insights from peer-reviewed articles, reports, and institutional sources published between 2020 and 2025. The study identifies three interrelated challenges: fragmented regulations, excessive compliance burdens for small and medium-sized enterprises (SMEs), and misaligned governance models. To address these challenges, the study highlights practical governance strategies, including risk-tiered regulation, compliance by design, and explainable AI, to support scalable and trustworthy AI deployment in critical sectors. Key contributions include a concise mapping of core AI-governance challenges and a conceptual diagram illustrating their overlap, as well as actionable strategies for policymakers and practitioners to harmonize oversight with innovation.

18.
arXiv (CS.CL) 2026-06-24

Preferences of a Voice-First Nation: Large-Scale Pairwise Evaluation and Preference Analysis for TTS in Indian Languages

Crowdsourced pairwise evaluation has emerged as a scalable approach for assessing foundation models. However, applying it to text-to-speech (TTS) introduces high variance due to linguistic diversity and the multidimensional nature of speech perception. We present a controlled multidimensional pairwise evaluation framework for multilingual TTS that combines linguistic control with perceptually grounded annotation. Using 5K+ native and code-mixed sentences across 10 Indic languages, we evaluate 7 state-of-the-art TTS systems and collect over 120K pairwise comparisons from over 1900 native raters. In addition to overall preference, raters provide judgments across 6 perceptual dimensions: intelligibility, expressiveness, voice quality, liveliness, noise, and hallucinations. Using Bradley-Terry modeling, we construct a multilingual leaderboard, interpret human preferences using SHAP analysis, and analyze leaderboard reliability alongside model strengths and trade-offs across perceptual dimensions.
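
Turning pairwise comparisons into a leaderboard with a Bradley-Terry model can be sketched as below. This is a minimal illustration using the standard minorization-maximization update, not the authors' pipeline: the win counts are made up for three hypothetical systems, and the sketch assumes every system wins at least one comparison.

```python
import numpy as np

def fit_bradley_terry(wins, iters=500):
    """Estimate Bradley-Terry strengths from wins[i][j] = times system i beat system j."""
    wins = np.asarray(wins, dtype=float)
    n_games = wins + wins.T            # comparisons per pair (diagonal stays 0)
    total_wins = wins.sum(axis=1)
    p = np.ones(len(wins))
    for _ in range(iters):
        # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j), then renormalize.
        denom = (n_games / (p[:, None] + p[None, :])).sum(axis=1)
        p = total_wins / denom
        p /= p.sum()
    return p

# Hypothetical counts: 10 comparisons per pair among systems 0, 1, 2.
wins = [
    [0, 7, 9],
    [3, 0, 6],
    [1, 4, 0],
]
strengths = fit_bradley_terry(wins)
leaderboard = [int(i) for i in np.argsort(-strengths)]  # strongest system first
```

Under the fitted model, the probability that system i is preferred over system j is `strengths[i] / (strengths[i] + strengths[j])`, which is what makes the scores comparable across pairs that were never directly compared.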

19.
arXiv (CS.LG) 2026-06-12

FedBiCross: Personalized One-Shot Federated Learning on Medical Images

arXiv:2601.01901v4 Announce Type: replace Abstract: Data-free knowledge distillation-based one-shot federated learning (OSFL) trains a model in a single communication round without sharing raw data, making OSFL attractive for privacy-sensitive medical applications. However, existing methods aggregate predictions from all clients to form a global teacher. Under non-IID data, conflicting predictions dilute each other during averaging, yielding less informative soft labels that weaken distillation. We propose FedBiCross, a personalized OSFL framework with three stages: (1) clustering clients by model output similarity to form coherent sub-ensembles, (2) bi-level cross-cluster optimization that learns adaptive weights to selectively leverage beneficial cross-cluster knowledge while suppressing negative transfer, and (3) personalized distillation for client-specific adaptation. Experiments on four medical image datasets demonstrate that FedBiCross consistently outperforms state-of-the-art baselines across different non-IID degrees.

20.
PLOS Computational Biology 2026-06-01

Challenges and progress in RNA velocity: Comparative analysis across multiple biological contexts

Authors: Sarah Ancheta, Leah Dorman, Guillaume Le Treut, Abel Gurung, Greg Huber, Loïc A. Royer, Alejandro Granados, Merlin Lange

Single-cell RNA sequencing is revolutionizing our understanding of cell state dynamics, allowing researchers to capture and quantify the transcriptomic profile of a single cell at a specific timepoint. Among the computational techniques used to predict cellular trajectories, RNA velocity has emerged as a predominant tool for modeling transcriptional dynamics. RNA velocity leverages the mRNA maturation process to generate velocity vectors that predict the likely future state of a cell, offering insights into cellular differentiation, aging, and disease progression. Although this technique has shown promise across biological fields, the performance accuracy varies depending on the RNA velocity method and dataset. We established a comparative pipeline and analyzed the performance of five RNA velocity methods on three datasets based on local consistency, method agreement, identification of driver genes, and robustness to sequencing depth. This benchmark provides a resource for scientists to understand the strengths and limitations of different RNA velocity methods.

21.
arXiv (CS.CL) 2026-06-24

ErrorLLM: Modeling SQL Errors for Text-to-SQL Refinement

Despite the remarkable performance of large language models (LLMs) in text-to-SQL (SQL generation), correctly producing SQL queries remains challenging during initial generation. The SQL refinement task is subsequently introduced to correct syntactic and semantic errors in generated SQL queries. However, existing paradigms face two major limitations: (i) self-debugging becomes increasingly ineffective as modern LLMs rarely produce explicit execution errors that can trigger debugging signals; (ii) self-correction exhibits low detection precision due to the lack of explicit error modeling grounded in the question and schema, and suffers from severe hallucination that frequently corrupts correct SQL queries. In this paper, we propose ErrorLLM, a framework that explicitly models text-to-SQL errors within a dedicated LLM for text-to-SQL refinement. Specifically, we represent the user question and database schema as structural features, employ static detection to identify execution failures and surface mismatches, and extend ErrorLLM's semantic space with dedicated error tokens that capture categorized implicit semantic error types. Through a well-designed training strategy, we explicitly model these errors with structural representations, enabling the LLM to detect complex implicit errors by predicting dedicated error tokens. Guided by the detected errors, we perform error-guided refinement on the SQL structure by prompting LLMs. Extensive experiments demonstrate that ErrorLLM achieves the most significant improvements over backbone initial generation. Further analysis reveals that detection quality directly determines refinement effectiveness, and ErrorLLM addresses both by achieving a high detection F1 score while maintaining refinement effectiveness.

22.
arXiv (CS.CL) 2026-06-15

MASLab: A Unified and Comprehensive Codebase for LLM-based Multi-Agent Systems

LLM-based multi-agent systems (MAS) have demonstrated significant potential in enhancing single LLMs to address complex and diverse tasks in practical applications. Despite considerable advancements, the field lacks a unified codebase that consolidates existing methods, resulting in redundant re-implementation efforts, unfair comparisons, and high entry barriers for researchers. To address these challenges, we introduce MASLab, a unified, comprehensive, and research-friendly codebase for LLM-based MAS. (1) MASLab integrates over 20 established methods across multiple domains, each rigorously validated by comparing step-by-step outputs with its official implementation. (2) MASLab provides a unified environment with various benchmarks for fair comparisons among methods, ensuring consistent inputs and standardized evaluation protocols. (3) MASLab implements methods within a shared streamlined structure, lowering the barriers for understanding and extension. Building on MASLab, we conduct extensive experiments covering 10+ benchmarks and 8 models, offering researchers a clear and comprehensive view of the current landscape of MAS methods. MASLab will continue to evolve, tracking the latest developments in the field, and invite contributions from the broader open-source community.

23.
arXiv (CS.CL) 2026-06-16

Taylor-Calibrate: Principled Initialization for Hybrid Linear Attention Distillation

Hybrid linear attention models offer an appealing path to faster long-context inference: they reduce the quadratic cost and KV-cache burden of full softmax attention while retaining much of the quality of Transformer models. A practical way to obtain such models is to convert a pretrained Transformer instead of pretraining a new architecture from scratch, but this conversion is still brittle. Simply copying the teacher attention projections into a Gated DeltaNet (GDN) student does not specify the new recurrent decay, write, and output-gating dynamics. As a result, the converted model often starts in a poor dynamical regime and must spend many distillation tokens repairing initialization rather than learning the remaining teacher behavior. We propose Taylor-Calibrate, a lightweight initialization method for hybrid GDN students. The method uses Taylor-guided teacher attention statistics to set the value projection, memory timescale, write gates, and output gate, then applies a short per-layer alignment step to match each converted layer to the teacher output. Across four teacher settings and three retained-layer policies, Taylor-Calibrate gives substantially stronger zero-shot students, with up to an 88x improvement in a representative ablation, and reaches matched recovery targets with 4.9x–9.2x fewer training tokens than naive conversion.

24.
Nature Medicine 2026-06-09

Adjuvanted inactivated rabies virus-vectored Lassa virus vaccine in healthy adults: a phase 1 trial

Lassa fever causes substantial morbidity and mortality in West Africa, and no licensed vaccine is available. We evaluated LASSARAB, an inactivated rabies virus-vectored Lassa virus (Josiah strain) glycoprotein complex vaccine. We conducted a randomized, controlled, dose-escalation phase 1 trial. Participants (total n = 54) received two intramuscular doses of LASSARAB containing 700 (n = 15), 1,400 (n = 15) or 2,800 (n = 14) relative units of antigen formulated with the TLR-4 agonist 3D-6-acyl PHAD-SE adjuvant, or licensed rabies vaccine control (n = 10), administered 28 days apart. This protocol-defined interim analysis reports the primary safety evaluation and secondary immunogenicity assessments through day 61. There were no prespecified hypotheses or formal power calculations. All primary safety end points demonstrated an acceptable safety profile. After dose 1, local solicited adverse events occurred in 86.7–100.0% of LASSARAB groups and 80.0% of controls; systemic events occurred in 33.3–71.4% of LASSARAB groups and 60.0% of controls. After dose 2, local solicited adverse events occurred in 66.7–86.7% of LASSARAB groups and 55.6% of controls; systemic events occurred in 53.3–71.4% of LASSARAB groups and 55.6% of controls. Events were predominantly mild and self-limited. Unsolicited adverse events occurred in 28.6–60.0% of LASSARAB groups and 20.0% of controls. No serious adverse event, immune-mediated condition or sensorineural hearing loss occurred. Safety laboratory abnormalities occurred in 13.3–66.7% of LASSARAB groups and 30.0% of controls (14 mild, 6 moderate and none severe). After two doses, Lassa virus GPC IgG ELISA seroconversion (≥fourfold rise) was achieved in 100.0% (44 of 44) of LASSARAB recipients and 0.0% (0 of 10) of controls. Rabies glycoprotein IgG ELISA seroconversion (≥fourfold rise) and neutralizing antibody by rapid fluorescent focus inhibition test (RFFIT) seroprotection (≥0.5 IU ml−1) were also 100% across all groups, including controls.
LASSARAB + 3D-6-acyl phosphorylated hexaacyl disaccharide (PHAD)-SE demonstrated a favorable safety profile and immunogenicity against Lassa and rabies viruses. The per-protocol final study report will include safety and durability through day 394. ClinicalTrials.gov identifier NCT06546709. An interim report of a first-in-human phase 1 trial found an adjuvanted, inactivated, rabies-vectored combination Lassa fever vaccine (LASSARAB + 3D-6-acyl PHAD-SE) to be safe and immunogenic against both Lassa and rabies viruses in healthy participants.

25.
arXiv (CS.AI) 2026-06-16

OmniMouse: Scaling properties of multi-modal, multi-task Brain Models on 150B Neural Tokens

arXiv:2604.18827v2

Scaling data and artificial neural networks has transformed AI, driving breakthroughs in language and vision. Whether similar principles apply to modeling brain activity remains unclear. Here we leveraged a dataset of 3.1 million neurons from the visual cortex of 73 mice across 323 sessions, totaling more than 150 billion neural tokens recorded during natural movies, images and parametric stimuli, and behavior. We train multi-modal, multi-task models that support three regimes flexibly at test time: neural prediction, behavioral decoding, neural forecasting, or any combination of the three. OmniMouse achieves state-of-the-art performance, outperforming specialized baselines across nearly all evaluation regimes. We find that performance scales reliably with more data, but gains from increasing model size saturate. This inverts the standard AI scaling story: in language and computer vision, massive datasets make parameter scaling the primary driver of progress, whereas in brain modeling – even in the mouse visual cortex, a relatively simple system – models remain data-limited despite vast recordings. The observation of systematic scaling raises the possibility of phase transitions in neural modeling, where larger and richer datasets might unlock qualitatively new capabilities, paralleling the emergent properties seen in large language models. Code available at https://github.com/enigma-brain/omnimouse.