Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.LG) 2026-06-18

Learning from Your Own Mistakes: Constructing Learnable Micro-Reflective Trajectories for Self-Distillation

arXiv:2606.18844v1 Announce Type: new Abstract: Self-distillation improves reasoning in large language models by using the model's own rollouts as training signal, typically through implicit logit-level alignment that minimizes KL divergence toward a privileged target distribution. However, because this supervision is generated via uncontrolled sampling, it provides no diagnostic insight into the model's specific errors or corrective guidance for its individual failure patterns. Consequently, the model learns to imitate a privileged distribution rather than receiving fine-grained corrections that pinpoint where and why its reasoning fails. In this paper, we propose Trajectory-Augmented Policy Optimization (TAPO), which advances self-distillation from implicit distributional alignment to explicit trajectory construction. During RL training, the model produces both correct and incorrect rollouts to the same query, and TAPO leverages this contrastive structure to construct micro-reflective corrections, new training trajectories that retain the model's erroneous reasoning up to the point of failure, then insert a natural-language diagnosis and corrected reasoning guided by a correct reference from the same sampling group. Since each trajectory is anchored in the learner's own prefix and solutions, the corrective signal preserves the model's on-policy distribution to a greater extent than the position-wise alignment imposed by KL-based methods. To integrate these trajectories, TAPO introduces difficulty-aware candidate selection at the model's capability boundary and decoupled advantage estimation to prevent gradient contamination. Experiments on AIME 2024, AIME 2025, and HMMT 2025 show that TAPO achieves consistent improvements over GRPO under the same number of training steps. Further analysis demonstrates that TAPO strengthens both first-pass reasoning and error-correction effectiveness.

02.
PLOS Computational Biology 2026-06-05

Heuristic multi-site optimization for protein sequence design using Masked Protein Language Models

Authors:

by Lijuan Wang, Yuze Wang, Chen Qiu, Liwei Xiao, Xianliang Liu, Junjie Chen Protein sequence design for tailored functional properties is a fundamental task in protein engineering, with critical applications in drug discovery and therapeutic development. Efficient navigation of the combinatorial vastness of protein sequence space to identify functional variants remains a formidable challenge. Conventional approaches, which predominantly rely on template-based local search or single-residue mutagenesis, are constrained by their susceptibility to local optima and their potential risk of destabilizing native structural stability. In this study, we introduce ProtHMSO, a heuristic multi-site optimization framework leveraging masked protein language models (ProtLMs) for context-aware sequence exploration. ProtHMSO mimics natural evolutionary mechanisms by employing ProtLM-derived substitution probabilities to guide heuristic searches for synergistic mutations, thereby constraining combinatorial search spaces through evolutionary and biophysical priors. ProtHMSO is further applied to replace the exploration strategies in genetic algorithms (GAs) and Monte Carlo tree search (MCTS) for improving their convergence efficiency. Benchmark experiments demonstrate that protein sequences generated by ProtHMSO exhibit superior functional performance and closer alignment with natural sequence distribution, compared with state-of-the-art methods. These advancements highlight that ProtHMSO has strong potential and compatibility to accelerate functional protein discovery, offering a robust framework for efficient and context-aware exploration of protein sequence space.

03.
arXiv (CS.LG) 2026-06-19

Deep-Unfolded Coordination

arXiv:2606.19920v1 Announce Type: cross Abstract: Distributed optimization is a highly scalable and structurally transparent technique to solve multi-agent robotics problems; however, such methods often suffer from the need for highly-specialized, problem-specific hyperparameter tunings. In this work, we propose Deep Coordinator, a deep-unfolding framework that learns to dynamically adjust the hyperparameters of ADMM-DDP, a popular distributed solver for robotics tasks, at solve-time in response to optimizer performance. Our architecture consists of unrolling a fixed number of ADMM-DDP iterations into a neural network with learnable functions between layers mapping the optimizer state to the next hyperparameters. To the best of our knowledge, Deep Coordinator is the first deep-unfolding framework to adapt the penalty parameters of a non-convex optimizer at solve-time; we show that the mainstream supervised approach can yield degenerate solutions when training such models, and propose an unsupervised learning scheme. On simulations with fleets of cars and quadrotors, Deep Coordinator produces trajectories of comparable quality 6.18-9.44x faster than conventional solvers. Furthermore, Deep Coordinator retains its performance benefits when deployed to systems up to 8x larger than trained on.

04.
arXiv (CS.CV) 2026-06-24

TuringViT: Making SOTA Vision Transformers Accessible to All

Modern VLMs and VLA systems commonly adopt off-the-shelf ViTs such as SigLIP2 as visual encoders, but diverse downstream requirements in latency, temporal modeling, and VLM integration often call for customized SOTA-level ViTs. Training such encoders remains beyond the reach of much of the community, as it requires massive image-text data, while standard softmax attention makes high-resolution or dynamic-resolution pretraining prohibitively costly and often forces low-resolution pretraining followed by post-hoc adaptation. TuringViT addresses these challenges with three key designs: Turing Linear Attention (TLA) for efficient sequence modeling, VISTA-Curation to construct supervision-rich image-video training data, and native dynamic-resolution pretraining that supports flexible inputs from the start and transfers seamlessly to downstream VLMs. As a result, TuringViT outperforms leading open-source ViT baselines with only 10% of the data, achieves stronger downstream VLM performance, and delivers substantially better latency scaling on high-resolution inputs. Our scaling-law analysis further shows that TuringViT continues to improve predictably with curated data scale, far from saturation. Its fast adaptation, hardware-friendly design, and efficient deployment have made it a unified visual foundation across XPeng's AI systems. More broadly, TuringViT provides a reproducible pipeline that dramatically lowers the cost for the community to train, customize, and deploy SOTA-level ViTs, moving toward making such Vision Transformers accessible to all.

05.
arXiv (quant-ph) 2026-06-16

Adiabatic preparation of a fractional quantum Hall fluid by coherently pumping atoms from a Bose-Einstein condensate

arXiv:2606.15951v1 Announce Type: cross Abstract: We propose a protocol to adiabatically prepare a many-particle fractional quantum Hall fluid of bosonic ultracold atoms exploiting a time-dependent coherent coupling of a strongly interacting atomic state with a large dilute Bose-Einstein condensate. Starting from an empty cloud, atoms with well-defined angular momentum are coherently pumped into the fluid by Raman beams with a Laguerre-Gauss profile. Compared to number-conserving schemes which rely on finite-size-induced topological gaps, we identify an adiabatic path in the Fock space which avoids crossing topological phase transitions and thus maintains a sizable adiabatic gap open at all times. The efficiency of our preparation protocol is numerically assessed for typical experimental parameters up to particle numbers that largely exceed the experimental state-of-the-art. The crucial advantage of including an anharmonic confinement is finally highlighted.

06.
arXiv (CS.CL) 2026-06-12

RogueAI: A Reverse Turing Test for Detecting Licensed AI Deception in Dialogue

The original Turing Test asks a human judge to distinguish a machine from a person through dialogue. Three quarters of a century later, conversational systems pass this test in casual settings; the interesting epistemological question has shifted. We argue that the relevant modern variant asks not whether a dialogue partner is artificial, but whether it can be trusted. We present RogueAI, an interactive webapp that operationalizes this revisited test as a one-on-two interrogation game: a human player questions two indistinguishable Large Language Model agents, knowing that exactly one of them has been licensed to deceive within a shared fictional scenario. The player's task is to identify the deceptive agent and "shut it off" before a turn budget is exhausted. We further introduce AutoRogueAI, a procedural extension in which players co-design a custom scenario with a narrator agent that secretly chooses its own deception strategy. We describe the framing, sketch the abstract architecture and gameplay loop, and situate the artifact within recent work on LLM deception, social-deduction benchmarks, and scalable oversight via debate. A three-day pilot deployment (467 initiated sessions, 415 completed, 1876 interaction turns in Italian) provides early feasibility evidence and surfaces a concrete tension: the deceptive agent carries a reliable, locally-present linguistic signature - differential helpfulness, brevity, hedging - that a simple heuristic exploits at 75.6% accuracy, yet human players achieved only 56.6%, consistent with ignoring the most diagnostic signal entirely. We discuss what this gap implies for the artifact's use as a data-collection vehicle, a teaching tool, and an evaluation harness for honesty-trained models.

07.
arXiv (CS.CL) 2026-06-25

AgentOdyssey: Open-Ended Long-Horizon Text Game Generation for Test-Time Continual Learning Agents

For agents to learn continuously from interaction with the world at test time, they must be able to explore effectively, acquire new world knowledge and skills, retain relevant episodic experiences, and plan over long horizons. To evaluate these key abilities of test-time continual learning agents, we introduce AgentOdyssey, a novel evaluation framework that procedurally generates open-ended text games with rich entities, world dynamics, and long-horizon tasks. Critically, AgentOdyssey goes beyond the conventional machine learning assumption that learning does not occur at test time by placing agents in a continuous, long-horizon setting that interleaves learning and inference throughout deployment. We further propose a multifaceted evaluation methodology that measures not only game progress but also offers diagnostic tests on world knowledge acquisition, episodic memory, object and action exploration, action diversity, and model cost. We evaluate diverse agent paradigms in the generated games. Our experimental results reveal critical limits in agents' key abilities, as well as factors that influence their meaningful horizon. Although performance scales with stronger base models, even the top agent remains far below human performance, leaving substantial headroom for improvement. Among agent mechanisms, we find that short-term memory benefits multiple agent paradigms and is an important component of agent test-time training.

08.
arXiv (quant-ph) 2026-06-11

Dynamically Optimal Unraveling Schemes for Simulating Lindblad Equations

arXiv:2509.19887v2 Announce Type: replace Abstract: Stochastic unraveling schemes are powerful computational tools for simulating Lindblad equations, offering significant reductions in memory requirements. However, this advantage is accompanied by increased stochastic uncertainty, and the question of optimal unraveling remains open. In this work, we investigate unraveling schemes driven by Brownian motion or Poisson processes and present a comprehensive parametric characterization of these approaches. For the case of a single Lindblad operator and one noise term, this parametric family provides a complete description for unraveling scheme with pathwise norm-preservation. We further analytically derive dynamically optimal quantum state diffusion (DO-QSD) and dynamically optimal quantum jump process (DO-QJP) that minimize the growth rate of the variance of an observable locally in time. Compared to jump process ansatz, DO-QSD offers two notable advantages: firstly, the variance for DO-QSD can be rigorously shown not to exceed that of any jump-process ansatz locally in time; secondly, it has very simple expressions. Numerical results demonstrate that the proposed DO-QSD scheme may achieve substantial reductions in the variance of observables and the resulting simulation error.

09.
arXiv (CS.AI) 2026-06-16

From Correlation to Causation in Lane Change Prediction for Automated Driving: A Causal Explanation Framework

arXiv:2606.15756v1 Announce Type: cross Abstract: Lane-change prediction is a central task in intelligent vehicles, where early maneuver anticipation can support safer decision-making. However, many existing approaches mainly learn statistical associations between observed driving variables and future maneuvers, while overlooking the causal dependencies among the input variables themselves. This limits interpretability, especially when physically related variables such as longitudinal gap, relative longitudinal velocity, and Time-To-Collision (TTC) are treated as independent flat inputs. This article presents a causal-inference-based framework for lane-change prediction and explanation. The proposed approach combines linguistic feature construction, expert-constrained causal discovery, deep structural causal modeling with Deep End-to-end Causal Inference (DECI), intervention-based effect analysis, refutation testing, and recursive causal-chain explanation. The objective is not only to predict the future maneuver, but also to identify candidate variables that directly contribute to the prediction, the upstream factors influencing them, and the causal chains through which these effects propagate. The framework achieves average F1-scores above 95% during the first three seconds before the lane-marking crossing event. Beyond prediction accuracy, the framework uses intervention-based effect analysis to distinguish influential from weakly influential variables under the learned causal structure. It further distinguishes candidate direct contributors from mediated effects and generates contrastive causal-chain explanations that clarify why the predicted maneuver is favored and why the alternative maneuvers are less supported. The main contribution is therefore a mechanism-aware lane-change prediction pipeline that moves beyond correlation-based classification toward more interpretable causal reasoning for maneuver prediction.

10.
arXiv (CS.CV) 2026-06-16

DenseControl: Instance-Level Controllable Synthesis of Dense Crowd Image

In this paper, we introduce DenseControl, a novel pipeline for generating dense crowd images. Specifically, DenseControl meticulously positions and sizes each generated instance to align precisely with the predefined coordinates and scales. Based on this, we further allow for control over the background, style, and attributes of instances. The motivation behind DenseControl stems from the observation of two main challenges in synthesizing crowd images: controlling signal embedding and maintaining topological integrity when imparting instance scale guidance. To address these, we first introduce the Isolated Object Embedding (IOE) map, a novel representation that facilitates spatial location control while mitigating the difficulties associated with learning projections for model. Secondly, we propose an Implicit Scale Embedding (ISE) strategy that seamlessly integrates with the IOE map to encode precise scale information. To further enhance the efficacy of combining ISE with the IOE map, we incorporate a Position Shortcut mechanism that enhances cross-attention to alleviate projection challenges. We evaluate DenseControl through two lenses: synthesis quality and applicability in latent applications. Experiments across different control conditions demonstrate DenseControl achieves state-of-the-art results in dense crowd image synthesis. Furthermore, we showcase applications in augmenting crowd analysis under data scarcity, transfer learning, and weather generalization scenes, to highlight the practical utility of DenseControl. The codebase will be released.

11.
arXiv (CS.AI) 2026-06-11

Resource-Aware LLM Reasoning for Mobile Edge General Intelligence

arXiv:2509.23248v3 Announce Type: replace Abstract: The rapid advancement of large language models (LLMs) has enabled an emergence of agentic artificial intelligence (AI) with powerful reasoning and autonomous decision-making capabilities. This integration with edge computing has led to the development of Mobile Edge General Intelligence (MEGI), which brings real-time, privacy-preserving reasoning to the network edge. However, deploying LLM-based agentic AI reasoning in MEGI environments poses significant challenges due to the high computational demands of reasoning and the limited resources of edge devices. To address these challenges, we propose a joint optimization framework for efficient LLM reasoning deployment in MEGI. First, we systematically review enhancement methods to identify mechanisms suitable for edge adaptation. Subsequently, we present a distributed framework that synergizes reasoning enhancement via adaptive CoT prompting with scalable deployment through a distributed MoE architecture. An important innovation of this approach involves modeling reasoning depth as a dynamic network resource variable, which is optimized jointly with expert activation and transmission power. This mechanism allows the system to dynamically regulate expert networks and reasoning complexity according to task requirements and device capabilities. Experimental evaluations in mobile edge environments demonstrate that the proposed framework effectively balances reasoning quality and resource efficiency. The results show that with less than one second of additional inference time, both accuracy and latency satisfaction rate can reach 90\%, validating the practical viability of deploying sophisticated LLM reasoning in resource-constrained MEGI systems.

12.
arXiv (CS.AI) 2026-06-16

AIChilles: Automatically Uncovering Hidden Weaknesses in AI-Evolved Systems

arXiv:2606.15834v1 Announce Type: new Abstract: The computer systems community has recently seen growing interest in AI-driven system evolution, where AI agents iteratively rewrite systems. Frameworks such as AdaEvolve and Engram report 12-60% score improvements over human-designed algorithms. While these results are promising, there are practical concerns if these AI-evolved programs can perform worse on unseen workloads and exhibit scalability regressions. Given the speed and scale of AI-generated code, we need automated mechanisms to uncover such identify hidden weaknesses in AI-evolved systems programs. To this end, we develop AIChilles that takes as input a baseline program $P$ and an AI-evolved program $P'$, AIChilles searches for valid workloads where $P'$ regresses relative to $P$ in correctness, runtime, memory usage, or output quality. To tackle the diversity in system applications, weakness types and potential bugs, AIChilles combines deterministic workload-parameter extraction, agent-based constraint inference, differential oracles, and code-frequency coverage to discover diverse failures. Across five system applications and 30 AI-evolved programs, AIChilles finds 49 distinct hidden weaknesses. We also show that explicitly including AIChilles in the AI-driven development lifecycle can mitigate several of these weaknesses.

13.
Nature (Science) 2026-06-08

Fifty years since a simple equation described the chaos of biology

An exploration of chaos theory in population dynamics showed that unpredictable systems can often be modelled using surprisingly simple mathematics. An exploration of chaos theory in population dynamics showed that unpredictable systems can often be modelled using surprisingly simple mathematics.

14.
arXiv (CS.LG) 2026-06-11

IAPO: Input Attribution-Aware Policy Optimization for Tool Use in Small Multimodal Agents

arXiv:2606.11652v1 Announce Type: new Abstract: This paper investigates reinforcement learning (RL) methods for improving tool-calling capabilities in multimodal small language model (SLM) agents. While existing works have explored various reward designs to improve agentic tool-calling ability, these approaches face inherent limitations for SLM training, especially under multimodal scenarios. First, many existing methods evaluate tool use correctness through exact matching against certain ground-truth or predefined formats. However, this assumption is often unsuitable for multimodal tasks, where multiple tool use paths may be valid and annotated tool trajectories are typically unavailable. Second, such sparse and brittle binary rewards provide little guidance on how to improve the underlying decision process, making them particularly difficult for multimodal SLM to learn from. To address these issues, we propose Input Attribution-Aware Policy Optimization (IAPO), an RL algorithm for improving tool use in multimodal SLM by aligning the model's attribution across input components with that of a stronger teacher. Experiments on Qwen2.5-VL-3B show that the proposed method improves visual question answering accuracy by an average of 3% across six test sets compared with existing visual tool use work, by helping the model attend to the most relevant input evidence.

15.
arXiv (CS.CV) 2026-06-16

PoseGAM: Robust Unseen Object Pose Estimation via Geometry-Aware Multi-View Reasoning

6D object pose estimation, which predicts the transformation of an object relative to the camera, remains challenging for unseen objects. Existing approaches typically rely on explicitly constructing feature correspondences between the query image and either the object model or template images. In this work, we propose PoseGAM, a geometry-aware multi-view framework that directly predicts object pose from a query image and multiple template images, eliminating the need for explicit matching. Built upon recent multi-view-based foundation model architectures, the method integrates object geometry information through two complementary mechanisms: explicit point-based geometry and learned features from geometry representation networks. In addition, we construct a large-scale synthetic dataset containing more than 190k objects under diverse environmental conditions to enhance robustness and generalization. Extensive evaluations across multiple benchmarks demonstrate our state-of-the-art performance, yielding an average AR improvement of 5.1% over prior methods and achieving up to 17.6% gains on individual datasets, indicating strong generalization to unseen objects. Project page: https://windvchen.github.io/PoseGAM/ .

16.
arXiv (quant-ph) 2026-06-16

Physically Motivated Ansatz for Open Fermionic Systems on Quantum Computer

arXiv:2606.16823v1 Announce Type: new Abstract: Determining non-equilibrium steady states (NESS) of open fermionic systems is a fundamental problem akin to finding ground states of closed systems. To address this, variational quantum algorithms can be used to solve the Lindblad master equation, much like the Schrödinger equation, yet ansatz design for NESS remains challenging. Existing approaches rely mostly on hardware-efficient ansätze (HEA), which suffer from the barren plateau problem. Here, we introduce a physically motivated ansatz named NE-UCC. Numerical simulations demonstrate that NE-UCC reliably converges to the steady state even in strongly correlated regimes far from equilibrium, reducing the infidelity by up to ten orders of magnitude compared to HEA. Furthermore, NE-UCC facilitates the exploration of excited eigenmodes with specific symmetries.

17.
arXiv (math.PR) 2026-06-11

Capital Asset Pricing Model with Size Factor and Normalizing by Volatility Index

arXiv:2411.19444v5 Announce Type: replace-cross Abstract: The Capital Asset Pricing Model (CAPM) relates a well-diversified stock portfolio to a benchmark portfolio. We insert size effect in CAPM, capturing the observation that small stocks have higher risk and return than large stocks, on average. For some size-based stock portfolios, dividing their returns by the Volatility Index makes them closer to independent and normal. In this article, we combine these ideas to create a new discrete-time model, which includes volatility, relative size, and CAPM. We fit this model using real-world data, prove the long-term stability, and connect this research to Stochastic Portfolio Theory. We fill important gaps in our previous article on CAPM with the size factor.

18.
arXiv (CS.CV) 2026-06-11

AGE-MIL: Anchor-Guided Evidence Learning for Patient-Level Prediction

Existing computational pathology methods predominantly operate within whole-slide image (WSI)-level multiple instance learning (MIL) paradigms, while patient-level modeling remains underexplored. In routine pathological practice, however, pathologists derive diagnostic and prognostic conclusions by integrating evidence across multiple WSIs rather than relying on any single slide. This discrepancy creates a fundamental misalignment when patient-level supervision is directly imposed on conventional MIL frameworks, often leading to unstable optimization and degraded predictive reliability. To address this issue, we propose Anchor-Guided Evidence MIL (AGE-MIL), a weakly supervised framework for patient-level prediction. AGE-MIL constructs a patient-level anchor from slide representations to capture global pathological context and guide the retrieval and integration of diagnostically relevant local patches, enabling robust patient-level modeling. Patient-level risk is further modeled as an evidence accumulation process, promoting stable optimization under weak supervision. AGE-MIL is evaluated on six clinically relevant patient-level prediction tasks from two independent cohorts. Experimental results show that the proposed framework consistently outperforms eight state-of-the-art MIL methods. Code is available at https://github.com/wodeniua/AGE-MIL.

19.
arXiv (quant-ph) 2026-06-11

Entanglement preservation and Clauser-Horne nonlocality in electromagnetically induced transparency quantum memories

arXiv:2507.15453v4 Announce Type: replace Abstract: Entanglement preservation in noisy quantum memories represents a central challenge in quantum information science. While experiments have shown that electromagnetically induced transparency (EIT) memories can store entangled photons, a quantitative theoretical analysis of whether nonlocal quantum correlations can survive storage loss induced by ground-state decoherence remains limited. Here we combine the dark-state polariton formalism with a reduced density-operator treatment to derive an EIT-specific effective pure-loss description for the retrieved photonic state in the ground-state-decoherence-limited regime. The analysis reveals that decoherence transforms an initially pure Bell state into a mixed state with a vacuum component and predicts a protocol-dependent storage-efficiency benchmark of 89.7% for violating the chosen unconditional Clauser-Horne (CH) inequality. Above this benchmark, the retrieved photonic state violates the CH inequality without post-selection, whereas below it, this unconditional CH violation is no longer obtained. This framework provides a quantitative theoretical description of entanglement retention, retrieved photonic density operators, and protocol-dependent Bell-test benchmarks in EIT quantum memories.

20.
arXiv (CS.CV) 2026-06-24

BenchX: Benchmarking AI Models for Cancer Detection and Localization with Demographic and Protocol Biases

Artificial intelligence (AI) has achieved remarkable success in medical imaging, but it is widely recognized that these models often perform inconsistently across real-world clinical settings. Such inconsistencies occur when patient demographics and imaging protocols vary, for example, in detecting small tumors, analyzing scans from different contrast phases, or evaluating patients of different ages or sexes. To quantify these inconsistencies, we develop a large-scale, open benchmark of 85,355 CT scans that systematically evaluates 12 tumor-detection AI models across tumor size, location, patient subgroup, and imaging protocol. We leverage large language models (LLMs) to extract and organize subgroup information from clinical data, which makes the analysis both scalable and reproducible. Our benchmark reveals that current state-of-the-art AI models, optimized for average accuracy, perform poorly in rare or underrepresented subgroups, such as young, female African Americans. However, collecting sufficient annotated data for these rare cases is often impractical. The benchmark provides a foundation for building more reliable and robust AI models for tumor detection and highlighting the need for rigorous, subgroup-level evaluation in medical imaging and computer vision. Datasets, code

21.
arXiv (quant-ph) 2026-06-15

Optimal Decoding of Small Codes by Density Matrix Propagation

arXiv:2606.14455v1 Announce Type: new Abstract: Accurate and efficient decoding is a crucial component for achieving fault-tolerant quantum computing. Realistic circuit-level noise introduces temporal correlations and degeneracy, making optimal (maximum-likelihood) decoding computationally intractable in general. As a result, practical decoders rely on heuristic approximations, and it is generally difficult to quantify how suboptimal they are, as this strongly depends on the code and noise model considered. In this work, we study the accuracy of practical decoding algorithms under circuit-level noise by comparing them against a maximum likelihood decoding benchmark. Our approach propagates the density matrix through the full memory experiment and computes the optimal decoding decision for each syndrome history. We introduce pruning techniques with rigorous bounds, allowing us to access larger numbers of syndrome-extraction rounds. We apply this framework to small instances of the repetition code and a cellular automaton code, and benchmark minimum-weight perfect matching (MWPM), belief propagation with ordered statistics decoding (BP+OSD), Tesseract, and Planar decoders against optimal decoding. While standard decoders remain close to optimal for the repetition code, we find significant deviations for the cellular automaton code, with BP+OSD deteriorating already in experimentally relevant noise regimes. Moreover, the pruning method developed here highlights that, at low physical error rates, only a narrow fraction of syndrome histories contributes significantly to the logical error rate.

22.
PLOS Computational Biology 2026-06-02

Assessing the importance of sex and disease-specific anatomy in electrophysiology and mechanical simulations with a newly developed public virtual cohort of four-chamber heart models

by José Alonso Solís-Lemus, Rosie K. Barrows, Cristobal Rodero, Marina Strocchi, Natalie Montarello, Nishant Lahoti, Cesare Corrado, Abdul Qayyum, Shahrokh Rahmani, Caroline Roney, Gernot Plank, Christoph Augustin, Hao Xu, Alistair Young, Pras Pathmanathan, Ronak Rajani, Steven A. Niederer This work presents a study on how differences in cardiac anatomy attributed to sex and disease can influence cardiac electrophysiology and mechanics using a virtual cohort of four-chamber heart models. Patient anatomy varies across sex and disease. However, capturing this variation in in-silico studies remains poorly accounted for, with studies often using either single representative cases or imbalanced virtual cohorts. Whole-heart electromechanics models incorporate the patient’s anatomy, electrophysiology and mechanics across different scales, from molecular, tissue and whole-heart and circulatory system levels. However, cardiac models are typically built from one or a small number of anatomies, with sex rarely reported and the effects of anatomical variability, which include those due to sex or disease, largely unexplored. This limits clinical translation and reduces regulatory credibility. We developed fifty patient-specific anatomical models of 25 male and 25 female hearts in heart failure and control cases. We ran benchmark passive inflation and paced activation simulations with consistent parameters and boundary conditions across cases to isolate the impact of anatomical variations with sex and disease. Heart failure models exhibited increased chamber volumes, larger volume changes during inflation, and delayed activation times relative to controls. These trends were consistent across sexes, although right ventricular activation showed a significant sex-based difference. Variations in anatomy with sex and disease have a significant impact on cardiac simulations, which support the inclusion of multiple heart anatomical models in in-silico trials. The resulting virtual cohort captures key anatomical variability and is publicly available, along with the underlying code (see Data Availability statement).

23.
arXiv (CS.CV) 2026-06-12

From Seeing to Experiencing: Scaling Navigation Foundation Models with Reinforcement Learning

Navigation foundation models trained on massive web-scale data enable agents to generalize across diverse environments and embodiments. However, these models, which are trained solely on offline data, often lack the capacity to reason about the consequences of their actions or adapt through counterfactual understanding. They thus face significant limitations in real-world urban navigation, where interactive and safe behaviors, such as avoiding obstacles and moving pedestrians, are critical. To tackle these challenges, we introduce the Seeing-to-Experiencing (S2E) learning framework to scale the capability of navigation foundation models with reinforcement learning. S2E combines the strengths of pretraining on offline videos and post-training through reinforcement learning. It maintains the model's generalizability acquired from large-scale real-world videos while enhancing its interactivity through reinforcement learning in simulation environments. Specifically, we introduce two innovations: (1) an Anchor-Guided Distribution Matching strategy for offline pretraining, which stabilizes learning and models diverse motion patterns through anchor-based supervision; and (2) a Residual-Attention Module for reinforcement learning, which obtains reactive behaviors from simulation environments without erasing the model's pretrained knowledge. Moreover, we establish a comprehensive end-to-end evaluation benchmark, NavBench-GS, built on photorealistic 3D Gaussian Splatting reconstructions of real-world scenes that incorporate physical interactions. It can systematically assess the generalizability and safety of navigation foundation models.

24.
arXiv (CS.CL) 2026-06-25

To Isolate or to Score? Model-Adaptive Assessment for Cost-Efficient Multi-Agent RAG

Multi-agent document assessment for retrieval-augmented generation is computationally expensive, driving practitioners toward smaller, deployable models whose assessment mechanisms remain poorly understood. We conduct a controlled study of training-free interventions on 7B-9B instruction-tuned models across diverse QA benchmarks, revealing a sharp dichotomy in how models benefit from assessment. For weaker baselines, the dominant mechanism is per-document isolation. Astoundingly, assessment-free isolation matches full multi-agent assessment, demonstrating that resolving multi-document context confusion, rather than scoring quality, drives outsized gains of up to 50 percentage points. Conversely, for strong baselines where scoring quality matters, we introduce Reasoning-Score Coupling, a label-free perturbation probe that classifies scoring behavior. Integrating these findings, we propose MADARA, a model-adaptive routing architecture. Crucially, MADARA's diagnostic thresholds derived from a single pilot model generalize zero-shot to four unseen model families, providing a robust, lightweight pipeline to eliminate computational overhead.

25.
arXiv (math.PR) 2026-06-16

A Low-Regularity Semigroup Sewing Lemma via Quotient Structures

arXiv:2606.16164v1 Announce Type: new Abstract: We develop a low-regularity Sewing theory for the semigroup coboundary $\hat\delta=\delta-a$ associated with a strongly continuous semigroup $S$. Unlike the ordinary low-regularity Sewing problem, the semigroup setting has an intrinsic algebraic non-uniqueness below the threshold $1$, in the sense that solutions are canonical only modulo semigroup cocycles. Accordingly, the natural target is a quotient space rather than an increment space. We identify this quotient structure and construct the corresponding semigroup Sewing map. The construction uses a frozen terminal-time transform, which rewrites semigroup defects, for each terminal time, as ordinary low-regularity Sewing problems on a frozen simplex. This reduction, however, does not by itself produce a genuine semigroup increment; the main additional step is to prove that the frozen solution classes are compatible as the terminal time varies and hence assemble into a canonical quotient class for $\hat\delta$. This yields canonical classes for $0