Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (quant-ph) 2026-06-15

Efficient Simulation of Szegedy Quantum Walk Formulations and Algorithms

arXiv:2606.14226v1 Announce Type: new Abstract: Quantum walks provide a versatile framework for quantum algorithms across a wide range of applications. We develop efficient classical simulation methods for Szegedy quantum walks that avoid explicit construction of the full unitary evolution operator. Unlike previous approaches restricted to a particular walk formulation, our framework is built from fundamental update and reflection operators, enabling the simulation of a broader class of Szegedy walk formulations. We further extend these methods to phase-estimation-based algorithms coupled to the walk, including implementations suitable for large sparse graphs. The resulting methods achieve optimal $O(N^2)$ complexity for dense graphs with $N$ nodes. For sparse graphs, the computational cost scales linearly with the number of edges, which is $O(N)$ in many cases. We implement the framework in the Python package SQWLib and illustrate its capabilities through simulations of representative algorithms, including quantum simulated annealing and quantum search on graphs. These results provide a practical tool for studying Szegedy-walk-based algorithms numerically beyond purely analytical treatments.

02.
arXiv (CS.AI) 2026-06-11

LLMs+Graphs: Toward Graph-Native, Synergistic AI Systems

arXiv:2606.11560v1 Announce Type: cross Abstract: Large Language Models (LLMs) have advanced rapidly, but their limitations in structured and multi-hop reasoning underscore the need for graph-native, synergistic artificial intelligence (AI) systems. Graph-structured data underpins critical applications across social, biological, financial, transportation, web, and knowledge domains, making it essential to understand how LLMs can leverage graph computation for grounded, context-rich inference. Three complementary synergies are emerging: LLMs augmented with graph computation for retrieval and reasoning; bidirectional integration between LLMs and knowledge graphs (KGs), where LLMs support KG construction and curation while KGs enforce semantic constraints and factual consistency; and AI agents strengthened by graph algorithms for planning, decision making, and multi-step reasoning. In parallel, LLMs introduce new capabilities for graph data management and graph machine learning (ML) through natural language interfaces and hybrid LLM-graph neural network (GNN) pipelines. This tutorial synthesizes the algorithms, systems, and design principles driving these converging directions, offering data science and data mining researchers a unified perspective on integrating LLMs, graph data management, graph mining, graph ML, and agentic computation into next-generation graph-native AI systems.

03.
Nature (Science) 2026-06-17

A blastoporal organizer in a ctenophore

In an iconic experiment in 1924, Hilde Mangold and Hans Spemann established that the dorsal blastopore lip of amphibian embryos functions as an organizer and induces a secondary body axis when transplanted into a host embryo1. This discovery demonstrated that specific embryonic regions can regulate embryonic patterning and lead to the establishment of an entire body axis. Subsequent studies have revealed that cnidarians, the sister group to Bilateria, also possess a blastoporal embryonic organizer2,3. However, the evolutionary origin of the organizer remains unclear. Here we report that the blastopore lip of the ctenophore Mnemiopsis leidyi, a member of the evolutionary sister group to all other metazoans4,5, exhibits organizer activity. We show that transplanted fragments of blastopore lip tissue from M. leidyi gastrula induce secondary pharynx and mouth formation. Moreover, transphyletic transplantation experiments show that the blastopore lip of M. leidyi leads to the generation of a secondary body axis in embryos of the cnidarian Nematostella vectensis. Organizer function in M. leidyi requires both β-catenin and TGFβ signalling, and the TGFβ-family ligands probably provide this inductive capacity. These findings reveal the deep homology of the blastoporal organizer in ctenophores, cnidarians and vertebrates, implying the ancestral organizer role of the blastopore lip. We propose that the emergence of the organizer was an essential innovation that facilitated the change from the temporal cell differentiation of unicellular relatives to the spatial cell differentiation of the first multicellular embryo. Experiments using the comb jelly Mnemiopsis leidyi and the sea anemone Nematostella vectensis reveal that the emergence of a core signalling pathway may have been a key innovation enabling the transition to multicellularity in animals.

04.
arXiv (CS.LG) 2026-06-17

Sign-Rank, Index, and List Replicability: Connections and Separations

arXiv:2606.18236v1 Announce Type: new Abstract: In learning theory, the sign rank of a binary concept class captures the smallest dimension in which it can be represented by points and halfspaces. Despite tremendous interest, lower bounds on sign rank are notoriously difficult to come by. Two recent approaches to the problem establish lower bounds on sign rank by measures that are easier to analyze: the $\mathbb{Z}_2$-index and the list replicability number. We order these measures, showing that the $\mathbb{Z}_2$-index is upper-bounded by a linear function of the list replicability number. As a main consequence, we obtain a strong separation between sign rank and $\mathbb{Z}_2$-index, thereby resolving a question of Frick, Hosseini, and Vasileuski. This motivates a thorough study of list replicability, the stronger of the two lower-bounding measures. We establish upper bounds on the list replicability number by two combinatorial measures: height and minimum star number. We also prove a fundamental composition result, showing that the product of two concept classes has list replicability number bounded by the sum of the list replicability numbers of the two classes.

05.
bioRxiv (Bioinfo) 2026-06-16

OmicOS: A Comprehensive Omics Ecosystem Infrastructure and Agent System for the AI Era

Biology has accumulated a vast ecosystem of omics methods, but much of this ecosystem remains built for expert humans rather than scientific agents. Methods are scattered across Python packages, R/Bioconductor and CRAN workflows, command-line tools, incompatible data containers and implicit object states, making even routine analyses difficult for an AI system to choose, execute and verify reliably. Here we introduce OmicOS, a comprehensive omics ecosystem infrastructure and agent system that turns OmicVerse V2, an open-source omics community, into an executable foundation for agentic biology. OmicVerse V2 provides the community substrate: scalable AnnDataOOM-compatible rust backends, agent-friendly Python algorithms for single-cell, spatial, bulk and multi-omics analysis, interfaces to single-cell foundation models, and Python-native reconstructions of historically R-centred Bioconductor/CRAN-style workflows. OmicOS makes this substrate actionable by registering analytical functions as state-aware capability contracts, allowing agents to inspect live data objects, select valid methods, execute controlled workflows and record provenance. The result is not a fixed pipeline, but a programmable omics environment in which agents compose real analyses from verified community methods rather than inventing tools. Across external and purpose-built benchmarks, OmicOS ranked first among the evaluated systems, reaching 81.2% on BiomniBench. Adding OmicVerse to a minimal agent improved task completion by up to 34.2 percentage points with qwen-3.6-35b, and controlled ablations showed that the gains came from registry-grounded execution rather than from larger models, documentation retrieval or unrestricted tool exposure. The same infrastructure scaled to atlas-sized data, reproduced R-centred workflows in Python and converted external pathology software into agent-usable skills. In a discovery task starting from a whole-body spatial map and the term Alzheimer disease, OmicOS composed a non-canonical workflow that integrated spatial expression, genetic association, eQTL and colocalization evidence to nominate a colon epithelial risk axis centred on PICALM, CD2AP and CR1. Together, OmicVerse and OmicOS define an open foundation for AI-era omics, showing how a community of biological methods can be transformed into a reliable, extensible and agent-operable system for discovery.

06.
arXiv (CS.LG) 2026-06-19

Spectral Retrieval-Augmented Time-Series Forecasting

arXiv:2606.19412v1 Announce Type: new Abstract: Time series forecasting leverages historical patterns to predict future values, but traditional methods face challenges when dealing with complex, non-stationary patterns that are difficult to memorize during training. Retrieval-augmented approaches have emerged as promising solutions by retrieving similar historical patterns to enhance predictions. However, existing retrieval methods suffer from two fundamental limitations: spectral blindness, which overlooks critical frequency-domain characteristics that capture underlying periodic structures, and temporal recency, which treats all historical data equally without emphasizing recent, more relevant patterns. In this paper, we propose SpecReTF, a novel retrieval method that addresses these issues by converting time series into windowed frequency representations, measuring similarity with a combined metric that captures both amplitude and phase information. To balance recency and historical context, we apply an exponential moving average weighting scheme that emphasizes recent windows. Extensive experiments on benchmark datasets demonstrate that SpecReTF outperforms time-domain retrieval methods, achieving superior forecasting accuracy across diverse, non-stationary time series.

07.
arXiv (quant-ph) 2026-06-12

Quantized time in quantum walks under weak rank-K measurements

arXiv:2606.13552v1 Announce Type: new Abstract: Measurements can be used to monitor the evolution of quantum systems and may lead to a universally quantized time statistics. It is known that the mean return time is quantized for strong and indirect monitoring through the winding number of the return amplitude in a one-dimensional space. Here we discuss that under multi-channel strong or indirect monitoring, where the latter is achieved through ancilla coupling, the mean return time of a quantum walk in the projected subspace is also quantized. This reflects a universal time quantization for a higher dimensional evolution.

08.
arXiv (CS.LG) 2026-06-11

Bootstrapped Monitoring: Leveraging Transparent Reasoning to Oversee Stronger AI Agents

arXiv:2606.11998v1 Announce Type: new Abstract: Trusted monitoring is a cornerstone of AI control. However, as frontier models grow more capable, the increasing capabilities gap between trusted and untrusted models may render trusted models unreliable monitors. We introduce bootstrapped monitoring, a protocol that addresses this by inserting a stronger, intermediate untrusted model with transparent chain-of-thought reasoning into the oversight chain. The untrusted monitor ($U_m$) evaluates the agent's actions, while a weaker trusted model ($T$) oversees $U_m$'s reasoning to detect collusion. We evaluate bootstrapped monitoring on multi-turn software engineering tasks (BashArena) across multiple agents and monitors. Bootstrapped monitoring substantially improves catch rates over trusted-only monitoring, even when the untrusted monitor actively colludes with the agent, provided we have access to its raw chain-of-thought. Our results suggest that bootstrapped monitoring can extend the useful lifetime of trusted models in control as AI capabilities advance.

09.
arXiv (CS.CV) 2026-06-17

Where Should Action Generation Begin? A Learnable Source Prior for Generative Robot Policies

Generative robot policies typically begin action generation from an observation-independent standard Gaussian distribution, leaving the choice of source distribution underexplored. This work asks a simple question: where should action generation begin? We propose LeaP, a Learnable source Prior that replaces the standard Gaussian with a proprioception-conditioned diagonal Gaussian over action chunks. Parameterized by a lightweight MLP, LeaP jointly predicts the mean and state-adaptive variance of the source distribution, while keeping the downstream generator architecture and inference solver unchanged. This design provides an observation-informed yet stochastic initialization, allowing the generator to focus on precise action refinement rather than transporting samples from an uninformed noise source. On 15 RoboTwin manipulation tasks, LeaP achieves an average success rate of 81.6%, outperforming four representative baselines – including deterministic-source methods, a no-prior counterpart, and a diffusion-bridge policy – by 6.5 to 25.5 percentage points. The same prior consistently improves both flow-matching and diffusion-bridge generators, while using fewer parameters and converging faster. The advantage carries over to real-world deployment, where LeaP attains the best performance. These results suggest that the source distribution is an independent and reusable design axis for generative robot policies, complementary to the choice of generative dynamics.

10.
arXiv (CS.LG) 2026-06-17

Perron–Frobenius Operator Matching for Generative Modeling

arXiv:2606.17465v1 Announce Type: new Abstract: We introduce Perron–Frobenius Operator Matching (PFOM), a generative framework that matches density evolution via the integral PF operator, subsuming flow, diffusion, and jump models. We prove that among Bregman divergences, only Kullback–Leibler divergence preserves equality between density-level and sample-conditioned objectives, yielding a practical loss equivalent to Koopman path matching. We further develop Nesterov-accelerated training and sampling that stabilize discretization and accelerate convergence. %On Gaussian mixtures and two-moons, PFOM achieves faster KL/$W_2$/MMD decrease and improved wall-clock efficiency with empirical validation. PFOM unifies operator-theoretic identification with modern generative modeling and opens paths to adaptive dictionaries and high-dimensional applications.

11.
arXiv (CS.CL) 2026-06-11

Afrispeech Semantics: Evaluating Audio Semantic Reasoning in Spoken Language Models Across Domains and Accents

Audio language models (ALMs) are increasingly used for speech-based understanding, yet their ability to perform semantic reasoning beyond transcription, Text-to-Audio Retrieval, Captioning, and Question-Answering accuracy remains insufficiently benchmarked. In particular, the effects of accent variation, domain shift, and semantic over-inference on audio reasoning are poorly understood. We evaluate audio language models across five semantic and paralinguistic reasoning tasks: entailment, consistency, plausibility, accent drift, and accent restraint. Collectively, these tasks assess a model's ability to reason over spoken audio as the primary evidence source, including whether a textual hypothesis can be inferred, contradicted, or left undetermined by the audio, whether statements align or conflict with spoken content, whether claims are plausible given the discourse, and whether model predictions remain stable or appropriately constrained across accent variation. These findings highlight critical limitations in current audio reasoning evaluations and hope to provide guidance for more robust and equitable ALM design and assessment

12.
arXiv (quant-ph) 2026-06-16

Adiabatically-induced Kawaguchi geometry and jerk in quantum-classical systems

arXiv:2606.16037v1 Announce Type: new Abstract: Adiabatically eliminating the quantum degrees of freedom in a mixed quantum-classical system produces an effective force in the classical equation of motion. The elimination can be made to any order in the adiabatic parameter, generating a series of higher order forces. By applying a sequence of near-identity unitary transformations to the quantum state, we derive a hierarchy of increasingly accurate effective actions for the classical variables. The third order Euler-Lagrange equation is non-Newtonian as the force depends on the jerk, the third order time derivative of position. We find that the third order terms induce a special kind of Kawaguchi geometry on the space of classical variables. This geometry is characterized by an almost symplectic structure and a differential line element that depends on the acceleration in addition to the velocity. Our results can be used to efficiently capture higher order nonadiabatic effects in molecular dynamics simulations.

13.
arXiv (CS.CL) 2026-06-19

CzechDocs: A Multiway Parallel Dataset of Formatted Documents for Minority Languages in Czechia

We present CzechDocs, a multiway parallel dataset of formatted documents (HTML, DOCX, and PDF) covering Czech and minority languages used in Czechia-primarily Ukrainian and English, with smaller portions of Vietnamese, Russian and other languages. The dataset is designed to support the evaluation of machine translation systems that aim to preserve document formatting during translation. We provide a comparison of the most common approaches to format-preserving machine translation on a validation subset of the dataset. This validation split, together with the evaluation toolkit, is publicly released for further research. A held-out test split will be reserved for a future shared task focused on document-level translation with formatting preservation.

14.
arXiv (CS.CL) 2026-06-11

CRANE: Constrained Reasoning Injection for Code Agents via Nullspace Editing

Code agents must both reason over long-horizon repository state and obey strict tool-use protocols. In paired Instruct/Thinking checkpoints, these capabilities are complementary but misaligned. The Instruct model is concise and tool-disciplined, whereas the Thinking model offers stronger planning and recovery behavior but often over-deliberates and degrades agent performance. We present CRANE (Constrained Reasoning Injection for Code Agents via Nullspace Editing), a training-free parameter-editing method that treats the Thinking-Instruct delta as a directional pool of candidate reasoning edits for the Instruct backbone. CRANE combines magnitude thresholding to denoise the delta, a Conservative Taylor Gate to retain edits that are jointly beneficial for reasoning transfer and tool-use preservation, and Graduated Sigmoidal Projection to suppress format-critical update directions. By merging paired Instruct and Thinking checkpoints, CRANE delivers strong gains over either individual model while preserving Instruct-level efficiency: on Roo-Eval it achieves pass1 of 66.2% (+19.5%) for Qwen3-30B-A3B and 81.5% (+8.7%) for Qwen3-Next-80B-A3B; on SWE-bench-Verified it resolves up to 14 additional instances at both scales (122/500 and 180/500); and on Terminal-Bench v2 it improves pass1/pass5 by up to 2.3%/7.8%, reaching 7.6%/17.9% and 14.8%/30.3%, respectively, consistently outperforming alternative merging strategies across all three benchmarks.

15.
arXiv (CS.CV) 2026-06-17

Rethinking Cross-Layer Information Routing in Diffusion Transformers

Diffusion Transformers (DiTs) have become a de facto backbone of modern visual generation, and nearly every major axis of their design – tokenization, attention, conditioning, objectives, and latent autoencoders – has been extensively revisited. The residual stream that governs how information accumulates across layers, however, has been directly inherited from the original Transformer. In this paper, we present a systematic empirical analysis of cross-layer information flow in DiTs, jointly along depth and denoising timestep, and identify three concrete symptoms of traditional residual addition, namely monotonic forward magnitude inflation, sharp backward gradient decay, and pronounced block-wise redundancy. Motivated by this diagnosis, we propose Diffusion-Adaptive Routing (\textsc{DAR}), a drop-in residual replacement that performs learnable, timestep-adaptive, and non-incremental aggregation over the history of sublayer outputs. Moreover, the proposed \textsc{DAR} is compatible with many modern Transformer enhancement methods, such as REPA. On ImageNet $256\times256$, \textsc{DAR} improves SiT-XL/2 by $2.11$ FID ($7.56$ vs.\ $9.67$) and matches the baseline's converged quality with $8.75\times$ fewer training iterations. Stacked on top of REPA, it yields a $2\times$ training acceleration in the early stage, suggesting cross-layer information routing as an underexplored design axis in diffusion modeling, one that operates orthogonally to existing representation-alignment objectives. Beyond pretraining, \textsc{DAR} can also be applied during the fine-tuning stage of large-scale T2I models and preserves high-frequency details during Distribution Matching Distillation.

16.
arXiv (CS.AI) 2026-06-17

AnalogFed: Privacy-Preserving Discovery of Analog Circuits at Scale with Federated Generative AI

arXiv:2507.15104v2 Announce Type: replace-cross Abstract: Recent advances in generative AI (GenAI) have shown transformative potential for modern hardware design. However, existing GenAI-driven approaches fall short of enabling large-scale electronic design automation (EDA) due to the proprietary and siloed nature of hardware datasets, which cannot be centralized for model training. Achieving at-scale GenAI-driven EDA, therefore, requires a novel privacy-preserving framework that can leverage distributed data without compromising confidentiality. This work introduces AnalogFed, the first privacy-preserving framework for large-scale analog circuit topology discovery using federated learning (FedL) and GenAI. AnalogFed establishes the feasibility of collaborative analog topology design while addressing key security challenges: it mitigates membership inference attacks (MIAs) through a novel input perturbation strategy based on dummy token injection, and defends against model inversion attacks with customized, efficient homomorphic encryption. Extensive experiments demonstrate AnalogFed's effectiveness and efficiency, achieving strong privacy protection without degrading model utility. This framework lays the foundation for scalable, multi-party collaboration in next-generation hardware design automation with GenAI.

17.
arXiv (CS.CL) 2026-06-16

Calibrated Triage, Not Autonomy: Confidence Estimation for Medical Vision-Language Models

A vision-language model can answer a question about a medical image fluently and confidently while barely using the image, leaning instead on language priors. In medicine this is the failure that matters most, because the answer looks trustworthy and is not, and the only protection is a confidence score reliable enough to tell the system when to abstain. We ask a deployment question rather than an accuracy one: how much imaging work a model can safely handle alone, and which confidence signal makes that possible. We evaluate seven confidence estimators across five open-weight LVLMs and three medical visual-question-answering datasets spanning broad clinical imaging, radiology, and pathology, with every probe trained only on natural images and applied without adaptation. Recast as bounded selective prediction (automate a case only when confidence clears a threshold, defer the rest), the comparison is cautionary. The standard metrics are poor guides: discrimination barely separates the methods, and the weak calibration of a cheap self-report is cheaply removed by off-domain temperature scaling without changing deployable yield. What distinguishes a usable estimator is the high-confidence region a clinician acts on: the weakest baselines are confidently wrong on 41 to 45 percent of their errors against 1 to 4 percent for the best probe, and no estimator is reliably best across domains or models. Safe handoff is governed at two levels: base-model competence sets a ceiling, so a well-calibrated score recovers roughly a third of radiology cases at a 20 percent error tolerance but almost none of pathology; the confidence layer then decides how much of that ceiling is reachable. The usable role today is calibrated triage, not autonomy: automate the cases a calibrated score marks safe, route the rest to a clinician. We release all outputs, correctness judgments, and confidence scores, with code.

18.
arXiv (quant-ph) 2026-06-19

Frequency-Multiplexed Millimeter-Wave Fault-Tolerant Superconducting Qubits Enabled by an On-Chip Nonreciprocal Control Bus

arXiv:2512.17588v2 Announce Type: replace Abstract: Scaling superconducting quantum processors is fundamentally limited by the escalating complexity of cryogenic wiring and the detrimental effects of microwave crosstalk and Purcell decay. This paper proposes a novel architecture based on frequency-multiplexed millimeter-wave superconducting qubits, integrating an on-chip cryogenic nonreciprocal space-time-periodic Josephson frequency multiplier as a universal control bus. The bus replaces multiple high-frequency XY drive lines with a single low-frequency input tone, which is parametrically converted into a comb of high-order harmonics, each resonantly addressing a distinct qubit. The nonreciprocal nature of the bus provides intrinsic isolation that suppresses Purcell decay and reduces coherent crosstalk by more than $98\%$ compared to a conventional reciprocal shared drive line. Full error-budget analysis demonstrates that the architecture can maintain gate errors below the fault-tolerance threshold for arrays exceeding 25 qubits, converting a crosstalk-dominated error budget into one primarily limited by intrinsic material coherence. Theoretical modeling based on a non-Markovian master equation further indicates that the engineered environment enables information backflow, offering a pathway to enhanced coherence. This integrated, frequency-multiplexed, and nonreciprocal control bus offers a compelling route toward dramatic I/O simplification, improved noise resilience, and scalable high-coherence superconducting quantum processors.

19.
arXiv (CS.LG) 2026-06-15

How Task Structure Limits Multi-Agent Success: An Information-Theoretic Analysis

arXiv:2606.13733v1 Announce Type: cross Abstract: Multi-agent systems (MAS) were expected to overcome the limitation of single-agent systems (SAS) through collaboration. However, under typicality conditions on the task's constraint graph and bounded inter-agent communication, we prove that the success probability of a MAS is closely tied to the connectivity of task constraints, where each agent has limited information-processing capacity. Specifically, the success probability decays exponentially with an information bottleneck that emerges from partitioning the task's constraint graph among agents. We define this quantity as the minimum cut cost $C_{\min}$ of the potential constraint graph of each task. This information-theoretic bound applies to both open systems with external feedback and closed systems without. We validate our theory on both synthetic experiments and real-world empirical data from SWE-bench submissions. From our framework, effective MAS design should incorporate task-inherent constraints alongside engineering optimization, and when $\Cmin$ is high, practitioners should restructure tasks rather than simply scaling agents or communication.

20.
arXiv (CS.LG) 2026-06-11

HAMNO: A Hierarchical Adaptive Multi-scale Neural Operator with Physics-Informed Learning for Dynamical Systems

arXiv:2606.11963v1 Announce Type: new Abstract: Neural operators provide a powerful framework for learning solution mappings of partial differential equations directly in function space. However, many existing architectures still struggle to represent nonlinear time-dependent systems that involve multi-scale structures, long-range interactions, and stable long-time evolution. In this work, we introduce the Hierarchical Adaptive Multi-scale Neural Operator (HAMNO), a neural-operator architecture that combines local convolutional representations, global spectral operators, and hierarchical encoder-decoder processing. The central component of HAMNO is a data-dependent gating mechanism that adaptively balances local and global information at each spatial location, allowing the model to resolve fine-scale features while preserving long-range dependencies. We further develop a physics-informed extension, PI-HAMNO, based on a multi-objective loss strategy that combines data fitting with strong- and weak-form physics constraints. The strong-form term penalizes the domain-integrated squared PDE residual in physical coordinates, while the weak-form term is constructed by multiplying the governing residual by finite-element test functions and evaluating the resulting element integrals using centroid-based tetrahedral quadrature. The framework is evaluated on non-periodic Allen-Cahn (AC), Cahn-Hilliard (CH), and Swift-Hohenberg (SH) equations defined on cubic domains. Across long-horizon rollout, data-limited training, out-of-distribution initial-condition shifts, and random-seed variations, HAMNO improves predictive accuracy over standard neural-operator baselines, while PI-HAMNO further enhances stability, physical consistency, and data efficiency. The implementation is publicly available at https://github.com/MBamdad/HAMNO .

21.
arXiv (CS.CV) 2026-06-19

HumanScale: Egocentric Human Video Can Outperform Real-Robot Data for Embodied Pretraining

Embodied foundation models are expected to benefit from data scaling like large language models, but face a much tighter data bottleneck. Teleoperated real-robot trajectories remain the dominant pretraining source due to their precise action supervision and embodiment alignment, yet their scalability is limited by high collection cost, acquisition difficulty, and low behavioral and environmental diversity. These limitations have sparked interest in egocentric human video as a scalable, substantially lower-cost, and more diverse alternative for embodied model pretraining. However, its effectiveness compared to teleoperated real-robot data remains underexplored. To address this question, we conduct a systematic study comparing egocentric human video and teleoperated real-robot trajectories as pretraining data sources for embodied foundation models, under fixed post-training and validation protocols. Surprisingly, we find that egocentric data, when processed through a carefully designed filtering and labeling pipeline, is not merely a viable substitute for model pretraining but can lead to superior performance. With the same amount of pretraining data, models pretrained on egocentric data achieve a 24% lower validation loss on real-robot action prediction, as well as 52.5% and 90% higher success rates on in-distribution and out-of-distribution real-robot task execution, respectively. This finding verifies a scalable paradigm for embodied foundation models: pretrain on egocentric human video to learn diverse world representations, then adapt with a small amount of labeled real-robot data for action-space alignment. We hope this study encourages broader exploration of egocentric data and offers guidance for data quality assessment before costly robot data collection.

22.
arXiv (CS.AI) 2026-06-19

Measuring Biological Capabilities and Risks of AI Agents

arXiv:2606.19899v1 Announce Type: cross Abstract: This paper addresses a rapidly emerging policy challenge: how to generate and interpret credible evidence about the biological capabilities and risks of AI scientists, or agentic AI systems capable of autonomously or collaboratively performing multi-step scientific tasks. As these systems enter real research workflows, decision-makers increasingly face evaluation results whose meaning depends on underlying design choices that are often implicit or under-documented. We synthesize current evidence on AI-enabled biological risks and introduce biological agentic evaluations as a promising, but interpretation-sensitive, tool for assessing these systems. Our central contribution is a set of practical, experience-grounded considerations – drawing from our own evaluations – that show how choices around defining, designing, running, scoring, and documenting evaluations materially shape what results do and do not imply about risk. The analysis is intended to help policymakers interpret biological evaluation outputs with appropriate caution; guide public and private funders toward high-leverage investments in AI-biology evaluation research; and support biosecurity practitioners assessing emerging AI systems. A secondary audience includes researchers designing or conducting agentic evaluations within frontier AI labs, AI providers, scientific institutions, and third-party evaluation organizations.

23.
PLOS Computational Biology 2026-06-17

Machine learning-driven identification of virulence determinants in <i>Borrelia burgdorferi</i> associated with human dissemination

by Hoa Thanh Nguyen, Catherine A. Brissette Lyme disease, the most common tick-borne infectious disease in the United States, presents with highly variable clinical outcomes, ranging from localized erythema migrans to severe disseminated complications affecting the heart, joints, and nervous system. The bacterial determinants underlying this phenotypic variation remain largely unknown, limiting our ability to predict disease progression and optimize treatment strategies. Here, we applied machine learning (ML) approaches to identify specific amino acid residues within surface-exposed virulence factors that predict human dissemination phenotypes. Utilizing the published whole genome sequences from 299 clinical Borrelia burgdorferi isolates collected from the United States and Slovenia over a 30-year period (1992–2021), we extracted and characterized translated amino acid sequences (variants) of seven known virulence factors (BB_0406, BBK32, DbpA, OspA, OspC, P66, and RevA). Protein variants were classified based on their association with disseminated versus localized infections using clinical metadata. Cramér’s V analysis revealed possible strong associations between dissemination phenotypes and five adhesins: BBK32, DbpA, OspC, P66, and RevA. We developed ML models using five algorithms with multiple feature selection strategies, achieving robust predictive performance for DbpA, OspC, and RevA variants (all performance metrics > 0.7). Feature importance analysis identified 57, 29, and 42 key predictive residues for DbpA, OspC, and RevA, respectively. Notably, B-cell epitope prediction revealed significant enrichment of ML-identified residues within predicted epitope regions for OspC (11 overlapping residues, OR = 3.57, p = 0.006) and RevA (12 overlapping residues, OR = 2.37, p = 0.048), suggesting these residues may influence immune recognition and bacterial persistence. This study establishes the first computational framework linking Borrelia protein sequence variants to clinical dissemination phenotypes, providing molecular insights into Lyme disease pathogenesis that may inform the development of improved diagnostics and therapeutic targets.

24.
arXiv (CS.CL) 2026-06-16

Misinformation Propagation in Benign Multi-Agent Systems

Multi-agent systems, in which multiple large language model agents solve problems through turn-based interaction, are increasingly deployed in high-stakes settings such as medical diagnosis, legal analysis, and forensic decision-making. Their reliability can be at risk when single agents reason from incorrect or misleading context, e.g., from tool calls, since errors may propagate through agent interactions. This work studies this risk by injecting intent-based misinformation into benign single-agent and multi-agent systems across reasoning, knowledge, and alignment tasks. We find that misinformation can degrade single-agent performance and persists across multi-agent debate, with agents often retaining answers introduced by misinformed peers. Nevertheless, multi-agent debate reduces the resulting performance degradation compared to single-agent prompting, especially when most agents are not exposed to misinformation. Robustness depends on group composition and decision protocol. Consensus can be more stable than voting under peer pressure, while majorities can often steer misinformed agents back toward correct answers. Our results show that misinformation robustness in multi-agent systems depends on the underlying model and also on how agents exchange information and aggregate decisions.

25.
arXiv (CS.AI) 2026-06-11

GILT: An LLM-Free, Tuning-Free Graph Foundational Model for In-Context Learning

arXiv:2510.04567v3 Announce Type: replace-cross Abstract: Graph Neural Networks (GNNs) are powerful tools for processing relational data but often struggle to generalize to unseen graphs, giving rise to the development of Graph Foundational Models (GFMs). However, current GFMs are challenged by the extreme heterogeneity of graph data, where each graph can possess a unique feature space, label set, and topology. To address this, two main paradigms have emerged. The first leverages Large Language Models (LLMs), but is fundamentally text-dependent, thus struggles to handle the numerical features in vast graphs. The second pre-trains a structure-based model, but the adaptation to new tasks typically requires a costly, per-graph tuning stage, creating a critical efficiency bottleneck. In this work, we move beyond these limitations and introduce Graph In-context Learning Transformer (GILT), a framework built on an LLM-free and tuning-free architecture. GILT introduces a novel token-based framework for in-context learning (ICL) on graphs, reframing classification tasks spanning node, edge and graph levels in a unified framework. This mechanism is the key to handling heterogeneity, as it is designed to operate on generic numerical features. Further, its ability to understand class semantics dynamically from the context enables tuning-free adaptation. Comprehensive experiments show that GILT achieves stronger few-shot performance with significantly less time than LLM-based or tuning-based baselines, validating the effectiveness of our approach. Our code is available at: https://github.com/yiming421/inductnode/.