Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CV) 2026-06-16

TUNI: Unifying Pre-training and Fine-tuning with Modality-Aware Mutual Learning and Rectification for RGB-T Semantic Segmentation

RGB-thermal (RGB-T) semantic segmentation improves the environmental perception of autonomous platforms in challenging conditions. Prevailing RGB-T segmentation frameworks suffer from suboptimal multi-modal feature extraction and fusion, unbalanced modality dependency, and inadequate utilization of thermal information. To address these challenges, we propose TUNI, a unified pre-training and fine-tuning framework for efficient and real-time RGB-T semantic segmentation. It pre-trains an RGB-T encoder that incorporates an RGB-T local module that selectively emphasizes salient consistent and distinct local features across modalities, thereby integrating cross-modal feature extraction and fusion in a unified manner. To alleviate the modality bias issue during RGB-T pre-training, modality-inverted contrastive mutual learning is introduced to enable knowledge exchange between two RGB-dominated and thermal-dominated encoders. In the fine-tuning phase, modality rectification learning fully exploits residual thermal information by focusing on correct yet divergent prediction regions between two modality-specific decoders. We further develop three TUNI variants, covering lightweight, balanced, and high-performance requirements. Extensive experiments on five RGB-T semantic segmentation datasets demonstrate that TUNI achieves superior accuracy, generalization, and compactness compared with 15 state-of-the-art models. The code is available at https://github.com/xiaodonguo/TUNI-v2.

02.
arXiv (CS.LG) 2026-06-11

Online Learning for Supervisory Switching Control

arXiv:2603.14762v4 Announce Type: replace-cross Abstract: We study supervisory switching control for partially-observed linear dynamical systems. The objective is to identify and deploy a suitable controller for the unknown system by periodically selecting among a collection of $N$ candidate controllers, some of which may destabilize the underlying system. While classical estimator-based supervisory control guarantees asymptotic stability, it lacks quantitative finite-time performance bounds. Conversely, current non-asymptotic methods in both online learning and system identification require restrictive assumptions that are incompatible in a control setting, such as system stability, which preclude testing potentially unstable controllers. To bridge this gap, we propose a novel, non-asymptotic analysis of supervisory control that adapts multi-armed bandit algorithms to a control-theoretic setting. The proposed data-driven algorithm evaluates candidate controllers via scoring criteria that leverage system observability to isolate the effects of state history, enabling both detection of destabilizing controllers and accurate system identification. We present two algorithmic variants with dimension-free, finite-time guarantees, where each identifies the matching controller in $O(N \log^2 N)$ steps, while simultaneously achieving finite $L_2$-gain with respect to system disturbances.

03.
arXiv (quant-ph) 2026-06-11

Large Fluctuations in Open Quantum Systems

arXiv:2606.11822v1 Announce Type: new Abstract: We study statistics of atypical measurement outcomes in the steady states of driven open quantum systems. In equilibrium, the probability distribution over the phase space, as encoded in, e.g., the Wigner function, is analytic in the phase-space coordinates. We show that this property is generically lost in driven dissipative systems: their {\it large-deviation function} develops lines and surfaces across which its derivatives are discontinuous. As an illustrative example, we consider a parametrically driven Kerr oscillator coupled linearly and/or nonlinearly to a dissipative bath. Rare fluctuations in the amplitude and phase of the induced oscillations are governed by semiclassical instanton trajectories of the corresponding Keldysh-Lindblad action. We demonstrate that a given fluctuation can be realized through multiple distinct instanton trajectories. The competition between these trajectories leads to abrupt switching of the dominant instanton and, consequently, to non-analytic features in the large-deviation function.

04.
arXiv (CS.LG) 2026-06-16

Priority-Aware Shapley Value

arXiv:2602.09326v2 Announce Type: replace Abstract: Shapley values are widely used for model-agnostic data valuation and feature attribution, yet they implicitly assume contributors are interchangeable. This can be problematic when contributors are dependent (e.g., reused/augmented data or causal feature orderings) or when contributions should be adjusted by factors such as trust or risk. We propose Priority-Aware Shapley Value (PASV), which incorporates both hard precedence constraints and soft, contributor-specific priority weights. PASV is applicable to general precedence structures, recovers precedence-only and weight-only Shapley variants as special cases, and is uniquely characterized by natural axioms. We develop an efficient adjacent-swap Metropolis-Hastings sampler for scalable Monte Carlo estimation and analyze limiting regimes induced by extreme priority weights. Experiments on data valuation (MNIST/CIFAR10) and feature attribution (Census Income) demonstrate more structure-faithful allocations and a practical sensitivity analysis via our proposed "priority sweeping".

05.
arXiv (CS.AI) 2026-06-11

Preregistration for Experiments with AI Agents

arXiv:2606.11217v1 Announce Type: cross Abstract: The proliferation of large language models (LLMs) and autonomous AI agents has given rise to a rapidly growing methodological paradigm: "in silico" behavioral experiments. Originally conceived as a way to use AI agents as proxies for human participants in studies of cognition, decision-making, and social dynamics, this approach has taken on new significance – as AI agents increasingly negotiate, transact, and make consequential decisions on behalf of people and organizations, understanding their behavior has become a research priority in its own right. While these experiments with AI agents offer unprecedented advantages in terms of scalability, cost efficiency, and experimental control, they also inherit, and in some cases amplify, methodological vulnerabilities that have long plagued human subjects research. To address these issues, this paper argues that preregistration practices – central to improving the credibility of human subjects experiments – should now be extended to experiments with AI agents. We systematically catalog the researcher degrees of freedom that experiments with AI agents introduce – model selection, prompt wording, settings, and outcome-contingent redesign, for example – and show how the low cost of iteration and lack of reporting norms make these choices both easy to exploit and difficult to detect. We propose a preregistration template tailored to experiments with AI agents and call on conferences, journals, and funding agencies to make preregistration standard practice for this emerging research paradigm.

06.
arXiv (quant-ph) 2026-06-15

Certification of the genuine resolution of photon number resolving detectors

arXiv:2606.14365v1 Announce Type: new Abstract: Photon-number-resolving (PNR) detectors are essential components of photonic quantum technologies, yet thus far, no practical metric exists to certify how many photons they can genuinely resolve in a single measurement. Here we introduce an operational framework for quantifying the capability of a PNR detector to distinguish between different numbers of photons, i.e. its genuine resolution. In turn, we develop a practical and scalable protocol for certifying the genuine resolution of a detector, which is based on coherent state probes. We apply the method to a 28-pixel photon-number-resolving superconducting nanowire single-photon detector (PNR-SNSPD) and certify genuine four-outcome resolution. Our work highlights the critical requirements in terms of detector efficiency towards achieving high genuine resolution. This approach provides an operational benchmark for PNR detectors and fills a crucial gap in the characterization of photonic quantum devices.

07.
arXiv (CS.LG) 2026-06-19

Light Interaction: Training-Free Inference Acceleration for Interactive Video World Models

arXiv:2605.31158v3 Announce Type: replace-cross Abstract: Interactive video world models generate video chunk by chunk in response to user-controlled camera movements, enabling applications such as real-time game simulation, virtual scene navigation, and embodied AI training. However, scaling to long interactive trajectories is prohibitively expensive due to growing context memory, quadratic attention complexity, and repeated denoising steps. We present Light Interaction, a training-free inference acceleration framework for interactive video world models. Our key insight is that interaction naturally enables trajectory-dependent adaptive computation: retrieved spatial memory can be discarded during novel exploration, temporal context can be adjusted according to local latent dynamics, and early-step model outputs can be reused when the camera revisits familiar regions. Based on this insight, Light Interaction combines adaptive context management, denoising cache acceleration, and hardware-software co-designed 3D block sparse attention with fused Triton kernels. Evaluated on HY-WorldPlay and Matrix-Game-3.0, Light Interaction achieves up to 2.59x speedup without model retraining while maintaining competitive visual quality.

08.
arXiv (CS.CL) 2026-06-11

Evaluating Factual Density in Multi-Source RAG: A Study in Medical AI Accuracy

Retrieval-Augmented Generation (RAG) is the current industry standard for grounding AI in real-world facts. Traditional retrieval methods rely on keyword matching and topic proximity, ranking content based on how closely it sounds like the user's query. What they do not measure is how many verified facts the content actually contains. This structural gap, termed the Expert Blindness Effect, causes standard RAG pipelines to consistently bury high-density factual evidence in favor of lexically dominant text on the same topic. To address this gap, this paper introduces Factual Density (FD*), a novel retrieval optimization signal that measures the proportion of verified atomic claims relative to total token count. Using the NexusAgentics Ghost Audit preprocessing pipeline, raw text is scored for factual specificity using probabilistic factuality analysis to filter content before corpus ingestion. An initial formulation introduced a severe document-length confound (Pearson R = -0.8636, p = 2.27e-07). Implementing Z-score normalization within length bins resolved this bias, validating FD* as a length-independent density signal (p = 0.0749). Evaluated against the HealthFC benchmark (750 health claims labeled Supported, Refuted, or No Evidence by medical experts), FD*-optimized retrieval was the only condition to achieve 100% systematic review saturation in top-5 results, surfacing Cochrane evidence that standard cosine similarity ranked outside the top ten. Ground truth verification confirmed 25 mappings across seven HealthFC-supported claims. While full statistical validation across n=50 queries remains future work due to constraints on corpus-benchmark alignment, these findings establish factual density reranking as a low-cost, high-impact intervention for improving factual precision in health RAG architectures.

09.
arXiv (math.PR) 2026-06-12

Fourier Dimensions of Mandelbrot Cascades under Minimal Integrability

作者:

arXiv:2606.08703v2 Announce Type: replace Abstract: This note announces exact Fourier dimension formulas for canonical Mandelbrot cascade measures under the minimal Kahane Peyriere integrability condition and records the canonical b adic extension on cubes. In the dyadic interval setting, the theorem is proved in a balanced vector weight model allowing dependence between sibling weights. Almost surely on non extinction, the Fourier, energy, and L2 dimensions all equal the energy exponent. The scalar specialization gives the canonical Mandelbrot Kahane Fourier dimension formula under the minimal integrability condition. On the circle, the endpoint formula is given by the endpoint lower local dimension exponent. For the b adic Mandelbrot cascade on cubes, the Fourier dimension is the minimum of 2 and the energy exponent, with the universal Fourier barrier at dimension two providing the high dimensional obstruction.

10.
medRxiv (Medicine) 2026-06-11

Malaria Risk among Internally Mobile Individuals and Heterogeneous Mobility Patterns in Two Hypoendemic Communities: Implications for Malaria Elimination in the Peruvian Amazon.

Background: Human mobility is increasingly recognized as a key factor influencing malaria transmission dynamics, particularly in low-transmission settings approaching elimination. This study aimed to assess mobility patterns and their association with malaria risk in two hypoendemic communities in the Peruvian Amazon. Method: A longitudinal study was conducted in the communities of Libertad and Urcomirano (Mazan River basin). Monthly population screenings were combined with weekly active and passive case detection. A total of 678 individuals were enrolled. Mobility patterns were assessed through structured questionnaires, and social network analysis was used to characterize travel connections. Log-binomial regression analysis was applied to identify risk factors associated with malaria infection. Result: Internally, mobile individuals in Libertad showed a higher malaria incidence (>32.47 cases per 1,000 person-months) than those in Urcomirano (

11.
arXiv (CS.CL) 2026-06-16

CoCoGEC: Counterfactual Generation for Robust Grammatical Error Correction

Grammatical error correction (GEC) systems are usually trained and evaluated on GEC benchmarks, but their performance often drops sharply once the surrounding context is slightly perturbed or extended. This indicates that the existing GEC models usually fail to understand the error patterns in the varying contexts. In this paper, we thoroughly investigate the counterfactuals for GEC tasks, where the subtle changes to the contexts could lead to the label flipping issue. We propose CoCoGEC, a counterfactual generation framework that creates copies of training instances with error-irrelevant contexts altered. Our framework systematically generates counterfactuals by (1) generating intra- and inter-sentence counterfactuals that maintain the error patterns as well as syntax of the original instances by altering the word-level and sentence-level contexts; (2) revising the generated counterfactuals by selecting the instances with flipped labels and high GEC Mutual Information (MI) coefficient. Extensive experiments show that our method substantially improves the stability of GEC models, outperforming a set of data augmentation baselines. Particularly, it could achieve absolute F0.5 gains of +9.9, +11.3, and +20.8 points on the perturbed BEA-19*,CoNLL-14*, and TEM-8* data set.Our code is released at https://github.com/Quinnok/CoCoGEC

12.
Nature (Science) 2026-06-10

Molecular glue degraders of HuR suppress BRAF-mutant colorectal cancer

作者:

BRAF gain-of-function mutations, particularly BRAF(V600E), affect roughly 10% of all patients with colorectal cancer (CRC), and portend poor prognosis with limited therapeutic interventions. BRAF inhibitors such as encorafenib are ineffective due to MAPK pathway reactivation driven by BRAF dimerization. Combined inhibition of BRAF and EGFR, although approved therapies, results in short survival benefits and frequent treatment resistance and relapse1–3. Here, through rational chemical library design coupled with parallel proteomic screening, we identified dHuR as a molecular glue degrader of human antigen R (HuR), an RNA-binding protein that drives tumour growth, invasion and therapy resistance. dHuR binds to the CRBN ubiquitin ligase to create a unique benzofuran-tethered composite surface to recruit HuR as a neosubstrate by engaging its β-hairpin G-loop degron, as revealed by the cryo-electron microscopy structure of the ternary complex. dHuR abrogated BRAF expression by inducing its exon 18 skipping, and demonstrated superior suppression of BRAF-mutant CRC tumours including those gaining resistance to BRAF inhibitors. Finally, we performed kinome library CRISPR screening and revealed that inactivation of EGFR or MEK enhanced dHuR cytotoxicity, thus establishing a combinatorial strategy to treat patients with refractory BRAF-mutant CRC. Molecular glue degraders of the RNA-binding protein HuR have therapeutic potential for BRAF-mutant cancers.

13.
arXiv (CS.AI) 2026-06-19

MEAL: A Benchmark for Continual Multi-Agent Reinforcement Learning

arXiv:2506.14990v3 Announce Type: replace Abstract: Benchmarks play a central role in reinforcement learning (RL) research, yet their computational constraints often shape what is studied. Despite the motivation of lifelong learning, most continual RL papers consider only 3-10 sequential tasks, as CPU-bound environments make longer sequences impractical. Meanwhile, continual learning in cooperative multi-agent settings remains largely unexplored. To address these gaps, we introduce MEAL (Multi-agent Environments for Adaptive Learning), the first benchmark for continual multi-agent RL. By leveraging JAX and GPU acceleration, MEAL enables training on sequences of 100 tasks in a few hours on a single GPU. We find that long task sequences reveal failure modes that do not appear at smaller scales.

14.
arXiv (CS.LG) 2026-06-11

Hybrid Iterative Neural Low-Regularity Integrator for Nonlinear Dispersive Equations

arXiv:2605.04853v2 Announce Type: replace Abstract: We propose HIN-LRI, a hybrid framework that augments a classical numerical solver with a neural operator trained to correct the solver's structured truncation error. A base low-regularity integrator provides a consistent first-order approximation to nonlinear dispersive PDEs, while a lightweight neural network, operating on a low-dimensional latent manifold, learns the residual defect that analytical methods cannot close. An explicit time-step scaling on the neural correction ensures that its Lipschitz contribution remains $\mathcal{O}(\tau)$, yielding a Gronwall stability factor bounded uniformly in the step size and independent of the spatial resolution. The network is trained end-to-end through a solver-in-the-loop objective that unrolls the full iteration and penalises trajectory error in a Bourgain-type norm, aligning learning with multi-step solver dynamics rather than isolated one-step targets. Under stated assumptions, the global error satisfies $C(\varepsilon_{net}+\delta)\,\tau^\gamma\ln(1/\tau)$, where $\varepsilon_{net}$ measures the network approximation quality and $\delta$ the training shortfall. Experiments on three dispersive benchmarks with rough data show that HIN-LRI improves accuracy over analytical integrators, splitting methods, and neural PDE surrogates, with stable spatial refinement, effective out-of-distribution transfer, and modest online overhead.

15.
arXiv (quant-ph) 2026-06-12

Toward Entanglement Bootstrap for Conformal Field Theory in Any Dimension

arXiv:2606.12540v1 Announce Type: cross Abstract: Given a quantum critical wavefunction in any dimension, we propose a reconstructed Hamiltonian, analogous to the ones previously found for 1+1d CFT and for 2+1d bosonic liquid topologically-ordered states. We test numerically that, for known regularized approximate CFT groundstates (on the icosahedron and the fuzzy sphere), (1) they are close to the groundstate of their reconstructed Hamiltonian, and (2) the spectrum of their reconstructed Hamiltonian on the unit sphere has CFT properties (integer spacing of descendants) and matches known low-lying energies. We show that this provides an automated method to improve the finite-size effects in a fixed Hilbert space.

16.
arXiv (CS.LG) 2026-06-18

ChronoSurv: A Clinical Pathway-Guided Graph Framework for Multimodal Survival Analysis

arXiv:2606.19140v1 Announce Type: new Abstract: Accurate survival prediction is essential for personalized treatment planning in head and neck cancer, yet remains challenging due to the heterogeneous and high-dimensional nature of multimodal clinical data. While deep survival models have improved predictive performance over classical statistical approaches, existing methods typically rely on static fusion strategies or temporally agnostic modeling, limiting their ability to capture structured clinical workflows. In this work, we propose ChronoSurv, a heterogeneous hierarchical directed graph framework for multimodal survival analysis. ChronoSurv represents patient care as a progression-aware clinical trajectory using directed graphs aligned with key diagnostic steps. A hierarchical topology incorporates fine-grained, coarse, and global representations, further supporting flexible adaptation to missing modalities, while heterogeneous message passing models complex and asymmetric relationships across modalities and clinical steps. Experimental results on two public datasets demonstrate that ChronoSurv achieves state-of-the-art discriminative performance while maintaining statistically reliable calibration. Comprehensive ablation studies further confirm the contribution of each architectural component, highlighting the potential of trajectory-aware graph modeling for multimodal survival prediction.

17.
arXiv (CS.AI) 2026-06-16

Trust-Region Diffusion Policies for Massively Parallel On-Policy RL

arXiv:2606.15260v1 Announce Type: cross Abstract: Reinforcement learning with massively parallel simulations has become a standard framework for developing robust, deployable policies; however, most existing approaches still rely on simple Gaussian policy parameterizations. Diffusion models provide a more expressive policy class and have shown strong performance on challenging control problems, yet most diffusion-based RL methods are designed for offline or off-policy training. In this work, we ask whether diffusion policies can be trained effectively in the massively parallel, on-policy regime. To this end, we introduce Trust-region Diffusion Policies (TruDi), which enables diffusion policies for on-policy RL with massively parallel simulations. This setting is particularly challenging because the data distribution changes quickly across updates, making stable training with complex policies difficult. TruDi addresses this by integrating a trust-region optimization rule to enforce a KL-divergence constraint over the entire diffusion trajectory. Empirically, we evaluate TruDi on a diverse set of 4 massively parallel RL benchmarks comprising a total of 73 tasks. Across these tasks, TruDi consistently outperforms or is on-par with strong baselines on standard tasks and achieves clear gains on more challenging humanoid control tasks, establishing a strong new baseline for massively parallel on-policy RL.

18.
arXiv (quant-ph) 2026-06-11

Implementing Hamiltonian Renormalization Group Flow on Quantum Computers with VAPOR

arXiv:2606.11306v1 Announce Type: cross Abstract: While Hamiltonian Lattice Gauge Theory is gaining traction, today's limited numerical capacity leaves simulations affected by discretization errors. This motivates the implementation of renormalization group (RG) techniques to find discretization-error-free operators. To this end, we introduce VAPOR, a variational quantum algorithm that decomposes operators into Pauli strings, identifies RG flow orbits, and determines fixed points of a naively discretized operator. We illustrate this using a toy model of a kinematic operator in a symmetry-restricted SU(2) Yang-Mills theory.

19.
arXiv (CS.CL) 2026-06-17

The Measurement Gap in the Automation of EU Law: Benchmarking Doctrinal Legal Reasoning under the EU AI Act

Large language models now produce legal text of at least median quality, yet no existing benchmark can evaluate whether they perform doctrinal legal reasoning, which forms the interpretive core of legal work, rather than the ancillary, paralegal tasks that most current legal-AI evaluations measure. This measurement gap is not only methodological but legal: the EU AI Act makes "appropriate accuracy" a binding requirement for high-risk AI used in the judicial domain, yet that requirement cannot acquire operational content without the very doctrinal-reasoning benchmark the field lacks.

20.
arXiv (CS.CV) 2026-06-17

A Quantitative Analysis of Multimodal Biomarkers in Alzheimer's Disease

Despite increasing adoption of multimodal approaches in Alzheimer's Disease (AD) research – aimed at integrating molecular, structural, clinical, and genetic biomarkers to enhance disease characterization – the relationships among these modalities remain poorly understood. A systematic analysis of their dynamic interaction is essential for improving disease modeling, identifying redundant assessments, and reducing patient burden and acquisition costs. In this paper, we present a quantitative analysis of multimodal AD biomarkers by integrating tau-PET, structural MRI, cognitive scores (MMSE and CDR), and APOE4 data from 789 subjects drawn from the ADNI dataset. In our analyses, we (A) quantify cross-modal mutual information and explained variance to assess redundancy and predictive dependencies; (B) examine associations between tau topologies and structural atrophy across brain regions to select informative ROIs; (C) perform a statistical decomposition of the tau-cognition association into atrophy-related and atrophy-independent components; (D) and identify a dominant neurodegenerative trajectory that aligns with cognitive decline. This study provides a systematic characterization of cross-modal relationships, improving the interpretability and selection of biomarkers in AD. Code is publicly available at: https://github.com/antonioscardace/Multimodal-AD.

21.
arXiv (CS.AI) 2026-06-16

MUZZLE: Adaptive Agentic Red-Teaming of Web Agents Against Indirect Prompt Injection Attacks

arXiv:2602.09222v2 Announce Type: replace-cross Abstract: Large language model (LLM) based web agents are increasingly deployed to automate complex online tasks by directly interacting with web sites and performing actions on users' behalf. While these agents offer powerful capabilities, their design exposes them to indirect prompt injection attacks embedded in untrusted web content, enabling adversaries to hijack agent behavior and violate user intent. Despite growing awareness of this threat, existing evaluations rely on fixed attack templates, manually selected injection surfaces, or narrowly scoped scenarios, limiting their ability to capture realistic, adaptive attacks encountered in practice. We present MUZZLE, an automated agentic framework for evaluating the security of web agents against indirect prompt injection attacks. MUZZLE utilizes the agent's trajectories to automatically identify high-salience injection surfaces, and adaptively generate context-aware malicious instructions that target violations of confidentiality, integrity, and availability. Unlike prior approaches, MUZZLE adapts its attack strategy based on the agent's observed execution trajectory and iteratively refines attacks using feedback from failed executions. We evaluate MUZZLE across diverse web applications, user tasks, and agent configurations, demonstrating its ability to automatically and adaptively assess the security of web agents with minimal human intervention. Our results show that MUZZLE effectively discovers 44 new attacks on 4 web applications with 10 adversarial objectives that violate confidentiality, availability, or privacy properties across different LLMs and agent scaffolds. MUZZLE also identifies novel attack strategies, including 3 cross-application prompt injection attacks and an agent-tailored phishing scenario.

22.
arXiv (CS.AI) 2026-06-17

Dissecting model behavior through agent trajectories

arXiv:2606.17454v1 Announce Type: new Abstract: AI agent performance is not just a modeling problem, it is fundamentally a systems problem. The advanced capabilities of models are realized through agent harnesses. Therefore, a gap between model assumptions and harness behavior can easily prevent the model's full capabilities from translating into agent performance. We formalize this as the `intent-execution' gap: the mismatch between what the model intends and what the harness executes, and vice versa. We argue that minimizing this intent-execution gap is as important as other aspects of harness design such as tools and execution loops. To illustrate the impact of this harness-model alignment, we develop a simple and customizable harness called `Simple Strands Agent' (SSA). SSA aims to find the bulk of common patterns which generalize across different model families (such as Claude, Gemini, GPT, Grok, Qwen), as well as a small number of model-specific preferences. We make two contributions: (i) we $reproduce or improve on the pass@1$ performance reported by diverse model-provider families on popular agentic benchmarks (SWE-Pro, SWE-Verified and Terminal-Bench-2), and (ii) building on an $analysis of 138k trajectories generated by SSA$, we look beyond the $\texttt{pass@1}$ numbers which tend to be relatively even across frontier models. By representing agent trajectories in code state-spaces, we observe model-level differences in problem-solving behavior. Finer-grained metrics such as edit frequency, testing activity, and phase-transitions reveal how individual models allocate effort across different stages of autonomous problem solving.

23.
Science (Express) 2026-05-07

Induction of broadly neutralizing HIV antibodies by a two-step mechanism informs vaccine design | Science

作者: 未知作者

A major obstacle confronting HIV-1 vaccine and cure research is the lack of an outbred animal model for rapid and consistent induction of broadly neutralizing antibodies (bNAbs). We designed an epitope-focused simian-human immunodeficiency virus (SHIV.5MUT) that elicited broad and potent V3-glycan-targeted antibodies within a year of infection in 14 of 22 macaques compared with 0 of 14 control animals. SHIV.5MUT elicited bNAbs by a two-step mechanism, inducing an initial wave of V1-directed antibodies that selected for Envs with shortened, hypoglycosylated V1 loops, which in turn primed V3-glycan bNAb precursors. Rhesus bNAbs were immunogenetically and structurally diverse, closely resembling human V3-glycan bNAbs. Env-bNAb coevolution revealed a diverse repertoire of bNAb precursors and the Env variants that matured them, yielding a molecular blueprint for vaccine design.

24.
arXiv (quant-ph) 2026-06-15

Merged amplitude encoding for Chebyshev quantum Kolmogorov–Arnold networks: trading qubits for circuit executions

arXiv:2603.02818v3 Announce Type: replace Abstract: Quantum Kolmogorov–Arnold networks based on Chebyshev polynomials (CCQKAN) evaluate each edge activation function as a quantum inner product, creating a trade-off between qubit count and the number of circuit executions per forward pass. We introduce merged amplitude encoding, a technique that packs the element-wise products of all $n$ input-edge vectors for a given output node into a single amplitude state, reducing circuit executions by a factor of $n$ at a cost of only 1–2 additional qubits relative to the sequential baseline. The merged and original circuits compute the same mathematical quantity exactly; the open question is whether they remain equally trainable within a gradient-based optimization loop. We address this question through numerical experiments on 10 network configurations under ideal, finite-shot, and noisy simulation conditions, comparing original, parameter-transferred, and independently initialized merged circuits over 16 random seeds. Wilcoxon signed-rank tests show no significant difference between the independently initialized merged circuit and the original ($p > 0.05$ in 28 of 30 comparisons), while parameter transfer yields significantly lower loss under ideal conditions ($p < 0.001$ in 9 of 10 configurations). On 10-class digit classification with the $8\times8$ MNIST dataset using a one-vs-all strategy, original and merged circuits achieve comparable test accuracies of 53–78\% with no significant difference in any configuration. These results provide empirical evidence that merged amplitude encoding preserves trainability under the simulation conditions tested.

25.
arXiv (CS.LG) 2026-06-18

The Road to Artificial SuperIntelligence: A Comprehensive Survey of Superalignment

arXiv:2412.16468v4 Announce Type: replace Abstract: The emergence of large language models (LLMs) has sparked discussion on Artificial Superintelligence (ASI), a hypothetical AI system that surpasses human intelligence. Although ASI remains hypothetical and far beyond current AI capabilities, discussing its potential and exploring its feasibility and potential risks is critical for the development of future AI systems. The idea of superalignment originates from scalable oversight, which studies how to supervise increasingly capable AI systems when direct human supervision becomes insufficient. In this paper, we focus on the superalignment problem: "The process of supervising, controlling, and governing artificial superintelligence." We first review scalable oversight paradigms-Sandwiching, Self-Enhancement, and Weak-to-Strong Generalization – then analyze the limitations of current paradigms through the lens of possibility and impossibility, discuss key challenges, and propose pathways for the safe and continual improvement of future AI systems.