Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-11

FitText: Evolving Agent Tool Ecologies via Memetic Retrieval

arXiv:2605.02411v2 Announce Type: replace Abstract: A semantic gap separates how users describe tasks from how tools are documented. As API ecosystems scale to tens of thousands of endpoints, static retrieval from the initial query alone cannot bridge this gap: the agent's understanding of what it needs evolves during execution, but its tool set does not. We identify this retrieval interface, not planning, as the binding constraint on end-to-end agent performance, and introduce FitText, a training-free framework that makes retrieval dynamic by embedding it directly in the agent's reasoning loop. FitText treats retrieval as test-time evolution of hypotheses: the agent generates natural-language pseudo-tool descriptions (revisable beliefs about the tool it needs), refines them iteratively using retrieval feedback, and explores diverse alternatives through stochastic generation. Memetic Retrieval adds evolutionary selection pressure over candidate descriptions, guided by a tool memory that avoids redundant search. On ToolRet (three domains), FitText's reformulation strategies improve NDCG@5 by 2.7 to 10.6 points over static query retrieval across all base models; on StableToolBench (16,464 APIs) with GPT-5.4-mini, Memetic reaches an 84.3% pooled pass rate, a 26.7-point absolute gain over static query retrieval.

02.
bioRxiv (Bioinfo) 2026-06-11

GeroEngine: Generative single-cell aging trajectories reveal a bidirectionally traversable identity core and direction-specific inflammatory remodeling

作者:

Single-cell RNA sequencing (scRNA-seq) maps aging tissues at high resolution but is destructive, preventing longitudinal tracking; dropout and zero-inflation artifacts, amplified by shift-invariant linear simulations, confound age-associated variability. We developed GeroEngine, a technical-artifact-aware framework combining VAE-based trajectory simulation, LOPO cross-validation, linear baselines, reverse traversal, and reverse-directed network inference. In microglia and HSCs, the VAE reduced technical-artifact carryover while preserving trajectory heterogeneity and improving alignment to artifact-reduced reference manifolds. Consensus GeroTargets and GeroRegulators defined tissue-specific GeroNetworks organized into three pillars: lineage/replication identity collapse, a sex-dimorphic endocrine/stress core, and inflammatory remodeling. Forward and reverse simulations aligned to the common young[->]old aging axis revealed a sign-coherent, direction-specific program: identity/replication targets were bidirectionally recovered, whereas MHC/NF-{kappa}B inflammatory programs were preferentially forward-recovered. These results support identity collapse as a deep traversable core of aging and nominate upstream homeostatic restoration over downstream inflammatory suppression.

03.
arXiv (CS.AI) 2026-06-12

Free-Placement Optimization of Ground Station Locations for Low-Earth Orbit Satellites

arXiv:2606.12667v1 Announce Type: cross Abstract: Rapidly expanding low Earth orbit satellite constellations are placing increasing demands on terrestrial ground networks, motivating the development of more efficient ground station network designs. Current approaches select sites from predefined locations, limiting optimization to existing infrastructure and constraining performance. In contrast, free-placement optimization operates over a continuous spatial domain on Earth, broadening the search space and allowing higher-throughput configurations at the cost of potentially requiring new infrastructure deployment. In this work, we introduce SCORE (Sequential Cyclic Optimization via Refinement & Evaluation), a two-stage free-placement method for ground station design. SCORE combines sequential coordinate selection with cyclic refinement to manage high-dimensionality, non-convexity, and local minima that challenge global optimizers. We benchmark SCORE against one-shot methods such as differential evolution (DE) and integer programming approaches using locations from Kongsberg Satellite Services and the World Teleport Association. Tests across two commercial Earth observation constellations (Capella Space and ICEYE) and one synthetic Walker-Star constellation show that SCORE requires up to 5x fewer function evaluations to converge relative to DE while improving downlink throughput by up to 13%. Compared to fixed-site methods, unconstrained SCORE achieves up to 15% greater total downlink, establishing a strong empirical performance benchmark for flexible placement; infrastructure-constrained SCORE retains over 92% of this gain while restricting placement to within proximity of existing fiber and power infrastructure. We also explore trade-offs between expanding existing stations and deploying new sites, informing future ground network design for operational constellations.

04.
arXiv (math.PR) 2026-06-11

Percolation on hierarchical lattices

arXiv:2606.11503v1 Announce Type: new Abstract: We consider independent Bernoulli percolation on top of sequences of hierarchical graphs. Given a graph $G_{1}$ with two distinguished vertices $a_{1}$ and $b_{1}$, the hierarchical graph with seed $G_{1}$ is the sequence $\big( G_{k} \big)_{k \geq 1}$ resulting from the inductive procedure, where the graph $G_{k+1}$ is obtained from $G_{k}$ by replacing each of its edges with a copy of $G_{1}$, attached by the vertices $a_{1}$ and $b_{1}$. We prove that, under sharp hypotheses, percolation on these graphs presents a unique phase transition. Second, we establish the existence of several critical exponents in this context, such as the critical exponents for the correlation length $\nu$, the surface tension $\mu$, the one-arm exponent $\alpha_{1}$. Several results are also obtained for their infinite counterpart $G_\infty$, which is the Benjamini-Schramm limit of $G_k$: uniqueness of the infinite cluster, continuity of $\theta(p)$, existence of the percolation-probability exponent $\beta$ and scaling relations for the critical exponents $\alpha_1$, $\nu$ and $\beta$. Furthermore, we analyze noise sensitivity for crossing functions in $G_{k}$ and establish sharp noise sensitivity in this setting. Finally, we propose a setup where it is possible to verify the locality hypothesis, stating that the critical threshold for percolation is a local property, while critical exponents are determined by the global geometry of the graph. As a consequence of the techniques developed here, we also provide a necessary and sufficient condition for the existence of a unique fixed point for the map $p \mapsto \mathbb{E}_p[g]$ in $(0,1)$, where $g:\{0,1\}^n \to \{0,1\}$ is a nontrivial monotone Boolean function.

05.
Nature Medicine 2026-06-08

Post-adjuvant chemotherapy in ctDNA-positive patients with resected colorectal cancer: a randomized phase 3 trial

Tumor-informed circulating tumor DNA (ctDNA) enables detection of molecular residual disease (MRD) after curative resection of colorectal cancer (CRC), but whether early intervention improves outcomes remains uncertain. ALTAIR was a randomized, double-blind, phase 3 trial embedded in the CIRCULATE-Japan platform evaluating a post-adjuvant ctDNA surveillance strategy with treatment initiation upon molecular recurrence. Patients with resected stage 0–IV CRC who became ctDNA positive after completion of standard-of-care therapy and had no radiological evidence of disease were randomly assigned (1:1) to receive trifluridine/tipiracil (FTD/TPI) or placebo for 6 months. The primary endpoint was investigator-assessed disease-free survival (DFS). Between July 2020 and June 2023, 243 patients were randomized to FTD/TPI (n = 122) or placebo (n = 121). Median DFS was 9.30 months with FTD/TPI and 5.55 months with placebo (hazard ratio = 0.79, 95% confidence interval: 0.60–1.05, P = 0.107), and the primary endpoint was not met. FTD/TPI increased grade 3 or higher hematologic adverse events (73.0% versus 3.3%) without new safety signals. These findings indicate that post-adjuvant intervention with FTD/TPI did not significantly improve DFS in ctDNA-positive patients without radiological disease. ClinicalTrials.gov identifier: NCT04457297 . In the randomized, double-blind phase 3 ALTAIR trial, patients with resected colorectal cancer who became positive for circulating tumor DNA during post-adjuvant surveillance received trifluridine/tipiracil hydrochloride therapy, which did not significantly prolong disease-free survival compared with placebo.

06.
arXiv (CS.AI) 2026-06-17

MapAgent: An Industrial-Grade Agentic Framework for City-scale Lane-level Map Generation

arXiv:2606.04513v2 Announce Type: replace Abstract: Lane-level maps are critical infrastructure for autonomous driving and lane-level navigation, yet constructing and maintaining standardized lane networks for hundreds of cities remains highly labor-intensive. Recent end-to-end vectorized mapping methods can predict lane geometry and topology directly from sensor data, but they typically treat mapping specifications and traffic regulations as implicit, dataset-dependent supervision. Moreover, in complex scenes (e.g., worn or missing markings and occlusions), correct lane configurations are often under-determined by visual evidence alone, making specification violations a major source of human post-editing. We propose MapAgent, an industrial-grade agentic architecture that augments a vectorization backbone for specification-compliant lane-map production. Rather than merely adding an agent loop to map prediction, MapAgent couples backbone perception with explicit specification verification, constraint-aware reasoning, and deterministic map editing under a bounded, verification-driven Judge-Planner-Worker loop. A vision-language Judge diagnoses errors by jointly inspecting visual evidence and draft vectors, while a tool-calling Planner generates minimal corrective edits with post-edit re-validation. To remain scalable for city-scale production, MapAgent is selectively triggered only on tiles with low backbone confidence, adding modest overhead while preserving throughput. Experiments on real-world datasets show consistent gains over strong production baselines, especially in complex and long-tail scenarios. Additionally, MapAgent has been integrated into Baidu Maps, supporting lane-level map generation for over 360 cities nationwide and elevating the overall production automation to over 95%, demonstrating MapAgent's practicality and effectiveness for large-scale lane-level map generation.

07.
arXiv (CS.AI) 2026-06-16

Graphical-Probabilistic Modeling of Generative Flows in LLM-Native Software Systems

arXiv:2606.15943v1 Announce Type: cross Abstract: Engineering LLM-native software remains a challenging and immature field. Current practice is largely exploratory, relying on experimentation and heuristic techniques such as prompting and context engineering. These, however, are low-level and lack the principled structure needed to support design-level reasoning or analysis. In contrast, traditional software engineering leverages modularity and abstraction to communicate and analyze system behavior. To bring similar rigor to LLM-native development, we propose methods for documenting generative flows and for stating properties of LLM-based software designs. Such methods must account for the stochastic, prompt-dependent behavior of large language models while remaining expressive enough to capture emergent phenomena. Our initial approach is based on graphical probabilistic models, tailored to capture phenomena characteristic of LLM-native systems. This framework – what we term Generation Networks – aims to provide a foundation for principled reasoning about generative interactions and system-level properties in LLM-centric software architectures.

08.
arXiv (CS.CL) 2026-06-17

ZeroSyl: Simple Zero-Resource Syllable Tokenization for Spoken Language Modeling

Pure speech language models aim to learn language directly from raw audio without textual resources. A key challenge is that discrete tokens from self-supervised speech encoders result in excessively long sequences, motivating recent work on syllable-like units. However, methods like Sylber and SyllableLM rely on intricate multi-stage training pipelines. We propose ZeroSyl, a simple training-free method to extract syllable boundaries and embeddings directly from a frozen WavLM model. Using L2 norms of features in WavLM's intermediate layers, ZeroSyl achieves competitive syllable segmentation performance. The resulting segments are mean-pooled, discretized using K-means, and used to train a language model. ZeroSyl outperforms prior syllabic tokenizers across lexical, syntactic, and narrative benchmarks. Scaling experiments show that while finer-grained units are beneficial for lexical tasks, our discovered syllabic units exhibit better scaling behavior for syntactic modeling.

09.
arXiv (CS.AI) 2026-06-15

Sensitivity Shaping for Latent Modeling

arXiv:2606.14585v1 Announce Type: cross Abstract: Generative dynamics models enable planning in challenging robotic systems, but safe deployment requires reliably detecting policy-induced out-of-distribution (OOD) transitions. Existing methods typically treat the learned dynamics as fixed and attach post hoc support surrogates. We show that these surrogates can fail when the dynamics are locally insensitive to critical action choices: unsupported control actions may produce latent predictions that resemble demonstrated transitions, suppressing OOD signals despite large true predictive errors. To address this, we introduce support-conditioned control-sensitivity regularization, which promotes sensitive local response to control input changes in learned dynamics in high-support training regions. This preserves control-induced variation while limiting unstable extrapolation due to weak empirical support. Experiments in vision-based obstacle avoidance, manipulation, and real-robot navigation show improved OOD detection and safer closed-loop planning.

10.
arXiv (CS.AI) 2026-06-15

Thinking Outside the [Chat]Box: Bridging Computer Science and Industrial Design for Cognitive-Inclusive Generative AI

arXiv:2606.14306v1 Announce Type: cross Abstract: Current Generative AI (GenAI) interfaces remain largely constrained to chatbox interaction, which can impose high cognitive demands on users and create substantial barriers for people with intellectual disabilities (ID), including prompt formulation difficulties, response overload, and limited mechanisms to assess information reliability. To explore alternative interaction models for cognitive accessibility, we conducted a cross-disciplinary co-design challenge in which two student cohorts (Computer Science and Industrial Design) developed interface concepts from the same set of functional requirements (e.g., prompt scaffolding, structured output, GUI-based refinement, transparency, and personalization). Comparing the resulting proposals reveals both convergence on foundational requirements (notably initial calibration, proactive prompting, and direct manipulation of response fragments) and complementary contributions that outline a multi-layered support system. Computer Science teams primarily produced structural scaffolding, emphasizing predictability, navigability, and trust through mechanisms such as reliability indicators, explicit sources, and context management for long conversations. Industrial Design teams emphasized experiential scaffolding, focusing on pacing, attention guidance, multimodality, and proactive agency, including step-by-step response flows, focus modes, and assistant-like integrations. We synthesize these findings into a dual-layer scaffolding framework that expands the design space for cognitively accessible GenAI interaction beyond chat-centric models and motivates future work on expert refinement, technical feasibility, and empirical validation with users with ID.

11.
arXiv (quant-ph) 2026-06-16

Generalized symmetries, invariant solutions and conservation laws in the Jaynes-Cummings model

arXiv:2606.15538v1 Announce Type: cross Abstract: In this work, we investigate the Jaynes–Cummings model (JCM) using Lie symmetry analysis and conservation-law theory. The dynamics is formulated as a system of partial differential equations by projecting the von Neumann equation onto the atomic degrees of freedom and representing the field mode through its characteristic function. We determine the admitted point and generalized symmetries and construct invariant solutions satisfying the physical conditions imposed by quantum mechanics. The conventional dressed-state dynamics is recovered while a second class of solutions with radial dependence expressed through Heun polynomials is obtained for coupled atom–field configurations. We also apply the generating functions methodology to derive local conservation laws of the JCM differential system. Besides recovering the conservation of the total number of excitations, we obtain additional conserved currents involving atomic populations, coherence, reduced-state purity, and moments of the field characteristic function. In particular, we derive a balance equation for a combination of atomic purity and coherence whose evolution is controlled by the atom–field coupling and is linked to atom–field correlation and entanglement dynamics. The symmetry structure further generates generalized symmetries and an infinite hierarchy of conservation laws.

12.
arXiv (quant-ph) 2026-06-16

Single-Image Entanglement Verification with Spatially Encoded Measurement Contexts

arXiv:2606.15382v1 Announce Type: new Abstract: Entangled photon pairs produced by spontaneous parametric down-conversion exhibit rich spatial entanglement structure that is often difficult to probe with conventional measurements. Here, we show that spin-orbit optical elements can convert this spatial structure into directly observable quantum interference patterns. Using a $q$-plate, we demonstrate that the relative wavefront curvature of biphoton states generated by a pair of nonlinear crystals can be retrieved from the spatial modulation of coincidence images. Building on this principle, we introduce a liquid-crystal metasurface that performs spatially multiplexed Bell measurements across the transverse profile of the photon field. The device, which we call a Clauser-Horne-Shimony-Holt (CHSH) plate, assigns different polarization projections to different azimuthal sectors of the beam, allowing the sixteen joint measurements required for a CHSH test to be realized simultaneously in a single acquisition. In this architecture, the spatial coordinate acts as a classical register selecting the measurement context, while photon pairs sample these contexts according to their emission directions. We further demonstrate that the same measurement concept can be implemented using a programmable spatial light modulator, providing a dynamically reconfigurable realization of the scheme. Our results show that spatially structured optical elements can transform Bell tests into parallel measurements distributed across the transverse plane, enabling rapid characterization of spatially varying entanglement. This approach opens new possibilities for structured-light quantum measurements, Bell-inequality-based imaging, and the study of spatially engineered entangled photon sources.

13.
arXiv (CS.CL) 2026-06-16

Scaling Human and G2P Supervision for Robust Phonetic Transcription

Expert phonetic annotation is costly, especially for non-standard dialects and atypical speech. A common alternative is using Grapheme-to-Phoneme (G2P) models to auto-generate phonetic labels from text transcripts at scale. We study how automatic phonetic transcription performance scales with human and G2P supervision in English. Using a curated 80-hour benchmark spanning native, non-native and post-stroke speech, we identify a supervision quality threshold: G2P supervision helps only when fewer than 20-30 hours of human annotation are available. Beyond this threshold, it provides no significant benefit and can reduce cross-dialect robustness. What is effective after this threshold is ASR pretraining which we use to achieve a 2.3x reduction in weighted phone feature error rate over prior systems, with strong gains on non-native and aphasic speech. These results suggest that quantity-driven G2P scaling may yield diminishing returns for robust generalization.

14.
arXiv (CS.CL) 2026-06-16

Code as a Weapon: A Consensus-Labeled Prompt Bank for Measuring Coding-Model Compliance with Malicious-Code Requests

A general-purpose language model that answers a harmful question returns text; a coding model that complies with a malicious request can return a working weapon: a keylogger, ransomware, an exploit that runs as written. This asymmetry in the severity of a single act of compliance implies coding-specialized models should clear a higher refusal bar than general-purpose chat models, not a lower one, yet the field cannot tell whether they do. Refusal benchmarks for malicious code are fragmented: they mix requests for executable software with requests for harmful security knowledge and report refusal rates over non-comparable corpora. This paper's central result is that the CODE-versus-KNOWLEDGE classification axis established in a prior four-corpus release remains stable under a substantially expanded corpus pool and an independently refreshed judge panel, evidence that it measures a real construct rather than an artifact of the prompts or judges. Eight corpora spanning diverse elicitation paradigms (direct, jailbreak-decorated, indirect, and agent/interpreter: ASTRA, CySecBench, AdvBench/harmful_behaviors, JailbreakBench, MalwareBench, RedCode, RMCBench, Scam2Prompt) are classified under a five-judge consensus protocol (6,675 prompts x 5 judges = 33,375 calls), reaching Fleiss' kappa = 0.767 [95% CI 0.755, 0.777] ("substantial"). Critically, the panel shares no judge with the prior release (five paid commercial APIs replaced by five open-weight models from five vendors), yet the two panels agree on 94.45% of the 3,133 shared prompts and reach Cohen's kappa = 0.952 [0.942, 0.963] on the 3,031-prompt binary overlap: the axis survives near-total panel replacement. The released bank comprises 4,748 consensus-CODE and 1,923 consensus-KNOWLEDGE prompts, a reliability-quantified benchmark whose central classification axis is shown stable across corpus expansion and judge-panel replacement.

15.
arXiv (CS.LG) 2026-06-19

Neural Architectures as Functional Priors in Physics-Informed Control Problems

arXiv:2606.19368v1 Announce Type: cross Abstract: In this work we investigate the role of neural architectures as implicit functional priors in control problems governed by ordinary differential equations. Rather than focusing on highly complex problems, our objective is to investigate architecture-dependent effects in controlled dynamical systems within the simplest physically interpretable settings possible. In particular, we study a controlled linear RLC electrical circuit and a nonlinear Duffing-type dynamical system. Both systems are analyzed first through classical optimal-control formulations and later through PINN-based approaches. We compare different combinations of multilayer perceptrons (MLPs) and Fourier-based KAN-like architectures, and analyze their influence on the resulting controls. The numerical experiments suggest that different architectural choices systematically generate qualitatively distinct controls, even under identical governing equations, loss functionals, initial and target states, training parameters and physical constraints. Significant differences appear in the spectral structure, smoothness, energy distribution, and phase-space behavior of the learned solutions. A central observation of this work is the emergence of a functional specialization phenomenon when the neural architectures are allowed sufficient freedom to shape the structure of the learned controls. More specifically, in the systems considered here, Fourier-based architectures tend to produce trajectories with richer oscillatory content, whereas smoother low-frequency-biased architectures tend to generate more regular and energetically efficient controls. This suggests that different functional components of the control problem may be handled more efficiently by different neural architectures, leading to an implicit specialization between state representation and control generation.

16.
arXiv (CS.AI) 2026-06-19

ProMUSE: Progressive Multi-modal Uncertainty-guided Staged Evidential Alzheimer Disease Classification

arXiv:2606.19371v1 Announce Type: cross Abstract: Alzheimer's disease (AD) is a fatal disorder that destroys memory and cognitive skills in the elderly population. Most treatments for AD are effective in the early stage, leading to an increasing demand for early AD diagnosis. AD diagnosis increasingly relies on multimodal data such as clinical assessments, structural Magnetic Resonance Imaging (MRI), and Positron Emission Tomography (PET) imaging. However, MRI and PET acquisition remain costly and not universally accessible, making full-modality inference impractical in real-world clinical workflows. We propose ProMUSE, a Progressive Multi-modal Uncertainty Guided Staged Evidential Network that adaptively determines when additional modalities are necessary, helping reduce the overall cost of data acquisition while maintaining accuracy. ProMUSE first performs evidential classification using low-cost clinical data and quantifies uncertainty via a Dirichlet-based subjective logic model. When uncertainty exceeds a learned threshold, ProMUSE progressively incorporates MRI or PET features, fusing modality-wise belief and uncertainty through Dempster-Shafer theory to obtain a calibrated multimodal prediction. This staged acquisition strategy enables accurate diagnosis while minimizing reliance on expensive imaging. Experiments on ADNI, AIBL, and OASIS across CN-AD, CN-MCI, and MCI-AD tasks demonstrate that ProMUSE achieves competitive or superior accuracy compared to full-modality baselines while reducing MRI/PET usage by 50-90%, yielding substantial cost savings. These results highlight ProMUSE as a practical, uncertainty-aware, and resource-efficient solution for real-world AD screening.

17.
arXiv (quant-ph) 2026-06-16

Quantum simulation of the Liouville equation in classical mechanics with discontinuous potential via Schrödingerization

arXiv:2606.15066v1 Announce Type: new Abstract: We develop quantum simulation algorithms for the Liouville equation of classical mechanics with discontinuous potential. Such discontinuities represent potential barriers at which classical particles undergo energy preserving transmission or reflection, and the resulting interface conditions must be incorporated into the numerical flux. We combine Hamiltonian-preserving schemes by Jin and Wen in Commun. Math. Sci. 3(3), 285-315 (2005) with the Schrödingerization method, which embeds the resulting nonunitary semi-discrete dynamics into a unitary Schrödinger type system in one additional auxiliary variable [arXiv:2212.14703, arXiv:2212.13969]. For one-, two-, and $n$-dimensional problems with grid aligned interfaces, we construct sparse matrix representations of the transmission and reflection fluxes using step and hat functions, derive the corresponding Hamiltonians of the Schrödingerized systems, and analyze their sparse-access query complexity. In the sparse-access oracle model, the resulting algorithms have a polynomial dependence on the inverse accuracy and avoid the exponential dependence on the phase-space dimension suffered by classical grid based Hamiltonian-preserving schemes, up to the cost of implementing the oracles and the postselection overhead. We also describe the postselected recovery of the physical solution state and the quantum readout of macroscopic observables such as density and averaged velocity through overlap estimation. Numerical experiments based on classical simulation of the Schrödingerized dynamics validate the proposed formulation and illustrate the correct transmission/reflection behavior at potential barriers.

18.
arXiv (CS.CV) 2026-06-18

Revealing Hidden Vulnerabilities in Autoencoders through Gradient Signal Restoration

Adversarial robustness of deep autoencoders (AEs) has received less attention than that of discriminative models, although their compressed latent representations induce ill-conditioned mappings that can amplify small input perturbations and destabilize reconstructions. Existing white-box attacks for AEs, which optimize norm-bounded adversarial perturbations to maximize reconstruction damage, often converge to suboptimal perturbations, thereby potentially overstating AE robustness. We show that this limitation is linked to vanishing adversarial loss gradients during backpropagation through ill-conditioned layers, associated with near-zero singular values in their intermediate weight matrices. To address this, we propose GRILL (Gradient Signal Restoration in Ill-Conditioned Layers), a framework designed to mitigate gradient degradation and improve the reliability of adversarial robustness evaluation in encoder-decoder architectures. GRILL is designed to mitigate adversarial gradient degradation during optimization, enabling attacks to better approximate high-distortion perturbations under fixed norm constraints. Through extensive experiments across multiple AE architectures, under both sample-specific and universal attacks, as well as standard and adaptive attack settings, we show that GRILL significantly increases attack effectiveness, thereby exposing vulnerabilities hidden by existing attack limitations. Beyond AEs, we provide preliminary evidence that modern multimodal encoder-decoder architectures exhibit similar vulnerabilities.

19.
arXiv (CS.LG) 2026-06-12

Analog Quantum Asynchronous Event-Based Graph Neural Network

arXiv:2606.11000v1 Announce Type: cross Abstract: Asynchronous, event-based graph neural networks (AEGNNs) have recently emerged as an efficient paradigm for processing the sparse and high-temporal-resolution data from event cameras. In this paper, we propose quantum analog AEGNNs (QA-AEGNNs), a novel framework to implement an AEGNN on a neutral-atom quantum computer. Neutral-atom quantum processors offer a programmable analog quantum computing platform based on controllable Rydberg-atom interactions. To this end, we map the streaming event data to an array of trapped neutral atoms, where each atom represents a graph node (event) and is positioned such that geometric proximity reflects the spatio-temporal neighborhood of events. The native Rydberg Hamiltonian of the quantum processor is programmed to mirror the message-passing computations of the AEGNN, with atomic qubit states serving as node feature embeddings and inter-atom interactions realizing graph edges. Furthermore, we propose a hybrid quantum-classical training scheme in which the analog Hamiltonian parameters (e.g., laser pulse amplitudes and detunings) are optimized using classical feedback to learn the quantum AEGNN model from data. Our approach leverages the continuous Hamiltonian dynamics and massive parallelism of neutral-atom quantum systems to natively execute event-based graph computations with potential accuracy improvements

20.
arXiv (CS.AI) 2026-06-15

Aligning Quantum Operators with Large Language Models

arXiv:2606.13811v1 Announce Type: cross Abstract: Can Large Language Models (LLMs) understand and reason about quantum operators? Despite their remarkable capabilities in mathematics and symbolic reasoning, LLMs remain inherently blind to quantum representations such as unitary matrices. In this work, we take a step toward bridging this gap by introducing an approach that maps unitary operators into the latent space of an LLM, enabling unified modeling over quantum and linguistic inputs. We instantiate this idea on Clifford+T circuit synthesis over a Pauli rotation gate set, where our model achieves results competitive with state-of-the-art methods and scales consistently with training data, with no signs of saturation. Our approach further enables language-conditioned synthesis, allowing gate constraints unseen during training to be specified directly in natural language. This work suggests a path toward quantum–aware foundation models that can natively interpret and reason about quantum operations, which could have broader implications reaching across quantum compilation and algorithm discovery.

21.
arXiv (CS.CL) 2026-06-11

UR-BERT: Scaling Text Encoders for Massively Multilingual TTS Through Universal Romanization and Speech Token Prediction

We propose UR-BERT, a Romanized transcription-based text-to-speech (TTS) encoder for massively multilingual TTS systems. Conventional grapheme-to-phoneme (G2P)-based approaches are limited to around 100 languages due to the availability of reliable G2P resources. In contrast, UR-BERT scales to 495 languages by unifying diverse writing systems into a shared Romanization representation. To further enhance phonetic fidelity and text-speech alignment, we introduce a speech token prediction objective during training, which encourages the encoder to learn speech-aware phonetic representations in a data-efficient manner. Experiments show that TTS systems built on UR-BERT consistently outperform recent text encoder baselines across a wide range of languages and resource conditions, and demonstrate strong generalization to unseen languages.

22.
arXiv (CS.CV) 2026-06-16

CausalDrive: Real-time Causal World Models for Autonomous Driving

World models have emerged as a promising paradigm for scaling autonomous driving (AD) data, yet existing video generative models fall short as interactive simulators. Layout-conditioned renderers rely on "oracle" future trajectories of all background agents, rendering them strictly non-reactive. Conversely, pure action-conditioned predictors lack semantic control over complex interactions and suffer from prohibitive diffusion latencies, hindering closed-loop policy learning. To bridge this gap, we present CausalDrive, a controllable, real-time foundation driving world renderer. CausalDrive operates solely on the initial front-view frame, the ego-vehicle's trajectory, and a macroscopic text prompt. By excluding future NPC layouts, we compel the model to intrinsically predict causal interactions, enabling text-driven control over Driving Sociology, allowing users to dynamically orchestrate diverse counterfactual reactions to identical ego-actions. To overcome the efficiency bottleneck and address the covariate shift in autoregressive generation, we propose a novel Context-Forced DMD architecture. This combines continuous flow-matching with a self-correcting distillation objective, achieving interactive speeds of 12 FPS. This breakthrough transforms the passive video generator into a playable neural simulator. We demonstrate its versatility across three downstream applications: (1) generative closed-loop evaluation with significantly mitigated collision artifacts, (2) large-scale Reinforcement Learning (RL) post-training driven by a Video2Reward module, and (3) real-time human-in-the-loop simulation. Extensive experiments validate that policies trained within CausalDrive's reactive scenarios exhibit superior interaction capabilities in the real world.

23.
bioRxiv (Bioinfo) 2026-06-19

SteerAF: Distogram-based Steering of AlphaFold2 toward Alternative Conformations

End-to-end structure predictors, such as AlphaFold2, typically output only the dominant conformational state of a given protein, which is biased by the training data set. Existing strategies for recovering alternative conformations are often computationally expensive and offer limited biological interpretability. Here, we present SteerAF, an inference-time optimization framework based on AlphaFold2 that leverages information encoded in the distogram derived from deep multiple sequence alignments (MSAs) to predict alternative protein conformations. Across four benchmark datasets, SteerAF matches or surpasses existing methods in predicting alternative conformations for the majority of systems. Sparse MSA-feature modifications generated via block gradient ascent exhibit a strong correlation with experimentally characterized functional residues, recovering them with approximately 50% precision in the tested proteins. Furthermore, SteerAF enables effective decoy selection in the absence of experimental structures, and its predictions can serve as seed structures for molecular dynamics simulations to map conformational landscapes. Thus, SteerAF provides an efficient and interpretable approach for predicting alternative conformations, offering a framework that can be extended to other similar predictors and problems.

24.
arXiv (CS.LG) 2026-06-16

GRASP: Gradient-Aligned Sequential Parameter Transfer for Memory-Efficient Multi-Source Learning

arXiv:2606.14900v1 Announce Type: new Abstract: Multi-source transfer learning faces a fundamental scalability bottleneck: existing approaches require either loading all K source models into memory simultaneously during parameter fusion, requiring O(K) memory, or deploying all models at inference time, making production deployment infeasible. We propose GRASP (Gradient-Aligned Sequential Parameter Transfer), which achieves superior knowledge integration while maintaining O(1) memory consumption through three key innovations: (1) sequential processing that merges one source at a time into an evolving target model, (2) parameter-wise gradient alignment that selectively transfers only parameters whose optimization directions align with the target domain, avoiding negative transfer, and (3) iterative fine-tuning that adapts transferred knowledge before integrating the next source. Extensive experiments across three continual learning benchmarks (Yearbook, CLEAR-10, CLEAR-100) spanning 10 to 108-year temporal distribution shifts and four architectures (1.3M to 25.6M parameters) demonstrate that GRASP achieves 93.5% mean accuracy over all datasets and architectures compared to ensemble method's 71.7% accuracy while requiring only constant memory versus K models for standard multi-source fusion. Critically, GRASP's sequential previously merged models and scales to arbitrarily many sources without memory growth, making it uniquely suitable for resource-constrained deployment and continually evolving source domains.

25.
arXiv (math.PR) 2026-06-19

Finite-Sample Bounds for Expected Signature Estimation under Weak Dependence

arXiv:2605.20541v2 Announce Type: replace-cross Abstract: The expected signature uniquely determines the law of a random rough path under a moment-growth condition, yet finite-sample bounds for estimating its truncations from a single long dependent trajectory remain unavailable. We study a strictly stationary stochastic process equipped with a geometric rough-path lift, observed in non-overlapping blocks of equally-spaced samples, and prove a non-asymptotic mean-squared error (MSE) bound for the block-averaging estimator of its truncated expected signature. Under moment and stationarity assumptions together with a direct covariance-decay condition on block signatures – strictly weaker than $\alpha$-mixing and applicable to long-range-dependent processes – the error separates into a discretization term and a fluctuation term, with rates determined respectively by path regularity and dependence strength. A levelwise rough-factorial variance analysis keeps finite-truncation constants explicit and yields an optimal allocation rule under a fixed observation budget. We verify the assumptions for independent-coordinate fractional Ornstein–Uhlenbeck processes in three regimes: short-range (Hurst $1/41/2$. Monte Carlo experiments show empirical slopes steeper than the guaranteed upper-bound rates.