论文广场 - AcademicHub

01.

medRxiv (Medicine) 2026-06-15 DOI: HASH:2f8168b140608dc375db69b52d65ff10

Population-scale genomics reveals divergent pathogenicity of variant classes across paralogous collagen IV genes

作者:

Tzoumkas ↗Doctor ↗G. T ↗Sadeghi-Alavijeh ↗Gale ↗D. P ↗

Monoallelic pathogenic or likely pathogenic variants in COL4A3 and COL4A4 occur in approximately 1 in 106 individuals, yet whether these paralogous genes confer equivalent pathogenicity for the same variant classes has not been tested at population scale. Using whole-genome sequencing data from the UK Biobank (UKB; n = 500,000), with replication in the All of Us Research Program (n = 414,000), we performed per-variant association testing, gene-based collapsing analyses and phenome-wide association studies (PheWAS) across haematuria, proteinuria and chronic kidney disease. We identified 64 COL4A3 and 92 COL4A4 rare variants significantly associated with haematuria or proteinuria, generating a quantitative allelic series for clinical variant interpretation. Glycine substitutions within collagenous domains conferred similar risks in both genes. In contrast, truncating and non-collagenous domain (NC1) missense variants were strongly associated with haematuria and proteinuria in COL4A4 carriers but showed substantially attenuated or absent associations in COL4A3 carriers despite comparable carrier frequencies and predicted pathogenicity scores. These findings were independently replicated in All of Us. Genome-wide association analysis identified the COL4A3/COL4A4 locus as the dominant genetic determinant of haematuria, with the signal attributable to the aggregate effects of rare coding variants and no evidence of independent common variant or trans-acting modifier effects. These findings demonstrate substantial gene-specific differences in tolerance to truncating and NC1 variants between COL4A3 and COL4A4, challenging assumptions of equivalent pathogenicity across paralogous collagen IV genes. Gene identity and not variant class alone, should inform risk stratification, variant interpretation and genetic counselling in individuals carrying collagen IV risk genotypes.

阅读与讨论 → 访问原文 →

02.

arXiv (CS.AI) 2026-06-16 DOI: arXiv:2509.21862

Shachi: A Modular, Controllable Framework for LLM-Based Agent-Based Modeling of Emergent Collective Behavior

作者:

So Kuroki ↗Yingtao Tian ↗Kou Misaki ↗Takashi Ikegami ↗Takuya Akiba ↗Yujin Tang ↗

arXiv:2509.21862v3 Announce Type: replace Abstract: How collective behaviors emerge from the interactions of individual LLM-driven agents is a central question in artificial life, yet controlled study of these emergent dynamics has been hindered by the lack of a principled simulation framework for systematic experimentation. To address this, we introduce Shachi, a principled methodology and modular framework that decomposes an agent's cognition into core components: Configuration for intrinsic identity, Memory for contextual continuity, and Tools for extended capabilities, all orchestrated by an LLM reasoning engine. This decomposition treats each cognitive component as an independently controllable variable, enabling perturbation studies that trace how micro-level cognitive traits propagate into population-level dynamics. We investigate behavioral patterns across a 10-task benchmark spanning three levels of collective complexity. Shachi enables memory transfer across environment transitions, producing history-dependent behavioral shifts, and allows agents to simultaneously inhabit multiple environments, revealing cross-environment interference invisible in single-environment studies. Furthermore, in a real-world U.S. tariff shock case study, locally interacting agents with individually controlled cognitive components produce macro-level market dynamics directionally consistent with observed real-world outcomes. Our work provides a rigorous, open-source simulation framework for LLM-based ABM, aimed at fostering cumulative scientific inquiry into the emergent collective behaviors of interacting artificial agents.

阅读与讨论 → 访问原文 →

03.

arXiv (CS.CL) 2026-06-17 DOI: arXiv:2606.17354

Translating the Untranslatable: An Operationalizable Ontology for Untranslatability

作者:

Jacob Bremerman ↗Brihi Joshi ↗Hirona Arai ↗Xiang Ren ↗Jonathan May ↗

Untranslatability, cases where meaning cannot be directly preserved across languages, is well-studied in linguistics but underexplored in NLP. As machine translation (MT) systems improve on standard benchmarks, their limitations increasingly concentrate in such cases, where translation cannot be reduced to one-to-one equivalence. We introduce a structured ontology of untranslatability along with a taxonomy of compensation strategies, which are specific techniques to convey meaning under these untranslatable circumstances. We operationalize this framework into a multilingual dataset of untranslatable sentences paired with strategy-based translations, enabling controlled analysis of translation behavior. Initial human preference studies suggest that translation quality depends on the strategy used, with consistent preferences for outputs that include explanatory context, known as the Annotation compensation strategy. Our framework and dataset provide a foundation for studying and modeling strategy-informed machine translation.

阅读与讨论 → 访问原文 →

04.

arXiv (CS.CL) 2026-06-16 DOI: arXiv:2606.16428

Lect\=uraAgents: A Multi-Agent Framework for Adaptive Personalized AI-Assisted Learning and Embodied Teaching

作者:

Jaward Sesay ↗Yue Yu ↗Siwei Dong ↗Yemin Shi ↗Guangyao Chen ↗B\"orje F. Karlsson ↗

Effective personalized AI-assisted learning demands systems that can not only generate accurate learner-specific educational materials, but also dynamically adapt their instruction to diverse learners. However, existing educational agents have primarily focused on lecture content automation and simulations, which often fall short of modelling multimodal and embodied instructional methods tailored for the individual learner. To this end, we propose Lect\=uraAgents - a multi-agent framework that enables personalized learning through end-to-end adaptive embodied teaching. At its core, Lect\=uraAgents mirrors a professor-student relationship, in which a ProfessorAgent leads a collaborative team of specialized subordinate agents through research, planning, review, and embodied delivery of lecture contents that adapt to a learner's needs. The framework offers three main contributions: (1) a hierarchical multi-agent architecture for end-to-end personalized learning; (2) an adaptive embodied teaching mechanism, wherein the ProfessorAgent executes visible and pedagogically motivated teaching actions (e.g., handwrite, highlight, underline, etc.) over contents in a teaching environment; and (3) a Teaching Action-Speech Alignment (TASA) algorithm that employs salience-based heuristics and temporal semantic segmentation to generate coherent teaching action sequences aligned with learner profiles. We evaluate Lect\=uraAgents on diverse courses at high school, undergraduate, and graduate levels using sample-specific rubric-based analysis; with generated lecture materials and teaching actions assessed and validated by expert educators. Experimental results show consistent gains in lecture content quality, embodied teaching quality, assessment, and personalization over existing approaches, positioning Lect\=uraAgents as a pedagogically well-grounded framework for personalized learning at scale.

阅读与讨论 → 访问原文 →

05.

arXiv (CS.CL) 2026-06-16 DOI: arXiv:2606.16472

From Awareness to Adherence: Bridging the Context Gap in Spoken Dialogue Systems via Context-Aware Decoding

作者:

Che Hyun Lee ↗Heeseung Kim ↗Sungroh Yoon ↗

Despite the success of end-to-end (E2E) spoken dialogue systems, maintaining strict context adherence in multi-round conversations remains a challenge. While prior works attribute these failures to models forgetting dialogue history, we highlight an equally critical but overlooked bottleneck: a gap between latent context awareness and active adherence. Although models internally recognize relevant past utterances, strong parametric priors often overshadow these signals during decoding. To bridge this gap, we propose an audio-adapted Context-Aware Decoding (CAD) approach. By leveraging internal attention mechanisms to isolate key historical rounds, our approach contrasts output distributions with and without this key context during inference, directly amplifying multimodal contextual signals. Evaluations on the Audio MultiChallenge benchmark demonstrate significant improvements in Semantic Memory and Self Coherence subtasks, successfully enforcing strict, context-faithful adherence.

阅读与讨论 → 访问原文 →

06.

arXiv (CS.CV) 2026-06-11 DOI: arXiv:2606.09347

IB-HFN: Information Bottleneck-Driven SAR-Optical Fusion Network for High-Fidelity Cloud Removal

作者:

Haojun Guo ↗Fan Feng ↗Ziquan Wang ↗Yongsheng Zhang ↗Ying Yu ↗

Synthetic aperture radar (SAR)-assisted optical cloud removal aims to recover surface information obscured by clouds in optical remote sensing images by exploiting complementary SAR observations. Existing multimodal fusion methods typically rely on direct spatial concatenation and pixel-wise supervision, which can propagate SAR speckle noise into optical reconstruction and lead to over-smoothed results. To address these limitations, we propose an Information Bottleneck-driven High-Fidelity Network (IB-HFN) for SAR-assisted optical cloud removal. IB-HFN employs a dual-stream backbone to preserve modality-specific representations before deep semantic fusion, thereby mitigating premature cross-modal contamination. At the fusion stage, we introduce a Spatial Information Bottleneck Fusion module that compresses SAR features through a channel-wise variational information bottleneck to suppress unstructured speckle noise. In parallel, a local-global gating mechanism predicts clear-sky regions and routes reliable optical details through a Dirac-initialized skip connection, decoupling noise suppression from texture preservation. We further develop a joint optimization strategy that integrates feature-level bottleneck regularization with image-level constraints on reconstruction accuracy, structural consistency, spectral fidelity, and contrastive sharpness. A dynamic weighting schedule balances these objectives to stabilize training and reduce hazy artifacts. Experiments on the SEN12MS-CR dataset under challenging spatio-temporal splits demonstrate that IB-HFN achieves superior structural preservation and spectral fidelity over existing methods.

阅读与讨论 → 访问原文 →

07.

arXiv (quant-ph) 2026-06-19 DOI: arXiv:2606.19444

Unleashing Emergent Fermions with Rydberg Atom Simulators

作者:

Hanteng Wang ↗Xingyu Li ↗Shang Liu ↗Yingfei Gu ↗Chengshu Li ↗

arXiv:2606.19444v1 Announce Type: cross Abstract: Rydberg atom simulators, in both analog and digital modes, have attracted significant recent interest due to their versatile geometric reconfigurability. In this work, leveraging this feature, we propose two complementary approaches, one for each mode, to characterize emergent fermions in critical quantum many-body systems. In the analog mode, we assemble the Rydberg atoms in a "developable" (namely, preserving local couplings) Möbius band geometry to realize antiperiodic boundary conditions, where fermionic states reside. Spectroscopic measurement in this sector then reveals universal energy ratios of the bosonic and fermionic states. In the digital mode, we carry out a fermionic version of Kibble-Zurek ramping with a quantum circuit, directly addressing the fermionic scaling form. Reconfigurability allows an exponential speed-up of this task, with an $O(\log L\log\log L)$ circuit-depth overhead. Our work establishes the Rydberg atom simulator as a uniquely powerful platform to attack the notoriously difficult issue of experimentally probing emergent fermions that are nonlocally defined in a bosonic system.

阅读与讨论 → 访问原文 →

08.

arXiv (quant-ph) 2026-06-16 DOI: arXiv:2606.15586

Quantum Illumination with Symmetry-Constrained Random Unitaries

作者:

Sooryansh Asthana ↗Sai Vinjanampathy ↗

arXiv:2606.15586v1 Announce Type: new Abstract: Quantum illumination provides a quantum advantage in detecting weakly reflecting objects embedded in a noisy environment, even when environmental noise destroys most of the initial entanglement. We investigate this advantage using Haar-random probe states constrained to symmetry-resolved subspaces. Employing tools from quantum channel discrimination and asymptotic hypothesis testing, we derive the discrimination exponents associated with Haar-random probe ensembles and identify the role of symmetry in determining their performance. We show that typical states drawn from fixed-charge sectors achieve the same asymptotic quantum-illumination advantage as maximally entangled probes. In particular, we show that the effective thermal-noise suppression and the corresponding Chernoff exponent are governed by the dimension of the accessible symmetry sector. Our results reveal that the operational resource underlying quantum illumination can be generalized from fine-tuned structure of a specific probe state to the existence of a large symmetry-protected correlation subspace. These findings establish a direct connection between quantum illumination, symmetry-resolved typicality, and quantum channel discrimination, and demonstrate that near-optimal quantum hypothesis testing resources can emerge naturally from generic many-body quantum states constrained by conservation laws.

阅读与讨论 → 访问原文 →

09.

arXiv (CS.AI) 2026-06-19 DOI: arXiv:2604.08552

Automated Standardization of Legacy Biomedical Metadata Using an Ontology-Constrained LLM Agent

作者:

Josef Hardi ↗Martin J. O'Connor ↗Marcos Martinez-Romero ↗Jean G. Rosario ↗Stephen A. Fisher ↗Mark A. Musen ↗

arXiv:2604.08552v2 Announce Type: replace-cross Abstract: Scientific metadata are often incomplete and noncompliant with community standards, limiting dataset findability, interoperability, and reuse. Even when standard metadata reporting guidelines exist, they typically lack machine-actionable representations. Producing FAIR datasets requires encoding metadata standards as machine-actionable templates with rich field specifications and precise value constraints. Recent work has shown that LLMs guided by field names and ontology constraints can improve metadata standardization, but these approaches treat constraints as static text prompts, relying on the model's training knowledge alone. We present an LLM-based metadata standardization system that queries standard reporting guidelines and authoritative biomedical terminology services in real time to retrieve canonically correct standards on demand. We evaluate this approach on 839 legacy metadata records from the Human BioMolecular Atlas Program (HuBMAP) using an expert-curated gold standard for exact-match assessment. Our evaluation shows that augmenting the LLM with real-time tool access consistently improves prediction accuracy over the LLM alone across both ontology-constrained and non-ontology-constrained fields, demonstrating a practical approach to automated standardization of biomedical metadata.

阅读与讨论 → 访问原文 →

10.

arXiv (CS.AI) 2026-06-19 DOI: arXiv:2606.20177

Evaluating and Enhancing Negation Comprehension in Remote Sensing MLLMs

作者:

Haochen Han ↗Jue Wang ↗Alex Jinpeng Wang ↗Fangming Liu ↗

arXiv:2606.20177v1 Announce Type: cross Abstract: Multimodal Large Language Models (MLLMs) have demonstrated remarkable success in various Remote Sensing (RS) tasks. However, their ability to comprehend negation remains underexplored, limiting deployment in real-world applications where models must explicitly identify what is false or absent, e.g., emergency responders need to locate non-flooded routes for evacuation. To comprehensively study this limitation, we introduce RS-Neg, the first benchmark to evaluate negation understanding across region-level to scene-level tasks. Specifically, we design an automated data generation pipeline for RS imagery, using LLMs to synthesize diverse negation queries, and introduce a dynamic visual focus module for verification. Our evaluation reveals that advanced RS MLLMs struggle with negation, exhibiting hallucinations and substantial performance degradation. To close this gap, we propose NeFo, a novel test-time learning method that explicitly incorporates the logical role of negation into the model optimization. Remarkably, using about 5\% unlabeled test samples, NeFo significantly improves the negation understanding of models and shows strong generalization to unseen tasks. Code and data will be released upon acceptance.

阅读与讨论 → 访问原文 →

11.

arXiv (CS.CL) 2026-06-16 DOI: arXiv:2606.16603

VeriGraph: Towards Verifiable Data-Analytic Agents

作者:

Jiajie Jin ↗Zhao Yang ↗Wenle Liao ↗Yuyang Hu ↗Guanting Dong ↗Xiaoxi Li ↗Yutao Zhu ↗Zhicheng Dou ↗

LLM-based agents have demonstrated strong capabilities in data-intensive analytical tasks, yet their outputs are rarely verifiable: a reliance on linear text trajectories makes their reasoning difficult to audit. In particular, deterministic computations over raw data and semantic deductions over natural-language claims are often entangled in an unstructured stream, leaving numerical conclusions hard to reproduce and qualitative judgments hard to inspect. To address this, we propose VeriGraph, a traceable neuro-symbolic reasoning framework that enables agents to construct an explicit heterogeneous evidence directed acyclic graph (DAG) during execution. VeriGraph introduces three evidence-expansion primitives, namely computational, grounding, and derivational expansion, to connect raw data, interpreter variables, computed results, and natural-language claims in a unified graph. Under this formulation, structural traceability is reduced to graph reachability from raw data sources to terminal claims, while semantic support is measured by claim-level evidence evaluation. To improve graph construction, we further design a graph-based policy optimization strategy with a composite reward that jointly supervises answer correctness, computational integrity, and derivational coherence. Experiments on four benchmarks show that VeriGraph-8B achieves the highest overall score among all baselines. More importantly, VeriGraph produces auditable evidence graphs with substantially stronger claim grounding, achieving a 87.61\% Grounding Rate under our claim-level evidence support evaluation. These results suggest that explicit evidence-graph construction is a promising path toward verifiable data-analytic agents. Our code is available at https://github.com/ignorejjj/VeriGraph.

阅读与讨论 → 访问原文 →

12.

arXiv (quant-ph) 2026-06-16 DOI: arXiv:2512.16242

Hardy-type self-testing and exposedness of tripartite GHZ correlations

作者:

Smritikana Patra ↗Soumyajit Pal ↗Ranendu Adhikary ↗

arXiv:2512.16242v2 Announce Type: replace Abstract: Nonlocality can be witnessed either through Bell-inequality violations or through logical contradictions such as Hardy's paradox. In the bipartite two input two outcome scenario, these two routes have distinct geometric behavior: CHSH-maximal correlations are exposed points of the quantum set, whereas known Hardy-type self-testing correlations on the no-signaling boundary are non-exposed. Here we show that this bipartite intuition fails in the tripartite two input two outcome scenario. We study the tripartite instance of a multipartite Hardy-type paradox and prove that the correlation attaining the maximal Hardy success probability self-tests the Greenberger–Horne–Zeilinger state and the associated measurements. Although this correlation lies on the no-signaling boundary, we show that it is an extremal and exposed point of the quantum correlation set. Moreover, it coincides with the correlation attaining the maximal violation of the Mermin inequality. Thus, in the tripartite GHZ scenario, the logical-paradox and Bell-inequality routes to nonlocality select the same exposed quantum boundary point. We also establish a robust version of the self-test, showing that small deviations from the ideal Hardy constraints imply quantitative closeness to the target state and measurements. Our results reveal a qualitative geometric difference between bipartite and tripartite Hardy-type nonlocality and suggest a broader investigation of exposedness for multipartite Hardy correlations in the multiparty setting.

阅读与讨论 → 访问原文 →

13.

arXiv (CS.AI) 2026-06-11 DOI: arXiv:2606.11961

Categorical Prior Lock-in: Why In-Context Learning Fails for Structured Data

作者:

Antonio Pelusi ↗Stefano Braghin ↗Alberto Trombetta ↗

arXiv:2606.11961v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly used as conditional generators for structured data, relying on in-context learning (ICL) to adapt to new distributions without parameter updates. We investigate the limits of ICL for structured generation under distribution mismatch, using high-cardinality tabular data as a controlled test case, and identify a structural failure mode we term categorical prior lock-in: the inability of ICL to update the model's prior over token distributions inherited from pre-training. Across two 7B-parameter open-weight models, ICL improves numerical fidelity with additional examples but exhibits a sharp ceiling on categorical distributions, failing to reproduce rare classes entirely. Parameter-efficient fine-tuning (LoRA) overcomes these limitations but introduces measurable memorization risk and, in some cases, destabilizes structured output generation, highlighting a fundamental trade-off between adaptability and privacy.

阅读与讨论 → 访问原文 →

14.

arXiv (CS.AI) 2026-06-17 DOI: arXiv:2503.17867

Detecting and Mitigating DDoS Attacks with AI: A Survey

作者:

Alexandru Apostu ↗Silviu Gheorghe ↗Andrei H\^iji ↗Nicolae Cleju ↗Andrei P\u{a}tra\c{s}cu ↗Cristian Rusu ↗Radu Ionescu ↗Paul Irofti ↗

arXiv:2503.17867v3 Announce Type: replace-cross Abstract: Distributed Denial of Service attacks represent an active cybersecurity research problem. Recent research shifted from static rule-based defenses towards AI-based detection and mitigation. This comprehensive survey covers several key topics. Preeminently, state-of-the-art AI detection methods are discussed. An in-depth taxonomy based on manual expert hierarchies and an AI-generated dendrogram are provided, thus settling DDoS categorization ambiguities. An important discussion on available datasets follows, covering data format options and their role in training AI detection methods together with adversarial training and examples augmentation. Beyond detection, AI based mitigation techniques are surveyed as well. Finally, multiple open research directions are proposed.

阅读与讨论 → 访问原文 →

15.

arXiv (CS.AI) 2026-06-16 DOI: arXiv:2603.13584

An Empirical Investigation of Pre-Trained Deep Learning Model Reuse in the Scientific Process

作者:

Nicholas M. Synovic ↗Karolina Ryzka ↗Alessandra V. Vellucci Solari ↗Kenny Lyons ↗James C. Davis ↗George K. Thiruvathukal ↗

arXiv:2603.13584v2 Announce Type: replace-cross Abstract: Deep learning has achieved recognition for its impact within natural sciences, yet the prohibitive financial and technical cost of training models from scratch inhibit adoption. Following software engineering community guidance, natural scientists are reusing pre-trained deep learning models (PTMs) to amortize these costs. While prior works recommend PTM reuse patterns, we present the first empirical study of PTM reuse patterns in the natural sciences, quantifying the utilization and impact of PTM reuse within the scientific process across 17,718 peer reviewed, open access papers. Our results show that "Biochemistry, Genetics and Molecular Biology" has outpaced other natural scientific fields in PTM reuse, "adaptation" reuse is the most prevalent PTM reuse pattern identified across all natural science fields, and the "testing" stage of the scientific process has been most impacted by PTM integration.

阅读与讨论 → 访问原文 →

16.

arXiv (CS.AI) 2026-06-16 DOI: arXiv:2606.15291

A Formal Framework for Declarative Agentic AI in Business Process Analysis

作者:

Mohammad Azarijafari ↗Luisa Mich ↗Michele Missikoff ↗

arXiv:2606.15291v1 Announce Type: new Abstract: Agentic AI opens new opportunities for automating Business Process (BP), enabling autonomous decision-making and dynamic adaptation. However, realising this potential requires BP entities and their interactions to be defined with formal precision. This paper presents a formal framework for Agentic BP analysis through the AGO methodology. AGO captures the modelling perspective in terms of who is acting (Agents), why it is carried out (Goals), and what the relevant entities are (Objects). Grounded in set theory and mathematical logic, we formally define the AGO entity types and their interactions, organising all definitions into a BP Knowledge Base (BPKB). The resulting BPKB supports structured querying, incremental updates, and automatic generation of BP workflows, while ensuring soundness and completeness of the derived paths.

阅读与讨论 → 访问原文 →

17.

arXiv (CS.AI) 2026-06-19 DOI: arXiv:2606.20283

Boundary Embedding Shaping with Adaptive Contrastive Learning for Graph Structural Disentanglement

作者:

Jiaqing Chen ↗Zidu Yin ↗Yichao Cai ↗Yuhang Liu ↗Zhen Zhang ↗Dong Gong ↗Javen Qinfeng Shi ↗

arXiv:2606.20283v1 Announce Type: cross Abstract: Graph neural networks (GNNs) excel at aggregating neighbor information for classification, yet their performance is hindered by graph structural entanglement, where spurious correlations from semantically irrelevant neighbors contaminate node embeddings. This challenge is most acute for nodes near class boundaries in the embedding space, where amplified structural noise blurs decision boundaries and destabilizes predictions. Existing robust GNN methods largely treat all nodes uniformly, ignoring boundary vulnerabilities. In this paper, to improve classification performance, we tackle graph structural disentanglement by identifying boundary-region entanglement as the primary bottleneck and propose Boundary Embedding Shaping (BES), an adaptive contrastive learning GNN plug-in module that selectively suppresses spurious structural noise at decision boundaries with minimal model parameter perturbation. Extensive experiments demonstrate that BES consistently improves boundary discrimination and outperforms existing leading methods. Notably, BES boosts GCN performance by an average of 3.3% in node classification (up to 5.0% on WikiCS) and achieves superior accuracy in link prediction.

阅读与讨论 → 访问原文 →

18.

arXiv (CS.AI) 2026-06-11 DOI: arXiv:2606.03077

Libra: Efficient Resource Management for Agentic RL Post-Training

作者:

Kaiwen Chen ↗Xin Tan ↗Jingzong Li ↗Hong Xu ↗

arXiv:2606.03077v2 Announce Type: replace-cross Abstract: Reinforcement learning (RL) has emerged as a standard post-training paradigm for shaping large language models (LLMs) into capable agents. In agentic RL, the rollout stage generates trajectories while invoking tools, producing long-tailed and non-stationary workloads that expose two fundamental challenges in resource management. First, due to the long-tail distribution, a small fraction of trajectories dominates rollout makespan. Second, rollout and training are subject to cross-stage imbalance, as they exhibit strong asymmetry in compute patterns, memory demands, and sensitivity to sequence length. Compounding this asymmetry, the sequence length distribution drifts continuously as the policy evolves, rendering any static resource split progressively suboptimal. We present Libra, a resource management system to address both challenges via two core mechanisms. The first is a global resource planner that jointly optimizes GPU allocation across rollout and training clusters. It leverages an elastic hybrid pool to enable lightweight, non-blocking worker reallocation between stages. The second is a causality-driven multi-level feedback queue (C-MLFQ) scheduler, which routes requests to heterogeneous rollout buckets based on causal signals derived from tool-return outcomes, rather than relying on fragile length predictions. Evaluated on 48 A800 GPUs, Libra achieves up to 3.0x higher throughput and converges up to 2.5x faster in reward compared to the baselines.

阅读与讨论 → 访问原文 →

19.

arXiv (CS.LG) 2026-06-16 DOI: arXiv:2606.15514

Reinforcement Learning-Guided Retrieval with Soft Fusion for Robust Multimodal Imitation Learning under Missing Modalities

作者:

Hassan Ismkhan ↗Hamid Bouchahcia ↗

arXiv:2606.15514v1 Announce Type: cross Abstract: Robotic systems perceive the world through multiple input modalities – including visual camera streams and natural language instructions – and must select appropriate actions based on these signals. However, assuming the permanent availability of all input devices is unrealistic, as sensors may fail, become occluded, or drop out entirely during deployment. Robust handling of such missing-modality scenarios is therefore essential for real-world robot operation. This paper introduces RL4IL, a reinforcement learning guided method for imitation learning that selects the most suitable action for a given observation by identifying the most relevant expert demonstrations from a training library. A reinforcement learning policy, trained via Proximal Policy Optimisation over Breadth-First Search candidate sets, ranks candidate demonstrations and a soft cross-attention fusion head aggregates their action signals to produce the final prediction. When a modality is missing at inference time, a dedicated per-modality RL retrieval policy identifies donor demonstrations from the training library, and a soft imputation head reconstructs the missing embedding via cross-attention over the top-ranked donors – without requiring any retraining of the system. Experiments on three LIBERO benchmark suites demonstrate that RL4IL substantially outperforms state-of-the-art imitation learning methods under sensor dropout conditions, while requiring no policy network training. The code can be found at https://github.com/h-ismkhan/Reinforcement-Learning-via-kNN-for-Robotic-Learning-with-Missing-Camera

阅读与讨论 → 访问原文 →

20.

arXiv (CS.CV) 2026-06-19 DOI: arXiv:2602.01391

Relighting as a Probe of Visual Priors via Augmented Latent Intrinsics

作者:

Xiaoyan Xing ↗Xiao Zhang ↗Sezer Karaoglu ↗Theo Gevers ↗Anand Bhattad ↗

Image-to-image relighting requires representations that separate illumination from scene properties while preserving dense geometry, material, and photometric cues. We use this task as a probe of visual priors: unlike recognition tasks that reward invariance, relighting tests whether visual features retain the information needed for light transfer. Through a controlled generative relighting framework, we find that strong semantic encoders can degrade relighting quality, exposing a semantic–photometric trade-off between abstraction and physical fidelity. We introduce Augmented Latent Intrinsics (ALI), which balances this trade-off by fusing dense, pixel-aligned visual features into a latent-intrinsic relighting model and refining it with self-supervision on unlabeled real image pairs. ALI improves relighting quality, especially on glossy, metallic, and transparent materials, and demonstrates that generative relighting is an effective tool for quantifying what visual encoders encode about the physical world.

阅读与讨论 → 访问原文 →

21.

arXiv (CS.CL) 2026-06-11 DOI: arXiv:2606.12199

Which Speech Representation Better Matches Text-Native Reasoning? A Study of Speech-Text Alignment on Frame Rate and Representation

作者:

Zhen Ye ↗Xu Tan ↗Yiming Li ↗Guangyan Zhang ↗Chimin Chan ↗Haohe Liu ↗Zhengxi Liu ↗Hongzhan Lin ↗Zheqi Dai ↗Xinshen Zhang ↗Peiwen Sun ↗Qiuqiang Kong ↗…

Spoken dialogue models typically start from text LLM backbones, yet reasoning often degrades when conditioning on speech instead of text. We attribute part of this modality gap to a temporal-granularity mismatch: speech tokens are temporally redundant and far longer than text under matched semantics, diluting per-token semantic density and weakening text-native reasoning dynamics. We study speech token design as a representation selection problem and sweep frame rates under a frozen LLM backbone with a fixed information rate. To make low frame rates feasible, we introduce factorized FSQ and a lightweight non-autoregressive audio LM head, scaling capacity to nearly 300\,bits/frame without sacrificing efficient prediction. With the bottleneck removed, we sweep frame rates (50$\rightarrow$2.08\,Hz) and alignment depth, and observe a consistent best regime for speech QA at 4.17\,Hz with intermediate-layer representation alignment.

阅读与讨论 → 访问原文 →

22.

arXiv (CS.LG) 2026-06-19 DOI: arXiv:2606.19984

Kolmogorov-Arnold Reservoir Computing

作者:

Juntian Huang ↗Jurgen Kurths ↗Ying Tang ↗

arXiv:2606.19984v1 Announce Type: new Abstract: Reservoir computing offers a lightweight framework for forecasting dynamical systems but may struggle to capture long-range dependencies due to limited representational capacity. Conventional reservoir computing recurrently uses trainable reservoirs with hyperparameter sensitivity, while the next-generation reservoir computing removes recurrence at the cost of rapidly growing feature dimensions. Here, we develop Kolmogorov-Arnold Reservoir Computing (KARC), which replaces reservoirs with explicit basis-function expansions inspired by the Kolmogorov-Arnold representation theorem. We rigorously show that KARC is a lightweight design of Kolmogorov-Arnold networks (KANs), preserving the potential expressive capacity of KANs while admitting efficient closed-form training of reservoir computing. At comparable cost, KARC outperforms existing reservoir computing methods on challenging benchmarks including partial differential equations. It can also be integrated with generative diffusion models for text-to-image generation. This work thus establishes a principled bridge between reservoir computing and KANs, enabling efficient and high-fidelity dynamical system forecasting.

阅读与讨论 → 访问原文 →

23.

arXiv (CS.AI) 2026-06-18 DOI: arXiv:2606.18746

What Must Generalist Agents Remember?

作者:

Khurram Yamin ↗Namrata Deka ↗Maitreyi Swaroop ↗Albert Ting ↗Jeff Schneider ↗Bryan Wilder ↗

arXiv:2606.18746v1 Announce Type: new Abstract: This paper develops a formal account of what generalist agents must store in memory in order to act near-optimally across multiple environments and goals. It shows that when two domains share an observational bottleneck but require incompatible optimal actions, any uniformly near-optimal policy must induce distinct memory distributions at that bottleneck. The result yields a separation theorem: sufficiently successful agents cannot rely only on current state observations, but must preserve domain-relevant information in memory. The paper further shows that if an agent's memory contains enough information to estimate values for related goals, then that memory can be used to approximately reconstruct the agent's local transition dynamics. Together, these results characterize memory as the substrate that supports domain disambiguation, transition-model reconstruction, and planning for generalist agents.

阅读与讨论 → 访问原文 →

24.

arXiv (quant-ph) 2026-06-17 DOI: arXiv:2606.17064

Tensor network compression using fluid dynamics as a testbed: Analytical foundations in one dimension

作者:

Matthew D. Horner ↗Callum W. Duncan ↗Oliver T. Brown ↗Stephen M. de Bruyn Kops ↗Muralikrishnan Gopalakrishnan Meena ↗

arXiv:2606.17064v1 Announce Type: cross Abstract: High performance computers produce extreme-scale data sets that require sampling or compression if they are to be used to their full potential. Existing data compression techniques typically exploit features such as sparsity in the data, homogeneity in the data, or {\it a priori} knowledge of what subsets of data are of most interest. Fluid dynamics data in general do not exhibit these features and so are attractive test beds for generic compression techniques that are objective, robust, and tuneable with respect to information lost due to compression. Presented here is a method based on tensor networks, specifically matrix product states or tensor trains, that meets these requirements. The method is demonstrated for compression in one-dimension and is extensible to higher dimensionality. Lossless compression is demonstrated for random Fourier series for sufficiently high bond dimension of the tensor network, with the memory required to store the tensor network scaling directly proportional to the bond dimension. The lossy compression exhibited at lower bond dimension can be well within the relative error of many fluid simulations. The compression algorithm is tested for the time evolution of Burger's equation with excellent results. We additionally demonstrate the capability to perform computations in the compressed form through a tensor network periodic convolution that can be orders of magnitude faster than using fast Fourier transforms and the convolution theorem. In addition to being an attractive method for working with data sets generated by existing computers, the tensor network methods utilised are directly translatable to the emerging paradigm of quantum computing.

阅读与讨论 → 访问原文 →

25.

arXiv (CS.CL) 2026-06-18 DOI: arXiv:2606.18389

Want Better Synthetic Data? Steer It: Activation Steering for Low-Resource Language Generation

作者:

Jan Cegin ↗Daniil Gurgurov ↗Yusser Al Ghussin ↗Simon Ostermann ↗

Large language models (LLMs) have become an effective tool for synthetic data generation, including for low-resource languages, where generated data can improve downstream task performance. Current best-performing approaches typically rely on few-shot prompting with target-language examples, which increases inference costs and may reduce diversity through lexical anchoring. In this work, we investigate activation steering as an alternative for low-resource synthetic data generation. We study two steering strategies: Language Steering, which targets the linguistic identity of a language, and Quality Steering, which captures well-formedness by contrasting human-written and backtranslated text representations. We evaluate these methods across four open-source LLMs, multiple layers, and 11 typologically diverse languages by generating sentiment and topic classification data and finetuning smaller classifiers. Steering is applied in both zero-shot and few-shot prompting settings and compared against non-steered counterparts. Our results show that steering on early layers consistently improves the diversity of generated data while often yielding stronger downstream model performance, particularly for low-resource languages.

阅读与讨论 → 访问原文 →

探索全球前沿学术脉络