论文广场 - AcademicHub

01.

arXiv (CS.AI) 2026-06-18 DOI: arXiv:2606.18598

Optimizing Lithium Production Decisions under Geological, Demand, and Pricing Uncertainties: A POMDP Framework for Multi-Objective Decision Making

作者:

Anna C. Edmonds ↗Mansur M. Arief ↗Robert J. Moss ↗Mykel J. Kochenderfer ↗Jef Caers ↗

arXiv:2606.18598v1 Announce Type: new Abstract: Decision making in lithium production is challenging, whether from an investor's perspective or a strategic production standpoint. Determining which mines to open and when to open them involves not only geological and price uncertainties, but also complexities around the choice of extraction method, from direct lithium extraction to hard rock mining. Prior work explored models of this problem and different methods to optimize mining decisions; these models did not account for uncertainty in pricing, uncertainty in demand, or different mining technologies to extract lithium. Incorporating different pricing models and extraction technology into these models enables more robust strategies for determining not only when and where to open a mine, but also which method of production to pursue. We frame the problem as a partially observable Markov decision process (POMDP) and solve using belief state planning methods to get optimal decision making. In our study, we show that POMDP solvers outperform human inspired heuristics by dynamically adapting to shifting lithium price regimes (static, linear, exponential, and stochastic) through belief state planning and explicit uncertainty management. By optimally sequencing exploration, production, and technology choice, the framework achieves higher demand fulfillment and more balanced economic environmental outcomes over the projects lifetime in all different pricing and deposit scenarios.

阅读与讨论 → 访问原文 →

02.

arXiv (quant-ph) 2026-06-11 DOI: arXiv:2606.11843

Quantum iterative approach to the Traveling Salesman Problem

作者:

Arturo Rodr\'iguez-Almaz\'an ↗Guillermo Rivas ↗Ricardo S. Alonso ↗Daniela Falc\'o ↗Mir Amir Hosseini ↗

arXiv:2606.11843v1 Announce Type: new Abstract: The Traveling Salesman Problem (TSP) is a classical NP-hard problem in combinatorial optimization, where determining the shortest route among a set of cities becomes computationally prohibitive as the problem size increases. This work explores quantum computing as an alternative approach to address this complexity. Unlike existing methods that primarily rely on quantum annealing, we propose a quantum iterative framework integrating Quantum Phase Estimation (QPE) and Grover's search algorithm. Route costs are encoded as quantum phases, enabling QPE to efficiently evaluate them, while Amplitude Amplification, implemented via the Grover-Long algorithm, iteratively refines the solution space toward the optimal route. A proof-of-concept case study on a small-scale TSP instance demonstrates the feasibility of this approach and its potential for scaling to larger optimization problems. Furthermore, under an expectation-based analysis, the algorithm exhibits an expected computational complexity of $O(\frac{m^2\log_2(m)\log_2(1/\epsilon)}{\sqrt{\epsilon}})$ which depends on the error tolerance parameter $\epsilon$. This estimation omits the initialization term, which we expect future refinements to render subdominant to Phase Estimation.

阅读与讨论 → 访问原文 →

03.

arXiv (CS.CV) 2026-06-17 DOI: arXiv:2606.17520

GASE: Gaussian Splatting-Based Automated System for Reconstructing Embodied-Simulation Environments

作者:

Jiawei Zhang ↗Yiming Yan ↗Chao Liang ↗Nuo Xu ↗Seson Sun ↗Qichen Zhang ↗Yuhao Xu ↗Yantai Yang ↗Yingqiao Wang ↗Qin Jin ↗Zhipeng Zhang ↗

Training embodied agents in the real world requires skilled operators and expensive hardware. Simulation environments offer a compelling alternative by enabling large-scale, cost-effective data augmentation. Consequently, rapidly constructing high-fidelity simulation scenes with a minimal sim-to-real gap has become a critical objective in robot learning. While reconstruction-based methods provide superior visual quality, current workflows are hindered by inefficient data acquisition and subpar foreground object extraction. We thus propose GASE, a highly automated system for simulation scene construction. GASE leverages multi-view video streams from panoramic camera arrays to enable rapid environment scanning. To ensure high-quality asset generation, our pipeline introduces a camera-pose-based strategy that robustly extracts objects across frames in the 2D domain, followed by high-fidelity scene inpainting. Foreground objects and the static background are then reconstructed independently and seamlessly imported into physics simulators for policy training. Extensive experiments demonstrate that GASE outperforms existing 3D Gaussian-based methods in segmentation accuracy by over 10\% while achieving state-of-the-art inpainting quality. Furthermore, real-robot deployments across manipulation and navigation tasks maintains a performance gap of less than 10\% compared to policies trained purely on real-world data. These results confirm that GASE provides an efficient and highly effective solution for bridging the sim-to-real gap. Code will be released.

阅读与讨论 → 访问原文 →

04.

medRxiv (Medicine) 2026-06-15 DOI: HASH:6037ee13009de8596f815533feae4a59

Evaluation of AI-Generated Synthetic Data for Clinical Research in Secondary Cardiovascular Prevention among Dyslipidemia Patients

作者:

Bonomi ↗Werba ↗J. P ↗Saccani ↗Lu ↗L. L ↗Coser ↗Franchi ↗Valsecchi ↗Teruzzi ↗Terragni ↗Centenaro ↗…

Background: Access to high-quality clinical data is essential for advancing medical research and developing effective medical statistical and Artificial Intelligence models. However, privacy regulations and logistical barriers often hinder timely access to real-world data. Synthetic data offer a promising solution, preserving the statistical characteristics of original datasets while protecting patient privacy. Objectives: This study investigates the use of synthetic data for secondary cardiovascular prevention in patients with dyslipidemia, using two real-world datasets from Centro Cardiologico Monzino. Methods: Given the high dimensionality and limited sample size of the datasets, we employed a custom generative framework based on Large Language Models (LLMs). Pre-trained LLMs were fine-tuned on original clinical records to synthesize tabular data replicating source-data distributions. Fine-tuning was performed within the Centro Cardiologico Monzino's secure infrastructure to ensure data sovereignty. We evaluate clinical utility and privacy using fidelity and privacy metrics, identifying the optimal generative model and benchmarking against traditional anonymization methods. Results: Synthetic data achieved a superior trade-off than classically anonymized datasets. Real and synthetic datasets showed strong agreement, with significant distributional differences limited to few variables. Models trained on synthetic data replicated key associations from the original dataset, including therapy modification and creatine phosphokinase as predictors of SAMS, and pharmacological intensity as the main driver of LDL-C reduction. Conclusions: Results support the feasibility of using synthetic data as a proxy for real-world datasets in exploratory analyses and model development. Despite slight attenuation of some effect sizes, preserved clinical relationships reinforce the validity of synthetic data in medical research.

阅读与讨论 → 访问原文 →

05.

arXiv (CS.CL) 2026-06-18 DOI: arXiv:2606.19308

Enhancing Decision-Making with Large Language Models through Multi-Agent Fictitious Play

作者:

Leyang Shen ↗Yang Zhang ↗Xiaoyan Zhao ↗Chun Kai Ling ↗Tat-Seng Chua ↗

Large language model (LLM)-based multi-agent systems (MAS) have demonstrated great potential in solving tasks with execution complexity, by distributing subtasks across cooperative agents. However, this divide-and-conquer paradigm falls short on decision-making tasks that are also prevalent in the real world. These tasks require simultaneous reasoning from the stances of all involved stakeholders whose decisions are mutually dependent and thus cannot be solved in isolation. We characterize this challenge as stance entanglement, a form of decision complexity distinct from execution complexity. To address it, we propose Multi-Agent Fictitious Play (MAFP), a novel MAS paradigm that represents stakeholder stances as agents and formulates decision-making as an equilibrium-seeking process. Built on the game-theoretic principle of fictitious play, MAFP iteratively updates each agent's decision by best responding to the empirical mixture of other agents' past decisions. This enables agents to expose and address one another's weaknesses, progressively improving decision quality and robustness. We evaluate MAFP on challenging decision-making tasks that test the capability of deciding strategies for competitive scenarios prior to acting. MAFP outperforms both single-round and multi-round baselines on two complementary metrics, tournament strength and robustness, demonstrating its effectiveness in addressing stance entanglement.

阅读与讨论 → 访问原文 →

06.

arXiv (CS.AI) 2026-06-19 DOI: arXiv:2606.19382

DynAMO:Dynamic Asset Management Orchestration via Topological Multi-Agent Scheduling

作者:

Kanishk Kushwaha ↗Vikrant Vinod Bansode ↗Harsh Vardhan ↗Dhaval C. Patel ↗

arXiv:2606.19382v1 Announce Type: cross Abstract: While LLM-powered agents offer end-to-end automation for industrial asset lifecycles, real-world Industry 4.0 deployment is hindered by latency, concurrency instability, and safety risks. We present DynAMO (Dynamic Asset Management Orchestration), a deployment-ready engine using a Plan-then-Execute architecture to generate verifiable workflow graphs. DynAMO supports both SequentialWorkflow (topological execution) and ParallelWorkflow (dependency-aware concurrency). By dynamically identifying independent tasks, DynAMO preserves structural correctness and safety while significantly improving efficiency through controlled reasoning overlap. Across six controlled experiments on the AssetOpsBench industrial benchmark, DynAMO demonstrates substantial performance and robustness gains. Parallel execution reduces end-to-end latency by a median of 1.6x over sequential orchestration, rising to 1.8x on highly parallelizable workflows. After instrumenting external tool calls with realistic latencies, a latency decomposition shows that LLM reasoning and orchestration still account for more than 90% of execution time, identifying model inference as the primary system bottleneck. Structured context pruning reduces inference latency by approximately 30%, and DynAMO maintains correct functional behaviour (task completion, agent sequencing, and output quality) while exhibiting graceful degradation under controlled fault injection. Reproducibility analysis further confirms stable execution under repeated runs, with parallel scheduling reducing latency variance. These findings establish DynAMO as a practical blueprint for scalable, safe, and latency-aware agent deployment in Industry 4.0 automation pipelines. Code is available at: https://github.com/kushwaha001/DynAMO

阅读与讨论 → 访问原文 →

07.

arXiv (CS.CL) 2026-06-11 DOI: arXiv:2606.11429

Gumbel-BEARD: Automatic Layer Selection for Self-Supervised Adaptation of Whisper in Low-Resource Domains

作者:

Zilai Wang ↗Natarajan Balaji Shankar ↗Mohan Shi ↗Kaiyuan Zhang ↗Abeer Alwan ↗

Speech foundation models often struggle in low-resource domains due to domain mismatch and data scarcity. We propose Gumbel-BEARD, a domain adaptation framework that automates Whisper encoder layer selection via an end-to-end trainable hard Gumbel-Softmax selector. It enables self-supervised adaptation with a BEST-RQ objective that dynamically adapts to target acoustic characteristics without manual tuning. Experiments on the MyST child speech corpus demonstrate efficiency and scalability: with 10 h of labeled data for fine-tuning, our method matches a fully supervised baseline trained on the complete 133 h labeled set. We establish new state-of-the-art word error rates (WERs) of 8.21% using Whisper-medium on MyST and 11.06% using Whisper-small on the OGI Spontaneous dataset. Evaluation on CORAAL further confirms robustness to adult dialectal domain shifts, with up to 6% relative WER reduction, highlighting the generalizability of our approach to diverse low-resource conditions.

阅读与讨论 → 访问原文 →

08.

Nature (Science) 2026-06-09 DOI: HASH:7690c7552fa1911fb743393738a084f3

A unicellular relative links aggregative multicellularity to animal origins

作者:

Ruibao Li ↗

How animals evolved complex multicellularity from their unicellular ancestors remains unanswered. Unicellular relatives of animals exhibit simple multicellularity through clonal division, formation of multinucleate coenocytes, or aggregation. 1 Therefore, animal multicellularity may have evolved from one (or a combination) of these behaviours. Aggregation has classically been dismissed as a means to complex multicellularity. 2 However, aggregation occurs in many extant animal cells and has also been recently described in three close unicellular relatives of animals (the choanoflagellates Salpingoeca rosetta and Choanoeca flexa, and the filasterean Capsaspora owczarzaki). 3-5 It is unclear whether aggregation in these species is derived or ancestral, and its relevance for animal origins remains unknown. To fill this gap, we investigated whether an additional close unicellular relative of animals can undergo aggregation. We discovered that the marine free-living bacterivorous filasterean Ministeria vibrans 6 forms homogeneous aggregates with reproducible kinetics that have long-term stability, and that improved feeding and mating may be evolutionary drivers of this aggregation. Notably, we found that homologs of many animal multicellularity genes involved in cell adhesion, signalling, and transcriptional regulation were deployed during the aggregation process, indicating that they may have been used for aggregation in the unicellular ancestors of animals before being co-opted into animal multicellular development. Thus, our results imply that aggregative multicellularity was key to the development of the multicellular animal genetic toolkit.

阅读与讨论 → 访问原文 →

09.

arXiv (CS.CV) 2026-06-12 DOI: arXiv:2606.12633

ECA: Efficient Continual Alignment for Open-Ended Image-to-Text Generation

作者:

Jiangtao Kong ↗Peijun Zhao ↗Chun-Fu Chen ↗Youngwook Do ↗Shaohan Hu ↗Tianyi Zhou ↗Huajie Shao ↗

Incremental Learning (IL) for Open-ended Image-to-Text Generation (OpenITG) enables models to continuously generate accurate, contextually relevant text for new images while preserving previously acquired knowledge. Unlike prior studies, this paper addresses a more practical scenario in which the predominant category of visual data shifts over time as environments evolve. In this context, we introduce a new notion of continual alignment, which incrementally adapts the alignment module within pre-trained VLMs to preserve high-quality cross-modal representations. Based on this idea, we propose Efficient Continual Alignment (ECA), a novel exemplar-free IL approach for OpenITG. The key challenge is enabling the model to acquire new, task-specific features while minimizing interference with the established alignment without accessing raw data from previous tasks. To address this, ECA employs three core mechanisms: a Mixture of Query (MoQ) module that adapts task-specific query tokens, a Fisher Dynamic Expansion (FeDEx) that dynamically expands model structure based on a Fisher Information Matrix (FIM)-based metric, and an embedding dictionary with Dictionary Replay (DR) to retain past knowledge. To evaluate ECA's performance, we construct four new IL OpenITG benchmarks that better reflect real-world scenarios. Experimental results demonstrate that ECA significantly mitigates catastrophic forgetting and improves IL performance compared to baseline methods. Code and benchmarks are available at https://github.com/Snowball0823/ECA.

阅读与讨论 → 访问原文 →

10.

arXiv (CS.AI) 2026-06-19 DOI: arXiv:2606.20532

How Do Instructions Shape Speech? Cross-Attention Attribution for Style-Captioned Text-to-Speech

作者:

Nityanand Mathur ↗Hamees Sayed ↗Wasim Madha ↗Apoorv Singh ↗Sameer Khurana ↗Akshat Mandloi ↗Sudarshan Kamath ↗

arXiv:2606.20532v1 Announce Type: new Abstract: Style-captioned text-to-speech systems use natural language to control voice characteristics, but how individual words influence acoustic output remains unclear. Understanding this is critical for diagnosing failure modes and improving controllability in expressive TTS. We propose cross-attention attribution for speech diffusion models, adapting the DAAM framework to the speech domain for the first time, and apply it to CapSpeech-TTS. Our method extracts per-token heatmaps across 25 layers and 24 ODE steps. We analyze 3,600 (style caption, text transcript) combinations comprising 120 style captions conditioning the generation of 30 text transcripts each, revealing how caption tokens shape waveforms. Results show: (1) style tokens have lower temporal variance than content/function tokens, confirming global conditioning; (2) style attention correlates with F0 and energy; (3) style conditioning peaks in early steps and deep layers; (4) attention entropy reaches its minimum at layer 17, co-occurring with the style importance peak, indicating maximal network selectivity at the most style-critical stage. This is the first study of how natural language influences cross-attention in speech diffusion models

阅读与讨论 → 访问原文 →

11.

arXiv (CS.CV) 2026-06-16 DOI: arXiv:2606.15099

Think Less, Act Early: Reinforced Latent Reasoning with Early Exit in Vision-Language-Action Models

作者:

Dianqiao Lei ↗Lianlei Shan ↗

Existing Vision-Language-Action (VLA) models predominantly rely on explicit Chain-of-Thought (CoT) reasoning to bridge perception and action. While effective, this paradigm suffers from high computational costs and error propagation in multi-step tasks. In this paper, we propose Adaptive Variable Alignment VLA (AVA-VLA), a novel Latent Reasoning VLA framework that models reasoning as a sequence of unobservable latent variables, bypassing the need for explicit text generation. However, latent trajectories are inherently susceptible to noise interference and misalignment with downstream objectives. To address this, we introduce a Reinforcement Learning-based Denoising mechanism that treats latent state generation as a sequential decision process, optimizing reasoning trajectories via task-level rewards. Furthermore, we incorporate an Early-Exit Strategy that adaptively terminates reasoning based on state confidence, enabling a dynamic trade-off between depth and efficiency. Extensive experiments on embodied decision benchmarks demonstrate that AVA-VLA achieves a 6x inference speedup over explicit CoT methods while attaining a 98.3% average success rate on LIBERO, improving both efficiency and long-horizon stability over full-reasoning baselines.

阅读与讨论 → 访问原文 →

12.

arXiv (CS.CL) 2026-06-18 DOI: arXiv:2606.01697

RCEM: Robust Conversational Search EMbedder in Distributional Shift

作者:

Kilho Son ↗Paul Hsu ↗Cha Zhang ↗Dinei Florencio ↗

We propose RCEM, a Robust Conversational search EMbedder that is additionally equipped with LLM's query reformulation capability without losing base model's generalization. Unlike prior conversational dense retrieval approaches that learn direct conversation-to-passage matching, RCEM aligns conversations, prepended by special token, to LLM-rewritten queries, while preserving the original embedding space. The unchanged embedding space automatically maps the rewritten-query to the relevant passages. As a result, RCEM (1) reduces overfitting by simplifying the alignment task from long passages to shorter rewritten queries, (2) eliminates the need for conversation-to-passage relevance labels for training, and (3) maintains its original embedding space that allows conversational queries against indexes built by original embedder without rebuilding them. Extensive experiments show that RCEM consistently outperforms prior approaches, achieving up to 30% improvement under distributional shift.

阅读与讨论 → 访问原文 →

13.

arXiv (CS.AI) 2026-06-16 DOI: arXiv:2605.22183

Action with Visual Primitives

作者:

Weilong Guo ↗Yuchen Wang ↗Renping Zhou ↗Yunfeng Zhang ↗Rui Fang ↗Yuyang Pang ↗Wenda Xu ↗Gao Huang ↗

arXiv:2605.22183v3 Announce Type: replace-cross Abstract: Vision-Language-Action (VLA) models have emerged as a promising paradigm for generalist robotic manipulation. A common design in current architectures maps language instructions and visual observations to actions in a single forward pass. While conceptually simple, this formulation entangles instruction comprehension, spatial scene understanding, and motor control within a single learning objective. As a result, the action expert must implicitly relearn cognitive and perceptual capabilities already present in the pretrained VLM, which can limit both learning efficiency and generalization. We introduce AVP (Action with Visual Primitives), an end-to-end architecture that implements this visual-primitive-centric interface: the VLM infers the next-stage target and emits visual-primitive tokens that condition a flow-matching action expert, with supervision derived from end-effector kinematics. Real-robot experiments on general pick-and-place tasks show that AVP improves the success rate by 37.04% over pi_0.5 and outperforms other recent methods, with consistent gains in data efficiency, spatial-compositional generalization, and object-level transfer.

阅读与讨论 → 访问原文 →

14.

arXiv (CS.CV) 2026-06-16 DOI: arXiv:2606.17037

The Importance of Phase in Neural Representations: An Internal Oppenheim-Lim Test of Image Classifiers

作者:

Alper Y{\i}ld{\i}r{\i}m ↗

Oppenheim and Lim (1981) showed that natural images stay recognizable when reconstructed from their Fourier phase alone, while the magnitude carries little of their identity. We ask whether trained image classifiers reproduce this asymmetry inside their hidden layers, and we test it causally: given two images, we transplant the phase of one onto the magnitude of the other at a chosen layer and record which image the prediction follows. In PRISM2D, GFNet, and ViT-B/16 the prediction follows the phase or sign donor, and deleting all image-specific magnitude barely moves accuracy, so identity rides on phase while image-specific magnitude is largely dispensable to the readout. ResNet-50 at first seems to break the pattern, because transplanting sign after its ReLUs does nothing; a fair intervention before the ReLU reveals a strong latent sign code in the late blocks, and a DC-only control shows the readout consumes a channel-wise spatial average. Controls rule out the trivial case in which magnitude simply stops depending on the image. The architectures therefore share a phase/sign identity code but expose it in different bases, set by rectification and readout geometry, which gives a mechanistic account of the texture–shape gap between CNNs and attention models.

阅读与讨论 → 访问原文 →

15.

Science (Express) 2026-06-02 DOI: 10.1126/science.aej3572

Another red alert for American science | Science

作者: 未知作者

Although research has bipartisan support in the US Congress, and trust in science is above 75% across the country, the Trump administration seems as determined as ever to mortally wound the nation’s scientific enterprise. After the scientific community persuaded Congress to restore most of the president’s draconian cuts to research funding last year, the White House Office of Management and Budget (OMB), under Russell Vought, has found new ways to circumvent the will of Congress and starve American science. At the beginning of this year, OMB dragged its feet in releasing instructions to federal agencies for how to distribute the funding appropriated by Congress, leading to lags in dispersal. Now, OMB has proposed revising the rules that govern how federal dollars are spent. The changes would inevitably lead to unlegislated reductions in funding and damage US leadership in science, both in academia and industry.

阅读与讨论 → 访问原文 →

16.

Nature (Science) 2026-06-10 DOI: HASH:33fc574a78c3bea7dd345429847915d4

The Amazon can be saved — with concerted action inside and outside Brazil

作者: 未知作者

As deforestation in the Amazon falls, fresh evidence shows that the rainforest can withstand global warming, but only if there is a worldwide effort to stop cutting it down. As deforestation in the Amazon falls, fresh evidence shows that the rainforest can withstand global warming, but only if there is a worldwide effort to stop cutting it down.

阅读与讨论 → 访问原文 →

17.

arXiv (CS.CL) 2026-06-12 DOI: arXiv:2507.10599

Emergence of Hierarchical Emotion Organization in Large Language Models

作者:

Maya Okawa ↗Bo Zhao ↗Eric J. Bigelow ↗Rose Yu ↗Tomer Ullman ↗Ekdeep Singh Lubana ↗Hidenori Tanaka ↗

As large language models (LLMs) increasingly power conversational agents, understanding how they model users' emotional states is critical for ethical deployment. Inspired by emotion wheels, i.e., a psychological framework that argues emotions organize hierarchically, we analyze probabilistic dependencies between emotional states in model outputs. We find that LLMs naturally form hierarchical emotion trees that align with human psychological models, and larger models develop more complex hierarchies. We also uncover systematic biases in emotion recognition across socioeconomic personas, with compounding misclassifications for intersectional, underrepresented groups. Human studies reveal striking parallels, suggesting that LLMs internalize aspects of social perception. Beyond highlighting emergent emotional reasoning in LLMs, our results hint at the potential of using cognitively-grounded theories for developing better model evaluations.

阅读与讨论 → 访问原文 →

18.

arXiv (math.PR) 2026-06-19 DOI: arXiv:2406.11783

The systole of random hyperbolic 3-manifolds

作者:

Anna Roig-Sanchis ↗

arXiv:2406.11783v2 Announce Type: replace-cross Abstract: We study the systole of a model of random hyperbolic 3-manifolds introduced by Petri and Raimbault, answering a question posed in that same article. These are compact manifolds with boundary constructed by randomly gluing truncated tetrahedra along their faces. We prove that the limit, as the volume tends to infinity, of the expected value of their systole exists and we give a closed formula of it. Moreover, we compute a numerical approximation of this value.

阅读与讨论 → 访问原文 →

19.

arXiv (CS.CL) 2026-06-15 DOI: arXiv:2606.14325

Achieving Precise Text-To-Cypher Via Grounded Knowledge Graph Data Generation

作者:

Francesco Cazzaro ↗Jessica Lennon ↗Ariadna Quattoni ↗

Property Graphs are rapidly being adopted as database frameworks for representing heterogeneous data sources. To enable precise access to the information contained in them we need conversational interfaces based on Text-To-Cypher (Text2Cypher) parsers. This paper presents an automatic synthetic data generation method that can be leveraged to fine-tune small LLMs for this task. We conduct experiments on all the major Text-To-Cypher benchmarks, demonstrating that with our synthetic data generation approach we can significantly increase the performance of small LLMs, allowing them to compete with much larger proprietary models. This means that in settings in which models must be locally deployed we can ensure data-sovereignty without sacrificing accuracy and without costly annotation campaigns.

阅读与讨论 → 访问原文 →

20.

arXiv (quant-ph) 2026-06-19 DOI: arXiv:2512.04801

Hybrid VQE-CVQE algorithm using diabatic state preparation

作者:

John P. T. Stenger ↗C. Stephen Hellberg ↗Daniel Gunlycke ↗

arXiv:2512.04801v2 Announce Type: replace Abstract: We propose a hybrid variational quantum algorithm that has variational parameters used by both the quantum circuit and the subsequent classical optimization. Similar to the Variational Quantum Eigensolver (VQE), this algorithm applies a parameterized unitary operator to the qubit register. We generate this operator using diabatic state preparation. The quantum measurement results then inform the classical optimization procedure used by the Cascaded Variational Quantum Eigensolver (CVQE). We demonstrate the algorithm on a system of interacting electrons and show how it can be used on long-term error-corrected as well as short-term intermediate-scale quantum computers. Our simulations performed on IBM Brisbane produced energies well within chemical accuracy.

阅读与讨论 → 访问原文 →

21.

arXiv (CS.AI) 2026-06-19 DOI: arXiv:2606.20084

Residual-Space Evolutionary Optimization via Flow-based Generative Models

作者:

Zhuo Cao ↗Lena Krieger ↗Fernanda Nader ↗Xuan Zhao ↗Hanno Scharr ↗Ira Assent ↗

arXiv:2606.20084v1 Announce Type: new Abstract: Data editing with generative methods typically requires differentiable objectives and gradient-based search. However, these assumptions break down in flow-based settings, where edits are performed through forward and backward integration and often involve non-differentiable or black-box objectives. We introduce residual-space evolutionary optimization, a model-agnostic framework that addresses this gap by combining flow-based generative editing with evolutionary algorithms. Building on the observation that conditional flow matching (CFM) can disentangle condition-controlled factors from instance-specific residuals, our framework directly operates in residual space and separates two complementary search regimes: self-pollination performs local exploitation through feature-preserving residual refinement, and cross-pollination promotes broader exploration by recombining residuals across heterogeneous samples. As a proof of concept, we validate on MorphoMNIST, a benchmark dataset for counterfactual generation, and on crystal data, demonstrating that this exploration–exploitation decomposition provides a useful mechanism for balancing target alignment, instance preservation, and diversity, and extends beyond images to real-world scientific domains.

阅读与讨论 → 访问原文 →

22.

arXiv (CS.CL) 2026-06-16 DOI: arXiv:2606.16432

ACCORD: Action-Conditioned Contextual Grounding for Language Agents

作者:

Lai Jiang ↗Cheng Qian ↗Zhenhailong Wang ↗Pan Lu ↗Heng Ji ↗Hao Peng ↗

User instructions are often underspecified because humans rely on implicit assumptions about the surrounding environment. For large language model (LLM) agents operating in information-rich digital and physical environments, these assumptions cannot be inferred from the instruction alone; they must be recovered from the current state of tools, data, interfaces, and observations. Effective execution therefore requires agents to identify missing context, ground it in observed evidence, and carry it forward into subsequent actions. We show that current agents often fail to do so. They act from assumed rather than observed specifics, overlook information they could have gathered, and fail to incorporate evidence that has already been returned. Building on this insight, we propose ACCORD (Action-Conditioned Contextual Grounding), a simple and effective agent framework for adaptive grounding. Before each action, ACCORD actively probes the environment for missing information and integrates relevant context from the agent's trajectory that would otherwise be overlooked. Requiring no additional training or task-success signals, ACCORD improves task-goal completion on AppWorld by up to +20.6 points with GPT-5-mini, from 42.0% to 62.6%, compared to strong baselines. These gains persist with a substantially stronger base model (+10.8 with Claude-4.5-sonnet), an open-weight model (+10.1 with Qwen3.5-27B-FP8), and on the embodied AlfWorld benchmark (+7.4 success rate with GPT-5-mini).

阅读与讨论 → 访问原文 →

23.

arXiv (quant-ph) 2026-06-12 DOI: arXiv:2606.12766

Asymmetric quantum steering harvested near a Lorentz-violating BTZ black hole

作者:

Si-Yu Liu ↗Xin-Ze Song ↗Xiang-Yue Yu ↗Wentao Liu ↗Xiao-Li Huang ↗Shu-Min Wu ↗

arXiv:2606.12766v1 Announce Type: cross Abstract: We investigate the harvesting of quantum steering and its directional asymmetry between two Unruh-DeWitt detectors in a Lorentz-violating BTZ black hole spacetime. Since the detectors are located at different radial positions outside the black hole, they experience inequivalent local environments induced by gravitational redshift, causing Alice to undergo stronger effective thermal noise than Bob. Remarkably, we uncover a counterintuitive phenomenon in which the detector subjected to a higher effective temperature exhibits stronger steerability than the other one, revealing a nontrivial inversion of thermal intuition in curved spacetime. Furthermore, quantum steering survives only within a finite window of detector energy gaps and reaches its maximum within an optimal regime. We find that Lorentz violation suppresses steering most strongly near this optimal energy gap, indicating an enhanced sensitivity of maximal correlation extraction to symmetry breaking effects. Our results demonstrate that Lorentz violation acts as a geometric constraint on the quantum information capacity of spacetime, simultaneously restricting both the strength and the directionality of quantum correlations.

阅读与讨论 → 访问原文 →

24.

arXiv (CS.AI) 2026-06-11 DOI: arXiv:2604.20348

Bimanual Robot Manipulation via Multi-Agent In-Context Learning

作者:

Alessio Palma ↗Indro Spinelli ↗Vignesh Prasad ↗Luca Scofano ↗Yufeng Jin ↗Georgia Chalvatzaki ↗Fabio Galasso ↗

arXiv:2604.20348v2 Announce Type: replace-cross Abstract: Language Models (LLMs) have emerged as powerful reasoning engines for embodied control. In particular, In-Context Learning (ICL) enables off-the-shelf, text-only LLMs to predict robot actions without any task-specific training while preserving their generalization capabilities. Applying ICL to bimanual manipulation remains challenging as the high-dimensional joint action space and tight inter-arm coordination constraints rapidly overwhelm standard context windows. To address this, we introduce BiCICLe (Bimanual Coordinated In-Context Learning), the first framework that enables standard LLMs to perform few-shot bimanual manipulation without fine-tuning. BiCICLe frames bimanual control as a multi-agent leader-follower problem, decoupling the action space into sequential, conditioned single-arm predictions. Evaluated on 13 tasks from the TWIN benchmark, BiCICLe achieves 70.5% average success rate, outperforming the best training-free baseline by 6.1 percentage points and surpassing most supervised methods. We also demonstrate superior real-world performance on 3 tasks without hardware-specific retraining.

阅读与讨论 → 访问原文 →

25.

arXiv (CS.AI) 2026-06-19 DOI: arXiv:2606.19728

Bidirectional Tutoring for Developmental Motor Learning in Robots: Co-Developed Interaction Dynamics Support Stable Learning

作者:

Rui Fukushima ↗Jun Tani ↗

arXiv:2606.19728v1 Announce Type: cross Abstract: Infants are well known to develop their motor skills through dense interaction with caregivers. Although such social interaction is crucial for human development, motor-skill learning in robots is often treated as a unidirectional process in which robots passively receive demonstrations from tutors. This overlooks a key property of social interaction: it is inherently bidirectional, with tutor and learner dynamically adapting to each other. In such interactions, the robot's past experiences may function as prior constraints that shape the dynamics of their co-developed trajectories. We hypothesize that bidirectional tutoring allows such constraints to guide the formation of consistent behavioral patterns that preserve behavioral coherence and support generalization, whereas unidirectional interaction lacks such constraints and leads to broader, less consistent behavioral patterns. To examine this hypothesis, we conducted two experiments with a physical humanoid robot performing an object manipulation task: one involving human-robot interaction and another employing an AI tutor interacting with the real robot through an adaptive intervention mechanism designed to examine whether similar effects would emerge under more controlled conditions. We implement the developmental learning framework using a free-energy-principle-based neural network extended with generative replay, which supports stable sequence-by-sequence learning from single tutored episodes. Across both settings, bidirectional tutoring fostered consistent behaviors and stage-wise generalization, while the robot gradually required less tutor guidance. These results suggest that bidirectional tutoring, as an embodied and socially grounded approach, provides an effective scaffold for developmental motor learning in robots.

阅读与讨论 → 访问原文 →

探索全球前沿学术脉络