Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (quant-ph) 2026-06-11

Quantum thermodynamics, quantum correlations and quantum coherence in accelerating Unruh-DeWitt detectors in both steady and dynamical state

arXiv:2512.18123v2 Announce Type: replace Abstract: We investigate the interplay between quantum thermodynamics, quantum correlations, and quantum coherence within the framework of the Unruh-DeWitt (UdW) detector model. By analyzing both the steady and dynamical states of various quantum resources (including steerability, entanglement, quantum discord, and coherence), we study how these resources evolve under Markovian and non-Markovian environments. Furthermore, we investigate the impact of both the Unruh temperature and the energy levels on three key quantum phenomena: thermodynamic evolution, quantum correlations, and quantum coherence, considering different initial state preparations. The hierarchical structure relating quantum correlations and quantum coherence is determined. We further examine the thermodynamic performance of a quantum heat engine, highlighting the influence of memory effects and classical correlations on heat exchange, work extraction, and efficiency. Our results reveal that non-Markovian dynamics can enhance the preservation of quantum correlations and improve the engine's efficiency compared to purely Markovian regime. These findings provide insights into the role of quantum correlations and quantum coherence in quantum thermodynamic processes and open avenues for optimizing quantum devices operating in relativistic or open-system settings.

02.
medRxiv (Medicine) 2026-06-15

Genome-wide colocalization of body fat distribution GWAS and subcutaneous adipose eQTLs identifies SNX10, DGKQ, and CBX3 as candidate causal genes for cardiometabolic disease

作者:

Background: Genome-wide association studies (GWAS) have identified hundreds of loci associated with body fat distribution, yet the causal genes and regulatory mechanisms through which these variants exert their effects remain largely unknown. Expression quantitative trait locus (eQTL) colocalization provides a powerful framework for identifying genes whose expression is genetically coregulated with complex traits. Methods: We performed a genome-wide colocalization analysis integrating waist-hip ratio adjusted for body mass index (WHRadjBMI) GWAS summary statistics from 694,649 individuals (Pulit et al., 2019) with subcutaneous adipose tissue eQTLs from the Genotype-Tissue Expression (GTEx) Project v8 (N = 581 donors). GWAS coordinates were lifted from GRCh37 to GRCh38 to enable direct alignment with GTEx data. We incorporated CAVIAR fine-mapping results to overcome the limitation of FDR-significant eQTL filtering. Colocalization was assessed using the approximate Bayes factor framework (coloc.abf) across 335 independent genome-wide significant loci. Results: Of 2,897 locus-gene pairs tested, 489 (16.9%) showed strong colocalization (PP.H4 > 0.8) and 618 (21.3%) showed moderate evidence (PP.H4 > 0.5). The strongest colocalization was observed for SNX10 (sorting nexin 10; PP.H4 = 1.000), a recently characterized regulator of adipocyte differentiation and female-specific diet-induced obesity. Other top hits included DGKQ (diacylglycerol kinase theta; PP.H4 = 0.9999999), an emerging pharmacological target for insulin resistance, and CBX3 (chromobox 3; PP.H4 = 0.9999974), an epigenetic regulator linked to cardiovascular disease. Established adiposity genes including GRB14 (PP.H4 = 0.681) and KLF14 (PP.H4 = 0.590) were recovered, validating our approach. Several loci exhibited extensive allelic heterogeneity, with 50 genes colocalizing at a single chromosome 3 locus. Conclusions: Our analysis provides a comprehensive map of adipose tissue gene regulatory mechanisms underlying genetic risk for body fat distribution. The identification of SNX10, DGKQ, and CBX3 as high-confidence candidate causal genes advances the translation of GWAS associations into mechanistic understanding and therapeutic targets for obesity-related cardiometabolic disease.

03.
medRxiv (Medicine) 2026-06-17

The Unreliable Judges: Assessing Reproducibility and Self-Preference Bias of LLMs as Free-Text Evaluators

Large Language Models (LLMs) are transforming clinical practice and research, but their adoption requires rigorous evaluation. While human assessment is ideal, its cost has driven the widespread use of LLMs as evaluators. We introduce an open-source reciprocal framework comparing 71 human experts against six LLMs. AI evaluators show a strong self-preference bias, yet neither group reliably identified whether a response was human- or AI-generated. AI scores correlated with surface features such as length and lexical diversity, whereas human scores did not. By probing the evaluator's hidden states and applying targeted steering, we show that verbosity is a major causal driver of the bias. Moreover, shuffling question-response pairings shows that long responses keep high scores even when they no longer answer the question, whereas short ones do not, demonstrating that AI judges reward verbosity largely independently of content alignment. Finally, API-based and batch inference inflate stochasticity, underscoring the need for controlled deployment.

04.
arXiv (CS.CV) 2026-06-18

Native Active Perception as Reasoning for Omni-Modal Understanding

Passive models for long video understanding typically rely on a "watch-it-all" paradigm, processing frames uniformly regardless of query difficulty, causing computational cost to grow with video duration. Although interactive frameworks have emerged, they often rely on global pre-scanning, and their context cost still scales with video length. We propose OmniAgent, the first native omni-modal agent that formulates video understanding as a POMDP-based iterative Observation-Thought-Action cycle. OmniAgent executes on-demand actions to selectively distill audio-visual cues into a persistent textual memory, effectively decoupling reasoning complexity from raw video duration. To operationalize this, we introduce (1) Agentic Supervised Fine-Tuning to bootstrap native active perception via best-of-N trajectory synthesis with dual-stage quality control, and (2) Agentic Reinforcement Learning with TAURA (Turn-aware Adaptive Uncertainty Rescaled Advantage), which leverages turn-level entropy to steer credit assignment toward pivotal discovery turns. Crucially, OmniAgent exhibits positive test-time scaling, where performance improves as the number of reasoning turns increases, validating the efficacy of active perception. Empirical results across ten benchmarks (e.g., VideoMME, LVBench) demonstrate that OmniAgent achieves state-of-the-art performance among open-source models. Notably, on LVBench, our 7B agent outperforms the 10$\times$ larger Qwen2.5-VL-72B (50.5% vs. 47.3%).

05.
arXiv (quant-ph) 2026-06-19

Faking entanglement with imperceptible measurement deviations

arXiv:2606.20396v1 Announce Type: new Abstract: Quantum entanglement is a central resource underpinning emerging quantum technologies, enabling capabilities beyond those of classical systems. Accurate verification of entanglement is therefore crucial. However, experimental schemes usually rely on the assumption that quantum measurements can be realized exactly. As the complexity of a quantum system grows, this assumption typically becomes increasingly unrealistic, therefore leading to a widening mismatch between theoretical models and experimental implementations. Here we demonstrate that arbitrarily small measurement errors, when adversarially encoded in the measurement apparatus, can lead to the false certification of high-dimensional entanglement in systems that are, in fact, separable. This is achieved by introducing explicit hacking attacks to measurement devices in well-established entanglement verification tests. We further experimentally demonstrate this effect using classical photonic states encoded in the spatial degree of freedom, spanning up to 61 dimensions with measurement fidelity errors as low as 0.23%. Our results uncover a fundamental vulnerability in current methods for high-dimensional entanglement detection, highlighting the susceptibility of complex quantum devices to small adversarial perturbations. The findings underscore the need for developing secure verification of quantum information that is robust to bounded discrepancies between theory and experiment.

06.
arXiv (CS.AI) 2026-06-15

PRISM: Perception Reasoning Interleaved for Sequential Decision Making

arXiv:2605.05407v2 Announce Type: replace Abstract: Scaling LLM-based embodied agents from text-only environments to complex multimodal settings remains a major challenge. Recent work identifies a perception-reasoning-decision gap in standalone Vision-Language Models (VLMs), which often overlook task-critical information. In this paper, we introduce PRISM, a framework that tightly couples perception (VLM) and decision (LLM) through a dynamic question-answer (DQA) pipeline. Instead of passively accepting the VLM's description, the LLM critiques it, probes the VLM with goal-oriented questions, and synthesizes a compact image description. This closed-loop interaction yields a sharp, task-driven understanding of the scene. We evaluate PRISM on the ALFWorld and Room-to-Room (R2R) benchmarks. We show that: (1) PRISM significantly outperforms state-of-the-art image-based models, (2) our Interactive goal-oriented perception pipeline yields systematic and substantial gains, and (3) PRISM is fully automatic, eliminating the need for handcrafted questions or answers.

07.
arXiv (CS.CV) 2026-06-16

Auteur: Language-Driven Cinematographic Framing for Human-Centric Video Generation

Generative video models have achieved remarkable visual fidelity and temporal coherence, yet intentional camera control remains elusive. Existing frameworks treat camera motion as a byproduct of pixel synthesis, producing trajectories that are stochastic, spatially inconsistent, and indifferent to the human subject driving the scene. In this work, we present Auteur, a method for language-driven, human-centric camera framing in generative video. Our core insight is that professional filmmakers conceive shots not as world-space trajectories but as framings defined relative to the actor, encoding shot size, angle, and composition as functions of human pose and motion. We formalize this intuition as a human-centric camera parameterization and introduce a Domain-Specific Language (DSL) that is convertible to standard 6-DoF camera parameters. A fine-tuned multimodal large language model then acts as a virtual director, mapping natural language descriptions and coarse human motion to sparse DSL keyframes that are deterministically interpolated into continuous camera trajectories, which are then provided as input to video generators. We train and evaluate Auteur on a new dataset of 34K aligned text, human motion, and DSL-annotated camera trajectories drawn from procedural synthesis and real-world movie footage from the CondensedMovies dataset. Auteur enables cinematographic framing of human-centered scenes, a capability largely absent in prior generative models. To assess this behavior, we propose new framing-focused metrics, and our experiments show that Auteur consistently outperforms existing methods. Project page is https://cyberiada.github.io/Auteur/

08.
arXiv (CS.LG) 2026-06-16

Exact Federated Continual Unlearning for Ridge Heads on Frozen Foundation Models

arXiv:2603.12977v3 Announce Type: replace Abstract: Foundation models are commonly deployed as frozen feature extractors with a small trainable head to adapt to private, user-generated data in federated settings. The ``right to be forgotten'' requires removing the influence of specific samples or users from the trained model on demand. Existing federated unlearning methods target general deep models and rely on approximate reconstruction or selective retraining, making exactness costly or elusive. We study this problem in a practically relevant but under-explored regime: a frozen foundation model with a ridge-regression head. The exact optimum depends on the data only through two additive sufficient statistics, which we turn into a communication protocol supporting an arbitrary stream of add and delete requests via fixed-size messages. The server maintains a head that is, in exact arithmetic, pointwise identical to centralized retraining after every request. We provide deterministic retrain-equivalence guarantees, order and partition invariance, two server-side variants, and a Bayesian certificate of zero KL divergence. Experiments on four benchmarks confirm the guarantees: both variants match centralized ridge retraining to within $10^{-9}$ relative Frobenius error and complete each request at orders-of-magnitude lower cost than federated retraining baselines.

09.
arXiv (CS.AI) 2026-06-11

INFRAMIND: Infrastructure-Aware Multi-Agent Orchestration

arXiv:2606.11440v1 Announce Type: new Abstract: Existing multi-agent LLM orchestration methods, ranging from brute-force ensembles to learned routers, select models and topologies based on task and model features. However, these methods do not consider the runtime state of the serving infrastructure. On shared GPU clusters under concurrent load, this infrastructure blindness causes systematic resource underutilization: preferred models accumulate deep request queues while equally capable alternatives sit idle. In multi-agent pipelines, where each query triggers multiple sequential model calls, these delays then compound across every downstream step. Closing this gap is challenging because the relevant infrastructure signals (queue depths, KV-cache pressure, latencies) are dynamic and noisy, and they must drive three different decisions: planning, per-step routing, and scheduling. We introduce INFRAMIND, a framework that makes the entire multi-agent stack infrastructure-aware. An infra-aware planner conditions topology and role selection on real-time system load and remaining budget, biasing toward simpler graphs under congestion and richer ones at low load. An infra-aware executor then observes per-model queue depths, cache utilization, and response latencies at each agent step to decide which model to call and how deeply to reason; a budget-aware scheduler further reorders each model's queue so that urgent requests are served first. Cast as a hierarchical constrained MDP and solved end-to-end via reinforcement learning, the system learns to balance quality against latency automatically. Across five benchmarks, INFRAMIND delivers up to +7.6 pp accuracy over the prior baseline at low load with up to 7x lower latency, and sustains up to 99.9% SLO compliance under high load where every baseline drops below 50%.

10.
arXiv (CS.AI) 2026-06-15

FPGA-Based Neural Network Accelerators for Space Applications: A Survey

arXiv:2504.16173v3 Announce Type: replace-cross Abstract: Space missions are becoming increasingly ambitious, necessitating high-performance onboard spacecraft computing systems. In response, field-programmable gate arrays (FPGAs) have garnered significant interest due to their flexibility, cost-effectiveness, and radiation tolerance potential. Concurrently, neural networks (NNs) are being recognized for their capability to execute space mission tasks such as autonomous operations, sensor data analysis, and data compression. This survey serves as a valuable resource for researchers aiming to implement FPGA-based NN accelerators in space applications. By analyzing existing literature, identifying trends and gaps, and proposing future research directions, this work highlights the potential of these accelerators to enhance onboard computing systems.

11.
PLOS Computational Biology 2026-06-22

<i>HoloBio</i>: A holographic microscopy tool for quantitative biological analysis

作者:

by Waira Mona, Maria J. Gil-Herrera, Emanuel Mazo, Daniel Córdoba, Sofia Obando-Vasquez, Maria J. Lopera, Rene Restrepo, Carlos Trujillo, Ana Doblas, Raul Castaneda Holographic imaging in microscopy enables label-free quantitative information of biological specimens and has found applications across a wide range of biomedical studies, from cell morphology to particle dynamics; yet its widespread adoption is often limited by the lack of accessible and standardized analysis software. We present HoloBio, an open-source, Python-based graphical user interface developed to address this issue. This software offers two primary operational modes: a Real-Time mode that enables live processing of holograms at video frame rates, and an Offline mode designed for post-processing previously recorded holograms. HoloBio is compatible with holograms recorded using both lens-based and lensless systems, supporting off-axis architectures in telecentric and non-telecentric configurations, as well as slightly off-axis and in-line optical setups. The software incorporates tools for cell tracking, phase profiling, thickness estimation, and morphological analysis, including cell counting and object area quantification. HoloBio is designed to be accessible for users without coding expertise, offering a reproducible, high-throughput environment tailored for researchers in biology, biophotonics, and biomedical imaging.

12.
arXiv (CS.AI) 2026-06-16

FineVLA: Fine-Grained Instruction Alignment for Steerable Vision-Language-Action Policies

arXiv:2605.27284v2 Announce Type: replace-cross Abstract: Vision-Language-Action (VLA) models are increasingly expected to not only complete robot tasks, but also follow human instructions about how those tasks should be executed. However, existing robot datasets usually pair trajectories with coarse goal-level language, leaving execution-critical details such as active arm, approach direction, and contact region unspecified. This limits steerable policy learning and robotic video understanding. We introduce FineVLA, an open framework for action-aligned fine-grained VLA supervision. The framework includes: (1) a data construction tool that unifies 972,247 trajectories across 85K tasks from 10 open-source robot datasets and builds FineVLA-Data, a human-verified dataset of 47,159 fine-grained trajectories; (2) a held-out benchmark with 500 videos, 11,631 atomic facts, and 1,030 VQA questions; (3) a robotics-specialized VLM annotator for scalable fine-grained annotation; and (4) a steerable VLA policy trained with controlled mixtures of fine-grained and raw goal-level instructions. Our experiments yield three findings. First, fine-grained supervision does not sacrifice goal-level success: FG-only improves over Raw-only by +1.4 to +8.1 success-rate points across settings. Second, fine-grained and raw instructions are complementary, following a consistent inverted-U trend peaking at FG:Raw = 1:2 to 1:1. The best mixed setting reaches 86.8%/82.5% in RoboTwin simulation and 62.7/100 in real-world dual-arm manipulation (vs. 49.9 Raw-only). Third, fine-grained supervision improves steerable control: the largest real-world gains appear on pose (+23), color (+18), and approach direction (+18)–factors where goal-level instructions provide no guidance. Overall, fine-grained language should augment goal-level instructions: specifying how to execute alongside what to achieve. Project page: https://finevla.xlang.ai/

13.
arXiv (CS.CL) 2026-06-16

Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models

Improving the reasoning abilities of Large Language Models (LLMs), especially under parameter constraints, is crucial for real-world applications. Looped transformers address this by performing multiple latent iterations to refine each token beyond a single forward pass. However, we identify a latent overthinking phenomenon: most token predictions are already correct after the first pass, but are sometimes revised into errors in later iterations. We ask whether selectively skipping latent iterations can improve accuracy, and reveal significant potential with an oracle iteration policy that boosts performance by up to 7.3%. Motivated by this, we propose Think-at-Hard (TaH), a looped transformer optimized for selective iteration. TaH employs a lightweight neural decider to trigger latent iteration, only at tokens likely to be incorrect after the standard forward pass. During latent iterations, depth-aware Low-Rank Adaptation (LoRA) modules shift the objective from general next-token prediction to focused hard-token refinement. A duo-causal attention mechanism extends attention from the token sequence dimension to an additional iteration depth dimension, enabling cross-iteration information flow with full sequential parallelism. Experiments on nine benchmarks show consistent gains across math, QA, and coding tasks. With identical parameter counts, TaH outperforms always-iterate baselines by 3.8-4.4% while skipping iterations on 93% of tokens, and exceeds single-iteration Qwen3 baselines by 3.0-3.8%. When allowing

14.
arXiv (CS.LG) 2026-06-12

Reliability of Probabilistic Emulation of Physical Systems

arXiv:2606.12997v1 Announce Type: new Abstract: Two dominant approaches have emerged for generating probabilistic forecasts of physical systems: generative models, such as diffusion or flow matching; and ensembles of deterministic models with stochasticity injected, trained using the continuous ranked probability score (CRPS) loss. While both approaches have demonstrated strong predictive accuracy, the reliability of their uncertainties has not been systematically assessed. We address this gap by developing a framework to evaluate both approaches across diverse 2D spatiotemporal physical systems, under matched model size and computational budget. We assess the reliability of probabilistic emulation by inspecting the empirical coverage of predictive intervals, while also considering accuracy and computational efficiency metrics. CRPS-trained ensembles typically achieve more reliable uncertainties on both single-step prediction and autoregressive rollouts, demonstrating better coverage than the standard alternative of training generative models in a latent space. Moreover, the CRPS approach offers significantly faster inference. When generative models are trained in ambient rather than a compressed latent space, which is often infeasible for high-dimensional problems, they exhibit comparable coverage to CRPS-trained ensembles, though with substantially larger inference latency. In contrast, when CRPS-trained ensembles are trained in latent space they do not show a marked degradation in coverage with respect to ambient space. Both generative models and CRPS-trained ensembles demonstrate good predictive accuracy. To facilitate future research and application, we release AutoCast, a modular framework implementing both generative models and CRPS-trained ensembles, alongside AutoSim, a flexible dataset generation package for rapid prototyping.

15.
arXiv (CS.AI) 2026-06-11

Towards Responsibly Non-Compliant Machines

arXiv:2606.12147v1 Announce Type: new Abstract: We consider the problem of engineering autonomous intelligent agents that are capable to responsibly not comply with user requests. We argue that machine non-compliance comes in many different forms, and sketch the issues we should pursue on the road of accomplishing responsibly non-compliant intelligent machines. We anchor responsible non-compliance in justifications for task refusal, pathways to override the non-compliance, as well as careful tracking of security risks and liability transfers.

16.
arXiv (CS.LG) 2026-06-17

Damage Adaptation in Seconds for Architected Materials

arXiv:2606.17394v1 Announce Type: cross Abstract: Adaptation to damages and in-situ physical repairs is essential for long-term robot autonomy, yet challenging outside of narrowly defined and well-anticipated bounds. In this work we proprioceptively adapt to catastrophic damage in soft-actuated systems in under one minute. Architected materials are well equipped for adaptation: actuator failure occurs gradually rather than acutely, and damage can be described in a low-dimensional, discrete coordinate space. Surprisingly, latent damage representations plus a simple yet robust ensemble method is sufficient for adapting to unseen damage in real-time. Moreover, we identify conditions under which exponential sample complexity collapses to linear sample complexity for learned representations of architected materials, a concrete advantage over rigid components or continuum soft mechanisms. We demonstrate LEAP, our method for adaptive proprioception, via a tracing task for a 6DoF soft wrist based on Handed Shearing Auxetic (HSA) actuators. Our algorithm is able to adapt to cuts, burns, and actuator repairs, enabling simulation-free real-time adaptation that is critical for realizing the promise of soft robots outside the lab. Videos and more information are available at https://murpheylab.github.io/leap.

17.
arXiv (CS.CV) 2026-06-16

Geometric Action Model for Robot Policy Learning

Generalist robot policies must follow user instructions while reasoning about how objects, cameras, and robot actions interact in the 3D physical world. Recent vision-language-action models (VLAs) and video world-action models (WAMs) inherit strong semantic or temporal priors from large-scale foundation models, but they still operate primarily on 2D image frames or 2D-derived latent spaces, leaving implicit the 3D geometry required for contact-rich manipulation. We propose the Geometric Action Model (GAM), a language-conditioned manipulation policy that directly repurposes a pretrained geometric foundation model (GFM) as a shared substrate for perception, temporal prediction, and action decoding. GAM splits the GFM at an intermediate layer: the shallow layers serve as an observation encoder, and a causal future predictor inserted at the split layer forecasts future latent tokens conditioned on language, proprioception, and action history. The predicted future tokens are then routed through the remaining GFM blocks for feature propagation and decoding, allowing a single backbone to produce both future geometry and actions. This design equips the GFM with language-conditioned temporal world modeling through minimal architectural modification while preserving its rich geometric priors. Across a broad suite of simulation and real-robot manipulation benchmarks, GAM is more accurate, more robust, faster, and lighter than current foundation-model-scale baselines.

18.
arXiv (math.PR) 2026-06-16

Risk or Replace: Efficient Asymptotics for Data-Driven Maintenance

arXiv:2606.14706v1 Announce Type: cross Abstract: Condition-based maintenance (CBM) is an approach that plans interventions for deteriorating systems according to their observed operational state. CBM reduces unplanned downtime and extends usable lifetime. We study a heterogeneous population of components that degrade over time according to a stochastic processes with non-negative and i.i.d. increments that are characterized by component-specific parameters that remain unobservable to the decision maker. We rely on degradation data to estimate these parameters and determine replacement actions at equidistant epochs. The goal is to minimize the long-run average cost, which incorporates fixed replacement costs, failure costs, and operating costs. This problem can be formulated as a high-dimensional partially observable Markov decision process (POMDP), which is generally intractable. We develop a tractable, data-driven CBM policy that estimates the optimal policy of a hypothetical Oracle that has full information of the underlying degradation parameters and call this policy the Estimated Oracle's Optimal Policy (EOP). We introduce a scaling regime where both the failure thresholds and cost parameters increase proportionally, reflecting practical settings in which component lifetimes and maintenance costs are large relative to the time between two consecutive CBM decision moments. We show that the regret of the EOP, defined as the difference between its long-run average cost and that of the Oracle, converges to zero in the scaling regime when the parameter estimator is consistent. Across extensive experiments using both real and simulated data, the EOP achieves very low regret and, whenever the optimal POMDP policy can be computed exactly, a negligible optimality gap.

19.
arXiv (CS.LG) 2026-06-16

Communication-Efficient Distributed Training for Collaborative Flat Optima Recovery in Deep Learning

arXiv:2507.20424v3 Announce Type: replace Abstract: We study centralized distributed data parallel training of deep neural networks (DNNs), aiming to improve the trade-off between communication efficiency and model performance of the local gradient methods. To this end, we revisit the flat-minima hypothesis, which suggests that models with better generalization tend to lie in flatter regions of the loss landscape. We introduce a simple, yet effective, sharpness measure, Inverse Mean Valley, and demonstrate its strong correlation with the generalization gap of DNNs. We incorporate an efficient relaxation of this measure into the distributed training objective as a lightweight regularizer that encourages workers to collaboratively seek wide minima. The regularizer exerts a pushing force that counteracts the consensus step pulling the workers together, giving rise to the Distributed Pull-Push Force (DPPF) algorithm. Empirically, we show that DPPF outperforms other communication-efficient approaches and achieves better generalization performance than local gradient methods and synchronous gradient averaging, while maintaining communication efficiency. In addition, our loss landscape visualizations confirm the ability of DPPF to locate flatter minima. On the theoretical side, we show that DPPF guides workers to span flat valleys, with the final valley width governed by the interplay between push and pull strengths, and that its pull-push dynamics is self-stabilizing. We further provide generalization guarantees linked to the valley width and prove convergence in the non-convex setting.

20.
arXiv (CS.CL) 2026-06-16

A Large-Scale Multi-Dimensional Empirical Study of LLMs for Conversation Summarization

Despite the significant advancement of LLMs in conversation summarization, their evaluation remains limited by insufficient scenarios, input lengths, and sample sizes. Furthermore, existing benchmarks often omit frontier reasoning systems and efficient small models, or lack fine-grained, multi-dimensional assessments. To bridge these gaps, we propose OmniCSEval, a unified benchmark comprising 1,800 diverse conversations across six real-world scenarios, featuring context lengths ranging from 128 to 32k tokens. For fine-grained evaluation, we employ a bidirectional fact-checking framework that integrates key fact matching to assess completeness and conciseness, alongside summary fact verification to evaluate faithfulness. To ensure reliable assessment, we establish a human-LLM collaborative pipeline for key fact extraction and a multi-LLM consensus verifier for summary fact decomposition. Leveraging this framework, we evaluate 28 LLMs across four distinct categories grouped by reasoning capability and model scale. Our extensive empirical study reveals critical insights regarding the cross-scenario challenges current LLMs continue to face, the impacts of reasoning and scale, and the efficiency and adaptability of reasoning models. We also provide guidance for system selection in real-world deployments.

21.
arXiv (CS.CV) 2026-06-11

CoCoSI: Collaborative Cognitive Map Construction for Spatial Intelligence

Spatial intelligence is a key frontier for multimodal large language models (MLLMs), enabling them to reason about the physical world from visual experience. Inspired by human spatial cognition, recent approaches construct grid-based cognitive maps from multi-frame visual inputs to maintain coherent spatial representations over time. However, limited context lengths still challenge spatial understanding, while existing methods, such as long-context modeling and external memory, often require architectural changes, memory modules, or finetuning, limiting their applicability to off-the-shelf pretrained MLLMs. This motivates a lightweight, model-agnostic method for preserving spatial information beyond the native context window. To this end, we propose a plug-and-play multi-agent framework that collaboratively constructs cognitive maps as structured spatial memory, enhancing the spatial understanding of arbitrary pretrained MLLMs without architectural modification or additional training. Our framework features local-global agent coordination, cognitive map construction with atomic commits, and cross-agent verification. Extensive experiments demonstrate that our method achieves superior performance on spatial understanding tasks while remaining fully training-free. Code will be released.

22.
arXiv (quant-ph) 2026-06-16

Flux magnetism in a strongly interacting dipolar lattice supersolid under tunable gauge fields

arXiv:2509.05058v2 Announce Type: replace-cross Abstract: Supersolidity and magnetism are fundamental phenomena characterizing strongly correlated matter. Here we unveil a mechanism that directly connects these two regimes and can be experimentally accessed in ultracold atomic systems. Specifically, we exploit the distinctive properties of magnetic lanthanide atoms trapped in a one-dimensional anti-magic wavelength optical lattice. This platform enables a realistic implementation of a triangular Bose-Hubbard ladder featuring two key ingredients: strong long-range interactions and tunable gauge fields. Owing to these properties, our numerical analysis reveals a robust lattice supersolid regime with finite fluxes in each triangular plaquette. Remarkably, we show that the density modulation of the supersolid phase and a finite gauge field induce magnetic ordering of the fluxes, forming ferromagnetic and ferrimagnetic patterns. Our results thus reveal a fascinating quantum effect that bridges supersolidity and magnetism.

23.
arXiv (quant-ph) 2026-06-16

On-chip semi-device-independent quantum random number generator exploiting contextuality

arXiv:2601.08392v2 Announce Type: replace Abstract: We present a semi-device-independent quantum random number generator (QRNG) based on the violation of a contextuality inequality, implemented by the integration of two silicon photonic chips. Our system combines a heralded single-photon source with a reconfigurable interferometric mesh to implement qutrit state preparation, transformations, and measurements suitable for testing a KCBS contextuality inequality. This architecture enables the generation of random numbers from the intrinsic randomness of single-photon interference in a complex optical network, while simultaneously allowing a quantitative certification of their security without requiring entanglement. We observe a contextuality violation exceeding the classical bound by more than 10{\sigma}, unambiguously confirming non-classical behavior. From this violation, we certify a conditional min-entropy per experimental round of Hmin = 0.077 +- 0.002, derived via a tailored semidefinite-programming-based security analysis. Each measurement outcome therefore contains at least 0.077 +- 0.002 bits of extractable genuine randomness, corresponding to an asymptotic generation rate of 21.7 +- 0.5 bits/s. These results establish a viable route towards general-purpose, untrusted quantum random number generators compatible with practical integrated photonic quantum networks.

24.
arXiv (CS.AI) 2026-06-15

Low-Burden LLM-Based Preference Learning: Personalizing Assistive Robots from Natural Language Feedback for Users with Paralysis

arXiv:2604.01463v2 Announce Type: replace-cross Abstract: Physically Assistive Robots require personalized behaviors to ensure user safety and comfort. However, traditional preference learning methods, like exhaustive pairwise comparisons, cause substantial physical and cognitive fatigue for users with severe motor impairments. To solve this, we propose a low-burden, offline framework that translates unstructured natural language feedback directly into deterministic robotic control policies. To safely bridge the gap between ambiguous human speech and robotic code, our pipeline uses Large Language Models (LLMs) grounded in the Occupational Therapy Practice Framework. This clinical reasoning decodes subjective user reactions into explicit physical and psychological needs, which are then mapped into transparent decision trees. Before deployment, an automated "LLM-as-a-Judge" verifies the code's structural safety. We validated this system in a simulated meal preparation study with 10 adults with paralysis. Results show our natural language approach significantly reduces user workload compared to traditional baselines. Additionally, occupational therapists confirmed the generated policies are safe and accurately reflect user preferences.

25.
arXiv (CS.CL) 2026-06-16

Whose hotel does the AI recommend? An algorithm audit of reputation signals in LLM-assisted hotel selection

Travelers increasingly ask large language model (LLM) assistants which hotel to book, making these systems gatekeepers of property visibility – yet what moves their recommendations is undocumented. We conduct a pre-specified algorithm audit using a randomized choice-based conjoint: across personas, prompt templates, and twelve open-weight and proprietary models, assistants choose among five hotels whose guest rating, review volume and recency, management response, chain affiliation, price, eco-certification, and list position are independently randomized. We estimate the average marginal component effect of each signal on the probability of recommendation. Guest rating and price dominate (a top rating raises selection by 31.6 percentage points; a high price lowers it by 30.0), reproducing human valence-and-price primacy but over-weighting eco-certification and ignoring management response. List position – a content-free artifact – shifts recommendations causally, worth about \$12 per night. Stated reasons track revealed weights imperfectly. The findings ground generative engine optimization and the accountability of AI infomediaries in causal evidence.