Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (quant-ph) 2026-06-16

Experimental quantum state learning with pairs of photons

arXiv:2606.16932v1 Announce Type: new Abstract: Tomography allows one to estimate the density matrix describing the state an ensemble of quantum systems are prepared in (for example, polarization tomography determines the polarization state of a beam of identically prepared photons). In general, it is not possible to uniquely decompose the density matrix into its pure state components. Agarwal et al. proposed a protocol which, for a mixture composed of any two pure states of a qubit (with arbitrary probabilities), allows an observer to infer not only the density matrix but the identity of those specific pure states and their weights - the additional requirement being that the qubits arrive in pairs, where both qubits in each pair are in the same state. We experimentally demonstrate this learning-from-pairs concept using photons in the polarization degree of freedom. We use tomography to measure a sequence of single photons and make use of their time-of-arrival information to 'pair up' the photons after the measurement. From here we are able to infer the photons' polarization states and their respective probabilities, and we demonstrate this for various different choices of polarization states and ratios. Finally, we investigate our ability to discriminate between two equal mixtures of distinct pairs of orthogonal polarization states. We find that on the order of approx. 10e4 photons is typically enough to achieve tomography fidelities of approximately 0.9999. This is sufficient to discriminate between two different preparations of the same mixed state, differing by angles of less than 5 degrees between the pure states used in the two preparations.

02.
arXiv (quant-ph) 2026-06-11

Wigner Cat Phases: A finely tunable system for exploring the transition to quantum chaos

作者:

arXiv:2512.22169v4 Announce Type: replace Abstract: A quantum mechanical setting consisting of a frozen qubit composed with a fully thermalized chaotic system of N states is proposed, with potential relevance to quantum control. Observing the states of the composed system selectively retaining the states leads to the observation of novel localization in the subsystem. At a tuning parameter of 1.0, implying no selection, the system exhibits Wigner-Dyson level spacing statistics, indicative of quantum chaos. As the tuning parameter is reduced and selection occurs at a cutoff, the nearest-neighbor level spacing distribution develops heavier tails, a signature of suppressed spectral mixing and the emergence of non-thermal dynamics. In these regimes, the eigendensity develops a pronounced "cat-ears" structure, reflecting the formation of spatially localized bimodal eigenstates. These topological features persist without transitioning to Poisson statistics, indicating a transition from quantum chaos to a non-thermal, novel many-body localized (MBL) regime-referred to as Wigner Cat Phases. The proposed mixed random matrix ensemble offers a practical probe for sustaining this novel quantum localization setting. Results from our rigorous spectral statistics analysis show how "cat-ears" form in spectral densities based on the degree of selection or disorder and indicate that gap ratio statistics must be used with caution in detecting the full integrable limit due to the possibility of heavy-tailed Wigner-Dyson distributions.

03.
arXiv (quant-ph) 2026-06-12

Squeezing Enhancement in Lossy Multi-Path Atom Interferometers

arXiv:2409.04091v3 Announce Type: replace Abstract: This paper explores the sensitivity gains afforded by spin-squeezed states in atom interferometry, in particular using Bragg diffraction. We introduce a generalised input-output formalism that accurately describes realistic, non-unitary interferometers, including losses due to velocity selectivity and scattering into undesired momentum states. This formalism is applied to evaluate the performance of one-axis twisted spin-squeezed states in improving phase sensitivity. Our results show that by carefully optimising the parameters of the Bragg beam splitters and controlling the degree of squeezing, it is possible to improve the sensitivity of the interferometer by several dB with respect to the standard quantum limit despite realistic levels of losses in light pulse operations. However, the analysis also highlights the challenges associated with achieving these improvements in practice, most notably the impact of finite temperature on the benefits of entanglement. The results suggest ways of optimising interferometric setups to exploit quantum entanglement under realistic conditions, thereby contributing to advances in precision metrology with atom interferometers.

04.
arXiv (CS.CL) 2026-06-16

Long-Context Modeling via GSS-Transformer Hybrid Architecture with Learnable Mixing

Modeling long-range dependencies remains a central challenge in natural language processing. Transformer architectures achieve strong performance via self-attention but scale quadratically ($O(N^2)$) with sequence length, while State Space Models (SSMs) scale linearly ($O(N)$) but suffer from a selective recall bottleneck, struggling to retrieve precise information from compressed states. This creates a fundamental tradeoff between efficiency and perplexity. To tackle these challenges, we propose the Parallel Hybrid Architecture (PHA), which runs Gated State Spaces (GSS), Grouped Query Attention (GQA), and Feed-Forward Networks (FFNs) as independent parallel branches fused by a learnable mixing mechanism. Instead of forcing SSMs to approximate attention or serializing the two paradigms, PHA allows each branch to specialize: GSS captures global context, while attention performs selective retrieval, with FFN providing complementary processing. On WikiText-103, PHA achieves 16.51 PPL at 125M parameters, outperforming Hedgehog (16.70) and H3-125M (23.70). Scaling to 180M parameters yields 16.42 PPL, which gives comparable results with the pure attention baseline while delivering 24\% higher throughput and up to 40\% lower memory usage at long contexts. On OpenWebText, our 125M model achieves 19.72 PPL, outperforming standard Transformers (20.60) and GSS hybrid baselines (19.80). These results demonstrate that separating sequence modeling paradigms into parallel specialists enables Transformer-level perplexity with substantially improved efficiency for long-context language modeling.

05.
arXiv (CS.AI) 2026-06-16

Direction-Conditioned Policies via Compositional Subgoal Scoring for Online Goal-Conditioned Reinforcement Learning

arXiv:2606.16515v1 Announce Type: cross Abstract: Hamilton-Jacobi-Bellman theory implies that the optimal goal-conditioned action depends on the goal only through the gradient of the goal-reaching distance at the current state, yet standard online GCRL still conditions the actor on the raw goal – a signal that is geometrically uninformative when the goal is far from the data distribution. We propose Direction-Conditioned Policies (DCP), a fully online method that decomposes goal-reaching into two components sharing one InfoNCE representation $\psi$: a subgoal-scoring step that selects a visited state $z_t$ aligned with the final goal $g$ in $\psi_g$, and a direction-conditioned actor that consumes the unit direction $d_t$ and magnitude $r_t$ from $\psi(s_t)$ to $\psi(z_t)$. The two components train jointly, factor cleanly at deployment (subgoal scoring is removed, while direction conditioning remains with $g$ in place of $z_t$), and admit independent modification at the same $(d_t,r_t)$ interface. We prove three results. First, direction sufficiency under HJB: the optimal action under control-affine dynamics depends on the goal only through the value gradient. Second, a quantitative bound showing that, under mild conditions on the learned representation and assuming the scoring rule returns an on-path $z_t$, the actor's conditioning input at training and at deployment coincide up to representation error and geodesic slack. Third, a controllable-subspace characterization of when directional conditioning fails. Across nine environments, DCP improves over Contrastive RL on most final metrics, with the largest gains on manipulation and obstacle-interaction tasks; a qualitative analysis of the learned $\psi$-distance landscape shows the contrastive representation behaves as an online quasimetric encoding environment topology, and the single failure case (AntSoccer) localizes to a learned-gradient pathology that the theory anticipates.

06.
arXiv (CS.CV) 2026-06-17

SceneCompleter: Dense 3D Scene Completion for Generative Novel View Synthesis

Generative models have shown great promise for novel view synthesis (NVS) by leveraging strong image generation priors. However, existing approaches typically follow a 2D inpainting paradigm, first completing missing image regions and then performing 3D reconstruction. This strategy often causes geometry distortion and appearance drift, as 2D inpainting models cannot reliably infer the underlying 3D structure required for cross-view consistent generation. In this paper, we propose SceneCompleter, a geometry-aware framework that reformulates generative NVS as dense 3D scene completion. Instead of hallucinating isolated 2D views, SceneCompleter jointly completes geometry and appearance through a geometry-appearance dual-stream diffusion model in a spatially aligned RGBD latent space. To provide holistic scene context, we further introduce a Scene Embedder that conditions generation on global semantic and stylistic information from reference images. The completed RGBD predictions are then aligned and integrated into an expandable 3D scene representation, enabling iterative and coherent scene completion. Extensive experiments on in-domain and out-of-distribution datasets demonstrate that SceneCompleter produces visually plausible and geometrically consistent novel views across diverse scenarios. Project Page: https://chen-wl20.github.io/SceneCompleter

07.
arXiv (CS.AI) 2026-06-18

Better Adherence, Richer Context: A Field Evaluation of LLM-Powered Conversational Voice Diaries for Sleep

arXiv:2606.18596v1 Announce Type: cross Abstract: Sleep diaries are central to behavioral sleep medicine and cognitive behavioral therapy for insomnia, yet daily completion is difficult to sustain, and static forms often provide limited context for interpreting night-to-night sleep variation. We designed an LLM-powered conversational voice diary that delivers clinically grounded morning and evening sleep diary questions through proactive smart-speaker prompts, structured conversational intake, and adaptive follow-up dialogue. We evaluated the system in a four-week between-subjects field study with 30 university students, comparing it with a text-based mobile diary using matched diary items, reporting windows, and reminder intervals. Compared with the text-based diary, the conversational voice diary showed higher adherence and elicited more detailed contextual self-report about routines, stressors, environmental conditions, and other sleep-related factors. Participants also described the voice diary as easier to integrate into daily routines, despite longer perceived completion time. However, voice-based conversational intake produced lower completeness for some structured diary fields, revealing a trade-off between expressive richness and structured precision. These findings show both the promise and the challenge of using LLM-powered conversational voice assistants for longitudinal health self-report.

08.
arXiv (quant-ph) 2026-06-12

Path integral control of open quantum systems

arXiv:2410.18635v4 Announce Type: replace Abstract: We investigate open-loop quantum state preparation for a class of open quantum systems whose dynamics follow a Gorini-Kossakowski-Lindblad-Sudarshan (GKLS) master equation that admits a trajectory-based stochastic representation. The deterministic control objective is reformulated as a stochastic optimal control problem – interpreting stochasticity as a methodological tool akin to stochastic Schrödinger equation unravelings – which situates the problem within the path integral control framework. For the class of GKLS generators under consideration, this reformulation leads to an explicit expression for the optimal control as a weighted average over stochastic quantum trajectories, thereby eliminating the need for gradient evaluations. Building on this theoretical result, we derive a control update rule for piecewise-constant control pulses and demonstrate that adaptive importance sampling progressively enhances the control estimator during optimization, culminating in the algorithm we term Path integral Quantum Control (PiQC). We further introduce an annealed variant of PiQC, wherein a synthetic noise schedule gradually steers open-system trajectories toward closed-system dynamics, enabling high-fidelity unitary state preparation. Numerical studies on a dissipative single-qubit system and a multi-qubit Nuclear Magnetic Resonance model verify that PiQC yields precise open-loop controls and displays robustness to Hamiltonian perturbations. We propose PiQC as a trajectory-based alternative to gradient-based approaches, which might offer a viable solution in quantum control problems where gradient computation is infeasible or computationally demanding.

09.
arXiv (quant-ph) 2026-06-16

Exactly Solvable Quantum Model with Spin-Dependent Coulomb Interaction

arXiv:2501.05103v5 Announce Type: replace Abstract: In this work, we report an exactly solvable quantum model featuring a spin-dependent Coulomb interaction, described by the spin vector potential \(\vec{\mathcal{A}} = k (\vec{r} \times \vec{S}) / r^2\) together with a Coulomb-type scalar potential \(\varphi = \kappa / r\) . The model is governed by the Schrödinger-type Hamiltonian \(\mathcal{H}_S = \vec{\Pi}^2 / (2M) + q \varphi\) in nonrelativistic quantum mechanics and by the Dirac-type Hamiltonian \(\mathcal{H}_D = c \vec{\alpha} \cdot \vec{\Pi} + \beta M c^2 + q \varphi\) in relativistic quantum mechanics, where \(\vec{\Pi} = \vec{p} - (q/c)\vec{\mathcal{A}}\) is the canonical momentum. We demonstrate two main results: (i) Just as the Coulomb-type scalar potential \(\mathcal{S}_Maxwell = \{\vec{\mathcal{A}} = 0,\ \varphi = \kappa / r\}\) is a local exact solution of Maxwell's equations on $r\neq0$, the gauge potential \(\mathcal{S}_YM = \{\vec{\mathcal{A}} = k (\vec{r} \times \vec{S}) / r^2,\ \varphi = \kappa / r\}\) constitutes a local exact solution of the Yang–Mills equations on the punctured region $r\neq0$. (ii) Both Hamiltonians \(\mathcal{H}_S\) and \(\mathcal{H}_D\) can be solved exactly in the presence of this spin-dependent Coulomb interaction. The resulting energy spectra are derived, and they naturally reduce to those of the ordinary hydrogen atom when the spin-dependent terms are neglected. Finally, we clarify the quantization conditions and the fixed-background interpretation of the model.

10.
arXiv (math.PR) 2026-06-16

Layerwise Terminal Discrepancy in Chen's Reverse-Heat Coupling on the Boolean Cube

arXiv:2606.04573v2 Announce Type: replace-cross Abstract: Recently, Chen [Chen2026] proved that Talagrand's Boolean convolution conjecture holds up to the dimension-free factor \((\log\log\eta)^{3/2}\), namely for every fixed \(\tau>0\), \[ \mu\{P_\tau f>\eta\|f\|_1\} \le C_\tau \frac{(\log\log\eta)^{3/2}}{\eta\sqrt{\log\eta}}, \qquad \eta>e^3. \] We revisit the terminal testing-discrepancy step in Chen's perturbed reverse-heat coupling. Chen estimates this discrepancy globally in terms of the remaining gap to the terminal level. We keep the same coupling and the same reverse-heat formulations, but localize the terminal discrepancy on each remaining-gap layer before summing the layers. This changes the fixed-time anti-concentration cost from order \((\log L)^{3/2}/\sqrt L\) to order \((\log L)/\sqrt L\), where \(L=\log\eta\). Consequently, we obtain a \((\log\log\eta)^{1/2}\) improvement as \[ \mu\{P_\tau f>\eta\|f\|_1\} \le C_\tau \frac{\log\log\eta}{\eta\sqrt{\log\eta}}, \qquad \eta>e^3. \]

11.
medRxiv (Medicine) 2026-06-11

Vascular Phenotyping in Parkinson's Disease: Diabetes Mellitus Operationalizes a Microvascular Metabolic Syndrome Cluster Across PPMI Diagnostic Cohorts

Background: Diabetes mellitus elevates Parkinson's disease (PD) risk, via hypothesized cerebrovascular mediation. Whether the diabetes/prediabetes vascular-risk phenotype concentrates in cardiometabolic risk or macrovascular events across prodromal and clinically diagnosed PD remains unresolved. Objectives: To quantify the vascular-risk burden associated with diabetes/prediabetes across the PPMI diagnostic cohorts to test whether this association differs by cohort. Methods: Cross-sectional analysis of 413 PPMI participants (76 healthy controls, 145 prodromal PD, 192 clinically diagnosed PD) examined diabetes/prediabetes (n = 73) and seven vascular risk factors. The Vascular Burden Score (0 to 7) was a priori partitioned into microvascular and macrovascular sub-scores. Modified Poisson regression estimated adjusted prevalence ratios (aPR), adjusted for age, sex, and body mass index. A cohort-by-diabetes interaction tested cross-cohort consistency. Sensitivity analyses incorporated nigral diffusion tensor imaging (PD-risk biomarker) and FreeSurfer white matter hypointensity volume (cerebrovascular marker). Results: Diabetes/prediabetes elevated Vascular Burden Score ({beta} = 0.53, 95% CI 0.29 to 0.77, p < 0.001) versus non-diabetic participants, with a non-significant cohort-by-diabetes interaction (F = 0.29, p = 0.747). Three microvascular factors survived false discovery rate correction: obesity (aPR 2.28), hypertension (aPR 1.60), and hyperlipidemia (aPR 1.45). Macrovascular events showed no diabetic amplification ({beta} = -0.06, p = 0.25). In the imaging-phenotyped subset, Vascular Burden Score components contributed classifier variance distinct from nigral microstructure. Conclusions: Diabetes/prediabetes operationalize a microvascular cluster stable across prodromal and idiopathic PD. Cardiometabolic phenotyping may complement established PD-risk biomarkers (dopamine transporter SPECT, nigral diffusion), pending longitudinal validation linking vascular phenotype to dopaminergic markers.

12.
arXiv (CS.CV) 2026-06-11

AerialClaw: An Open-Source Framework for LLM-Driven Autonomous Aerial Agents

Unmanned aerial vehicles (UAVs) are increasingly used in inspection, search and rescue, environmental monitoring, and emergency response. However, most UAV applications still rely on pre-defined command sequences or task-specific pipelines, where developers manually connect perception, planning, flight control, simulation, logging, and safety modules. This limits the flexibility, reproducibility, and extensibility of autonomous aerial systems. This paper presents AerialClaw, an open-source software framework that enables UAVs to operate as decision-making aerial agents rather than merely command-following platforms. Given a natural-language mission, AerialClaw allows an LLM-based agent to understand the task, maintain context, invoke executable aerial skills, observe perception and runtime feedback, and iteratively update its decisions in a closed loop. The framework adopts a modular brain-skill-runtime architecture, combining hard skills for atomic UAV operations, Markdown-based soft skills for reusable task strategies, document-driven agent state and capability boundaries, memory-driven reflection, safety-oriented runtime validation, and platform-agnostic execution adapters. AerialClaw supports lightweight mock execution, PX4 SITL with Gazebo, and AirSim-based simulation, together with a web console, pluggable model backends, example missions, simulation assets, and staged deployment scripts. By combining standardized aerial skills, document-driven agent state, memory, and closed-loop LLM decision-making, AerialClaw provides a reproducible and extensible open-source framework for building UAV systems that can interpret missions, make decisions, execute skills, and adapt their behavior from feedback.

13.
arXiv (CS.LG) 2026-06-19

Evaluating Universal Machine Learning Force Fields Against Experimental Measurements

arXiv:2508.05762v2 Announce Type: replace-cross Abstract: Universal machine learning force fields (UMLFFs) promise to revolutionize materials science by enabling rapid atomistic simulations across the periodic table. However, their evaluation has been limited to computational benchmarks that may not reflect real-world performance. We introduce UniFFBench, a comprehensive evaluation framework featuring the MinX dataset – a diverse collection of 1,500+ mineral systems spanning 85 elements, extreme thermodynamic conditions (0–5000 K, 0–1000 GPa), and structural complexity, including partial occupancy and disorder. This diversity, combined with experimental reference values for validation, enables assessment of UMLFF generalization across chemical space and conditions substantially beyond typical training scenarios. Our systematic evaluation of six state-of-the-art UMLFFs reveals a substantial ``reality gap'': models achieving impressive performance on computational benchmarks often fail when confronted with experimental complexity. Even the best-performing models exhibit higher density prediction error than the threshold required for practical applications. We observe disconnects between simulation stability and mechanical property accuracy, with prediction errors correlating with training data representation rather than the modeling method.

14.
arXiv (CS.LG) 2026-06-11

APEX: A Network-Native Time-Series Foundation Model for Forecasting and Anomaly Detection for Wireless Edge Operations

arXiv:2606.11553v1 Announce Type: new Abstract: Generic time-series foundation models transfer poorly to wireless network telemetry whose signals are bursty, zero-inflated, and coupled across protocol layers. We present APEX, a network-native, decoder-only transformer for forecasting enterprise AP telemetry, and evaluate it on DHCP degradation as a representative network task. APEX is pre-trained on 10-channel multivariate telemetry from ~4,500 production wireless networks (~100K AP time series, 34 metrics per AP), and is available as APEX-Large (269M, cloud) and APEX-Edge (10.5M, edge). On a 192-step (4-day) DHCP degradation benchmark, APEX-Large reduces MAE by 18% over the strongest foundation-model baseline (Toto) and 38% over SARIMA, with anomaly-detection F1 = 0.93, while APEX-Edge enables sub-second, privacy-preserving inference on AP-class edge hardware. These results suggest network-native pre-training is a practical foundation for proactive wireless operations.

15.
medRxiv (Medicine) 2026-06-16

Development and reliability and validity test of the Questionnaire on Knowledge, Attitude and Practice of ICU Nurses on Blood Oxygen Saturation Management in Mechanically Ventilated Patients

Objective: A questionnaire on the knowledge, attitude and practice of ICU nurses regarding the management of blood oxygen saturation in patients with mechanical ventilation was compiled, and its reliability and validity were tested. Method: Drawing upon the knowledge-attitude-practice theory, the initial questionnaire draft was developed through literature review and consultation with Delphi experts. Employing convenience sampling, 32 nurses from the General ICU of Wuxi Second People's Hospital were surveyed between 1 August 2025 and 27 September 2025, enabling item screening and assessment of reliability and validity.The full version of the developed questionnaire is provided as Supporting Information (S1 File). All items are published under a CC BY 4.0 license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Result: A questionnaire on the knowledge, attitude and practice of ICU nurses regarding the management of blood oxygen saturation in mechanically ventilated patients was finalised, comprising 26 items: 11 in the knowledge dimension, 6 in the attitude dimension and 9 in the behaviour dimension. The overall Cronbach's coefficient for the questionnaire was 0.88, with dimension-specific coefficients of 0.787, 0.722, and 0.781 respectively. The Spearman-Brown coefficient for the entire questionnaire was 0.967, while dimension-specific coefficients were 0.796, 0.666, and 0.728 respectively. The content validity index at the questionnaire level (S-CVI) was 0.886, and the item-level content validity index (I-CVI) ranged from 0.913 to 0.967. 0.728. The questionnaire's level content validity index (S-CVI) was 0.886, and the item level content validity index (I-CVI) ranged from 0.913 to 1.00. Conclusion: The questionnaire on knowledge, attitude and practice of blood oxygen saturation management in mechanically ventilated patients demonstrates good reliability and validity. It may serve as an assessment tool for intensive care unit nurses regarding their knowledge, attitude, and practices concerning blood oxygen saturation management in mechanically ventilated patients, thereby establishing a foundation for developing targeted intervention strategies in future practice.

16.
arXiv (CS.CV) 2026-06-12

Masked and Predictive Self-Supervised Foundation Models for 3D Brain MRI

Self-supervised foundation models have shown strong promise in medical imaging. However, existing MRI foundation-model studies have primarily emphasized segmentation and dense prediction tasks, while systematic investigation of self-supervised foundation models for MRI-based disease detection remains limited. In this work, we investigate two major self-supervised pretraining paradigms for MRI-based disease detection: reconstruction-based learning via Masked Autoencoders (MAE) and predictive representation learning via Joint Embedding Predictive Architectures (JEPA). We study the role of auxiliary objectives by introducing a novel spectral-domain reconstruction loss for MAE to enhance sensitivity to fine-grained anatomical structure, and by integrating variance–covariance regularization (VCR) within our JEPA framework to encourage decorrelated latent representations. Our models are pretrained on heterogeneous single-contrast MRI volumes in a contrast-agnostic setting, without modality concatenation. Across five downstream disease detection tasks, our results highlight the importance of self-supervised objective design for medical foundation model pretraining, demonstrating that the downstream benefit of each objective is determined by its relevance to the task's structure. Specifically, spectral regularization yields the largest improvements when the downstream discriminative signal is characterized by strong high-frequency anatomical structures, while covariance regularization is most beneficial when discriminative information spans multiple decorrelated feature dimensions. MAE with spectral-domain supervision consistently achieves superior downstream performance for MRI-based disease detection. These findings suggest that self-supervised objectives in medical imaging encode specific biases, and their downstream benefit is fundamentally conditioned on the task's structure.

17.
arXiv (CS.AI) 2026-06-15

Quantized Evolution Strategies: High-precision Fine-tuning of Quantized LLMs at Low-precision Cost

arXiv:2602.03120v2 Announce Type: replace-cross Abstract: Post-Training Quantization (PTQ) is essential for deploying Large Language Models (LLMs) on memory-constrained devices, yet it renders models static and difficult to fine-tune. Standard fine-tuning paradigms, including Reinforcement Learning (RL), fundamentally rely on backpropagation and continuous weights to compute gradients. Thus they cannot be used on quantized models, where the parameter space is discrete and non-differentiable. While Evolution Strategies (ES) offer a backpropagation-free alternative, optimization of the quantized parameters can still fail due to vanishing or inaccurate gradient estimation. This paper introduces Quantized Evolution Strategies (QES), an optimization paradigm that performs full-parameter fine-tuning directly in the quantized space. QES is based on two innovations: (1) it integrates accumulated error feedback to preserve high-precision weight updating signals, and (2) it utilizes a stateless seed replay to reduce memory usage to low-precision inference levels. QES significantly outperforms the state-of-the-art zeroth-order fine-tuning methods on a variety of tasks, making direct fine-tuning for quantized models possible. It therefore opens up the possibility for scaling up LLMs entirely in the quantized space. The source code is available at https://github.com/dibbla/Quantized-Evolution-Strategies .

18.
arXiv (CS.AI) 2026-06-19

Uncertainty-Aware Reward Modeling for Stable RLHF

arXiv:2606.19818v1 Announce Type: cross Abstract: Reinforcement learning from human feedback (RLHF) aligns large language models by training reward models on preference data and optimizing policies to maximize predicted rewards. However, this pipeline faces two fundamental challenges: (1) reward models cannot signal when their predictions are unreliable, since they usually act as deterministic point estimators; and (2) modern group-based policy optimization can amplify unreliable reward signals, as exemplified by GRPO's uniform treatment of rewards during advantage computation. As policies explore increasingly diverse responses, these two limitations create a critical vulnerability: unreliable reward estimates may be granted disproportionate influence, triggering severe reward hacking. We propose Uncertainty-Aware Reward Modeling (UARM), which equips reward models with calibrated uncertainty via quantile-based conformal prediction and reweights GRPO advantages through heteroscedastic variance decomposition. Experiments across HelpSteer, UltraFeedback, and PKU-SafeRLHF demonstrate that UARM significantly improves reward model calibration, reduces reward hacking, and enhances downstream alignment quality compared to standard GRPO and uncertainty-agnostic baselines.

19.
arXiv (math.PR) 2026-06-16

Experimentation for Different Scheduling Policies on Queues: Mixed Differences-in-Q Estimators Based on Little's Law

arXiv:2605.29641v2 Announce Type: replace-cross Abstract: In data centers, tasks are dispatched to various servers to evenly distribute the workload. When a data center considers implementing a new scheduling algorithm, it typically conducts an A/B test prior to deployment to assess the real-world impact of this new method. However, a straightforward A/B test might be interfered with so-called ``Markovian'' interference. We utilized the Differences-in-Q estimator, as developed by Farias et al. (2022), and introduced mixed Differences-in-Q estimators grounded in Little's Law. We show that our A/B testing methods significantly reduce bias and variance when testing various scheduling policies. Extensive simulations were conducted under scenarios like non-stationary arrival rates, heterogeneous service rates, and communication delays. These simulations highlight the robustness and efficacy of our A/B testing approach.

20.
arXiv (CS.CV) 2026-06-16

DySink: Dynamic Frame Sinks for Autoregressive Long Video Generation

Autoregressive long video generation often adopts bounded-memory streaming for efficiency, typically combining local windows for short-term continuity with static early-frame sinks as long-range anchors. However, this fixed allocation keeps early frames cached even when the current visual state has substantially diverged from them, while discarding potentially more relevant intermediate history. As a result, the retained long-range context may become less adaptive and bias generation toward outdated cues; in severe cases, RoPE-induced phase re-alignment can homogenize inter-head attention and cause sink collapse, where content regresses toward sink frames. We propose DySink, a retrieval-based framework that maintains a compact memory bank and selects visually relevant historical frames as dynamic frame sinks. DySink couples adaptive retrieval with a sink anomaly gate, which detects excessive inter-head consensus over retrieved context and suppresses collapse-prone context. Experiments on minute-long videos show that DySink consistently improves dynamic degree over strong baselines while also achieving higher temporal quality. The code and model weights will be released at https://github.com/yebo0216best/DySink.

21.
arXiv (CS.AI) 2026-06-19

Latent Gaussian Splatting for 4D Panoptic Occupancy Tracking

arXiv:2602.23172v2 Announce Type: replace-cross Abstract: Capturing 4D spatiotemporal scene structure is crucial for the safe and reliable operation of robots in dynamic environments. However, existing approaches typically address only part of the problem: they either provide coarse geometric tracking via bounding boxes or detailed 3D occupancy estimates that lack explicit temporal association and instance-level reasoning. In this work, we present Latent Gaussian Splatting (LaGS) for 4D Panoptic Occupancy Tracking (4D-POT). We revisit the underlying representation and model 3D features as a sparse set of feature-bearing Gaussians. These act as dynamic, volume-oriented keypoints that enable spatially continuous, distance-weighted aggregation of multi-view features before being splatted into a voxel grid for decoding. This point-centric formulation enables flexible, data-dependent receptive fields and long-range spatial interactions that are difficult to capture with local and dense voxel-based operators. A hierarchical Gaussian representation further enables multi-scale reasoning by combining global context from coarse super-points with fine-grained detail from higher-resolution streams. Extensive experiments on Occ3D nuScenes and Waymo demonstrate state-of-the-art performance for 4D-POT. We provide code and models at https://lags.cs.uni-freiburg.de/.

22.
arXiv (CS.CL) 2026-06-18

TurnGuide: Enhancing Meaningful Full Duplex Spoken Interactions via Dynamic Turn-Level Text-Speech Interleaving

Full-Duplex Speech Language Models (FD-SLMs) are specialized foundation models designed to enable natural, real-time spoken interactions by modeling complex conversational turn-taking such as interruptions, backchannels, and overlapping speech. End-to-end (e2e) FD-SLMs leverage real-world double-channel conversational data to capture nuanced two-speaker dialogue patterns for human-like interactions, but their conversational abilities often degrade compared to pure-text conversation due to prolonged speech sequences and limited high-quality spoken dialogue data. Although interleaved text-speech generation could mitigate this degradation, integrating discrete text tokens into continuous double-channel audio streams could disrupt the precise time alignment required for fluid interaction. To address this, we propose TurnGuide, a novel text-speech interleaved generation approach for e2e FD-SLMs that dynamically segments assistant speech into dialogue turns and interleaves turn-level text and speech generation. This approach allows FD-SLMs to integrate the semantic intelligence of LLMs without compromising the natural acoustic flow. Extensive experiments show that TurnGuide not only significantly improves e2e FD-SLMs to produce semantically meaningful, coherent speech but also achieves state-of-the-art performance on various turn-taking events. Demos are available at https://dreamtheater123.github.io/TurnGuide-Demo/. Code is available at https://github.com/dreamtheater123/TurnGuide.

23.
arXiv (quant-ph) 2026-06-15

Spin disorder competing with positional symmetry breaking governs the metal-insulator behavior in oxide paramagnets

arXiv:2606.14624v1 Announce Type: cross Abstract: Numerous transition-metal oxides have low-temperature antiferromagnetic (AFM) states and high-temperature paramagnetic (PM) phases, where the AFM state is usually insulating while the PM phase can be either insulating or metallic. Without involving strong correlation, we use symmetry-broken density-functional theory (DFT) to obtain the PM phases of insulating NaFeO3 vs the recently discovered metallic NaOsO3. We develop the understanding of insulating and metallic behaviors in paramagnetic oxides by analyzing the interactions between magnetic and positional symmetry breaking: The insulating gap is governed by the competition between the spin disorder that induces a distribution of different magnitudes of local magnetic moments and the polymorphous distribution of off-center atomic displacements. NaFeO3, on the other hand, has large positional displacement with small spin-disorder-induced moments distribution, leading to insulating PM phase, whereas NaOsO3 has a pronounced spin-disorder-induced moments distribution that forces the PM phase to become metallic. Our work identifies this symmetry-breaking competition as a general framework to bridge seemingly disparate metal-insulator behaviors in transition-metal oxides paramagnets without invoking strong correlation.

24.
arXiv (CS.CV) 2026-06-11

MedCTA: A Benchmark for Clinical Tool Agents

To make clinically grounded decisions, medical AI agents are expected to go beyond simple recognition and be capable of tool retrieval, evidence acquisition, and integration. Existing benchmarks largely evaluate isolated perception or single-turn question answering, and therefore provide limited visibility into failures of planning, tool recruitment, and rollout reliability. We introduce MedCTA, a benchmark for evaluating medical tool agents on clinician-validated, step-implicit tasks grounded in realistic multimodal clinical inputs, including radiology images, pathology slides, and reports. MedCTA comprises 107 real-world clinical tasks with clinician-verified executable trajectories over 5 deployed tools, and supports process-aware evaluation of tool selection, argument validity, execution stability, trajectory fidelity, and outcome quality. We benchmark 18 open- and closed-source multimodal models and find that even frontier systems remain brittle in multi-step clinical tool use: autonomous rollouts are dominated by protocol failures, premature stopping, and incorrect tool recruitment, while gold-standard tool routing yields large but still incomplete gains. These results show that strong backbone perception does not translate into reliable agentic behavior in clinical settings. MedCTA provides a rigorous testbed for auditing, diagnosing, and advancing trustworthy medical AI agents. The dataset and evaluation suite are available at https://ivul-kaust.github.io/MedCTA/

25.
arXiv (CS.LG) 2026-06-15

A Low-Rank Subspace Analysis of LLM Interventions

arXiv:2606.14388v1 Announce Type: new Abstract: Interventions designed to modify a particular behavior in LLMs, such as refusal or sycophancy, often produce unintended changes in other behaviors. This lack of targeted control makes it difficult to design and implement reliable safety controls. To understand these side-effects, we introduce a diagnostic framework for analyzing interacting behaviors in LLMs. We model behaviors as low-rank subspaces in activation space, and study how interventions influence across behaviors. Across multiple instruction-tuned models (7B-70B) and across refusal, jailbreak, and sycophancy settings, we find that different behaviors share internal representations, and intervening on one behavior alters others in asymmetric ways. Some behaviors act as upstream control points whose interventions propagate broadly across other behaviors, while others remain more isolated. We relate these effects to two geometric quantities: (i) the overlap between behavior subspaces, measured as the average squared cosine of principal angles, and (ii) the angle between each behavior subspace and the decision subspace (capturing the model's final decision e.g., refuse vs. comply). Empirically, intervention effects on other behaviors tend to be larger for behavior pairs with higher subspace overlap, and for source behaviors whose subspaces lie closer (smaller angle) to the decision subspace. These findings highlight a challenge for targeted behavior control: behaviors are difficult to modify independently, as interventions can propagate through shared representations and asymmetric interactions.