Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.AI) 2026-06-19

FlowMaps: Modeling Long-Term Multimodal Object Dynamics with Flow Matching

arXiv:2606.20209v1 Announce Type: cross Abstract: Joint spatial and temporal understanding of 3D scenes is a crucial requirement for robots deployed in everyday household environments. Such agents must not only comprehend and navigate spatial layouts, but also reason about how these spaces evolve over time. In particular, humans interact with objects daily, causing them to change position throughout the environment and making it difficult for robots to reliably associate current observations with previously seen objects. However, these interactions are not random: human habits and routines induce spatio-temporally consistent patterns in object locations, which robotic agents can potentially learn and then exploit for downstream tasks such as navigation. To this end, we introduce FlowMaps, a latent flow matching model for estimating multimodal distributions over the future locations of dynamic objects in a continuous 3D space. By learning the implicit dependencies among objects and their temporal evolution, FlowMaps predicts likely changes in object locations conditioned on past human interactions, while supporting generalization across previously unseen environments that share similar object routines. To demonstrate the utility of this method, we deploy FlowMaps in a downstream dynamic Object Navigation task in both simulated and real-world environments. Across more than 600 episodes, FlowMaps outperforms state-of-the-art approaches, showing that modeling object dynamics through continuous, multimodal spatio-temporal distributions improves robotic search and navigation in changing household environments. Code and additional material is available at https://fra-tsuna.github.io/flowmaps/.

02.
arXiv (quant-ph) 2026-06-19

Random Projections for Multi-Copy Quantum Algorithms

arXiv:2606.20238v1 Announce Type: new Abstract: Estimating nonlinear properties of quantum states is a central task in quantum information science. Multivariate traces, $\mathrm{tr}(\rho_1 \cdots \rho_K)$, and nonlinear observables such as $\mathrm{tr}(\rho^K)$, for integer $K$, can be accessed through collective measurements on multiple state copies, but standard protocols based on swap tests require coherent operations on the full Hilbert space and become experimentally unfeasible for large systems. In this work, we introduce a framework for multi-copy measurements based on random projections onto lower-dimensional subspaces prior to the collective measurement, which is then performed only on the reduced Hilbert space. This procedure yields a tunable tradeoff between coherent quantum resources and statistical sampling overhead, allowing the amount of coherent processing to be matched to the capabilities of the underlying hardware. We derive explicit formulas relating the Haar-averaged projected moments to multivariate traces of the original states and analyze the sampling overhead induced by the projection procedure. Specifically, after compressing an $n$-qubit state to a reduced $q$-qubit subspace, estimating $\mathrm{tr}(\rho^K)$ requires approximately $O(2^{(n-q)(K-1)})$ copies of $\rho$, with each qubit projected out increasing the sampling cost by a factor of $2^{K-1}$. Our results establish how coherent multi-copy operations can be traded for additional state copies, enabling multi-copy quantum protocols to be optimized for the available hardware resources.

03.
arXiv (quant-ph) 2026-06-19

Operator Learning for efficient Quantum Computation

arXiv:2606.20184v1 Announce Type: new Abstract: An efficient implementation of quantum algorithms is often hindered by the lack of efficient primitives for operators and state preparation. This limits both the ability of near-term quantum hardware to simulate complex problems and the potential of fault-tolerant algorithms to achieve practical quantum advantage. To address this, we propose a full-stack variational framework that transforms arbitrary operators to compact quantum circuits. The resulting variational circuits can be tailored to the connectivity and long-range interaction of the target hardware. The learning process employs backpropagation together with a cost function that efficiently optimizes unitary operators and non-unitary – dense or sparse – operators using only a single ancilla qubit for block encoding. Additionally, we introduce a regularization term that reduces the approximation error. The approach is validated for both quantum mechanical and engineering applications. In the former case, we learn propagators that arise in native quantum problems – such as quantum simulation and quantum chemistry – and achieve improved resource scaling in comparison to standard Suzuki-Trotter expansions. In the latter case, we demonstrate the approach's ability to implement the second-order central finite difference approximation of the Laplace operator – relevant for solving partial differential equations – while improving upon current error metrics. The final example deals with learning a dense, non-unitary operator that arises in the analysis of inviscid potential flow around an airfoil. This universality of the framework opens the door for solving general problems beyond prototypical engineering and quantum applications.

04.
arXiv (quant-ph) 2026-06-19

Vine Codes: Low-Overhead Quantum LDPC Codes on a Planar Square Grid

arXiv:2606.20263v1 Announce Type: new Abstract: The surface code is a promising route towards large-scale quantum computing, requiring only nearest-neighbour gates amenable to superconducting hardware. However, surface codes incur large qubit overheads. Novel quantum low-density parity check (qLDPC) codes promise to reduce overheads but require long-range connections that are difficult to achieve on superconducting platforms. Here, we introduce "Vine Codes" - qLDPC codes that are implementable on a planar square grid through nearest-neighbour, two-qubit gates native to superconducting platforms (iSWAP and CZ). Our approach generalises "Directional Codes" recently introduced by Gehér et. al. (2025) which are constrained to a torus. In contrast, vine codes have open boundary conditions constructed with the aid of routing qubits. We perform extensive numeric searches and find promising candidate vine codes, e.g. [[121,4,6]], [[221,6,7]], and [[234,9,6]] codes. We verify the circuit distances and show that data and measure qubits required can be reduced by up to ~28% relative to the surface code at a circuit distance of 7. Even including routing qubits, vine codes require fewer total qubits than the surface code (e.g. ~18% reduction at circuit distance 10) and benefits are expected to increase at higher distances. We perform circuit-level noise simulations to demonstrate that under a realistic noise model and at a near-term noise rate of $10^{-3}$, vine codes can perform better than the surface code while using fewer qubits. We give an exhaustive list of all unique vine codes up to stabiliser-weight 9. We additionally introduce "Flip-Vine Codes" which possess single-qubit transversal Clifford gates useful for fault-tolerant logic and magic state cultivation. We furthermore construct examples of generalised open boundaries for vine codes that go beyond the familiar X/Z boundaries of the surface and tile codes.

05.
arXiv (CS.LG) 2026-06-16

Circuit Tracing in Autoregressive Protein Language Models

arXiv:2606.16044v1 Announce Type: new Abstract: Protein language models (pLMs) can generate novel protein sequences with properties beyond those observed in nature, yet the mechanisms underlying protein generation remain poorly understood. Existing mechanistic interpretability methods based on sparse autoencoders and transcoders primarily focus on protein representation learning models and do not capture the computation required for autoregressive generation. Here, we introduce ProGenMech, a mechanistic interpretability framework for generative protein language models that extends cross-layer transcoders (CLTs) to ProGen3, a sparse Mixture-of-Experts model trained for both causal generation and span infilling. Unlike per-layer approaches, CLTs reconstruct each layer using sparse latent variables from all preceding layers, enabling faithful recovery of inter-layer generative computation. We further develop a zero-shot circuit discovery framework to identify sparse latent circuits responsible for protein generation and fitness prediction. In causal generation and zero-shot fitness estimation tasks, ProGenMech outperforms local transcoder baselines in recovering ProGen3's probability distribution and functional scoring behavior, while matching the original model's generative distribution in span infilling tasks. Moreover, the recovered circuits reveal biologically meaningful motifs and functional regions associated with conserved sequence patterns and protein fitness landscapes, establishing a foundation for interpretable and steerable protein generation.

06.
arXiv (CS.CV) 2026-06-18

RUB: Evaluating Residual Knowledge in Unlearned Models

Machine Unlearning (MUL) has emerged as a key mechanism for privacy protection and content regulation, yet current techniques often fail to guarantee the complete removal of sensitive information. While most existing works focus on verifying the execution of unlearning, they overlook the critical question of whether models remain robust against adversarial attempts to recover forgotten knowledge. In this work, we advocate for the principle of Robust Unlearning, which requires models to be both indistinguishable from retrained counterparts and resilient against diverse adversarial threats. To instantiate this principle, we propose a unified benchmark, RUB (Robust Unlearning Benchmark), that systematically evaluates the robustness of unlearning algorithms across classification, image-to-image reconstruction, and text-to-image synthesis. Within this framework, we introduce the Unlearning Mapping Attack (UMA) as a generalizable method to detect residual information, and demonstrate how existing attack strategies can be adapted into this framework as long as they conform to the generic UMA framework. Our experiments across discriminative and generative tasks reveal that state-of-the-art unlearning methods remain vulnerable under these evaluations, even when passing standard verification metrics. By positioning robustness as the central criterion and providing a benchmark for adversarial evaluation, we hope RUB paves the way toward more reliable and secure unlearning practices. The codebase and model checkpoints in RUB will be published.

07.
arXiv (CS.CV) 2026-06-11

CoVR-R:Reason-Aware Composed Video Retrieval

Composed Video Retrieval (CoVR) aims to find a target video given a reference video and a textual modification. Prior work assumes the modification text fully specifies the visual changes, overlooking after-effects and implicit consequences (e.g., motion, state transitions, viewpoint or duration cues) that emerge from the edit. We argue that successful CoVR requires reasoning about these after-effects. We introduce a reasoning-first, zero-shot approach that leverages large multimodal models to (i) infer causal and temporal consequences implied by the edit, and (ii) align the resulting reasoned queries to candidate videos without task-specific finetuning. To evaluate reasoning in CoVR, we also propose CoVR-Reason, a benchmark that pairs each (reference, edit, target) triplet with structured internal reasoning traces and challenging distractors that require predicting after-effects rather than keyword matching. Experiments show that our zero-shot method outperforms strong retrieval baselines on recall at K and particularly excels on implicit-effect subsets. Our automatic and human analysis confirm higher step consistency and effect factuality in our retrieved results. Our findings show that incorporating reasoning into general-purpose multimodal models enables effective CoVR by explicitly accounting for causal and temporal after-effects. This reduces dependence on task-specific supervision, improves generalization to challenging implicit-effect cases, and enhances interpretability of retrieval outcomes. These results point toward a scalable and principled framework for explainable video search. The model, code, and benchmark are available at https://github.com/mbzuai-oryx/CoVR-R.

08.
arXiv (CS.CV) 2026-06-16

GeoStream: Toward Precise Camera Controlled Streaming Video Generation

Accurate interactive camera control is essential for video-based world models, but most existing approaches learn camera motion implicitly, leading to inaccurate control under out-of-distribution trajectories. Explicit geometric conditioning improves controllability, but existing methods are non-autoregressive and rely on a static 3D cache built from an initial frame, which becomes ineffective once the viewpoint moves beyond the original frustum. We propose GeoStream, a framework that enables precise metric-scale camera control in autoregressive streaming video generation. Our method maintains a self-refreshing 3D cache that is periodically updated online from the model's own outputs: we estimate depth from the most recently generated frame, unproject to 3D, and reproject into the target view to produce point reprojections as geometric conditioning for subsequent synthesis. By the same principle, the conditioning seen during training is also rendered from the student's own generated frames, yielding a fully on-policy distillation that naturally aligns the train and inference conditioning distributions. Unlike prior work that uses off-policy condition noising, our approach trains the model against the exact error distribution it encounters at inference, mitigating both standard autoregressive drift and the second-order geometric feedback loop that arises when the cache itself is derived from generated outputs. Quantitative and qualitative results show that our approach substantially improves camera controllability.

09.
arXiv (quant-ph) 2026-06-16

Towards Quantum Limited Spatial Resolution of NV-Diamond Magnetometry

arXiv:2508.13438v2 Announce Type: replace Abstract: Optically addressable ensembles of solid-state defects, such as nitrogen vacancy (NV) centers, are a leading modality for imaging-based magnetometry, thermometry and strain sensing. However, monitoring the fluorescence of individual defects within a sub-diffraction ensemble remains an outstanding challenge that currently limits access to atomic-scale features and dynamics. For compact clusters of NVs, we formulate imaging-based atomic sensing as a low-dimensional multiparameter estimation task in which one seeks to localize each defect and quantify the field strength in its immediate vicinity. In this work, we employ optical spatial mode demultiplexing (SPADE) to enhance localization and brightness estimation accuracy at sub-diffraction scales. Specifically, we develop a two-stage sensing protocol that augments direct imaging by projecting the incoming optical field onto point spread function (PSF)-adapted, i.e., PAD spatial modes and Yuen-Kennedy-Lax (YKL) spatial modes enabling efficient extraction of emitter positions and brightnesses. The YKL-SPADE measurement employed for brightness estimation is shown to be quantum-optimal in the case of two emitters and establishes a new connection between quantum detection and estimation theories. We numerically evaluate the statistical performance of our protocol for sub-diffraction optically detected magnetic resonance (ODMR) and Rabi sensing experiments. Compared to conventional focal plane intensity measurements, our protocol improves emitter localization accuracy by 6$\times$ and brightness estimation accuracy by 2$\times$ for tightly confined ensembles, residing well below the diffraction limit.

10.
arXiv (CS.AI) 2026-06-16

Explainable deep learning improves human mental models of self-driving cars

arXiv:2411.18714v3 Announce Type: replace-cross Abstract: Self-driving cars increasingly rely on deep neural networks to achieve human-like driving. The opacity of such black-box planners makes it challenging to accurately anticipate when they will fail, with potentially catastrophic consequences. While research into interpreting these systems has surged, most of it is confined to simulations or toy setups due to the difficulty of real-world deployment, leaving the practical utility of such techniques unknown. Here, we introduce the Concept-Wrapper Network (CW-Net), a method for faithfully explaining the behavior of machine-learning-based planners that causally grounds their reasoning in human-interpretable concepts without sacrificing performance. We deploy CW-Net on a real self-driving car and show that the resulting explanations improve the human driver's mental model of the vehicle, allowing them to better predict its behavior, particularly in surprising situations. This demonstrates that explainable deep learning integrated into self-driving cars can be both understandable and useful in a realistic deployment setting. We anticipate our method could be applied to other safety-critical systems, such as autonomous drones and robotic surgeons, as well as to other architectures, such as end-to-end learning systems and vision-language-action models. Overall, our study establishes a deployment-validated pathway to interpretability for autonomous agents, which could help make them more transparent and safe.

11.
arXiv (CS.AI) 2026-06-16

RaBiT: Residual-Aware Binarization Training for Accurate and Efficient LLMs

arXiv:2602.05367v3 Announce Type: replace Abstract: Efficient deployment of large language models (LLMs) requires extreme quantization, forcing a critical trade-off between low-bit efficiency and performance. Residual binarization enables hardware-friendly, matmul-free inference by stacking binary ($\pm$1) layers, but is plagued by pathological feature co-adaptation. We identify a key failure mode, which we term inter-path adaptation: during quantization-aware training (QAT), parallel residual binary paths learn redundant features, degrading the error-compensation structure and limiting the expressive capacity of the model. While prior work relies on heuristic workarounds (e.g., path freezing) that constrain the solution space, we propose RaBiT, a novel quantization framework that resolves co-adaptation by algorithmically enforcing a residual hierarchy. Its core mechanism sequentially derives each binary path from a single shared full-precision weight, which ensures that every path corrects the error of the preceding one. This process is stabilized by a robust initialization that prioritizes functional preservation over mere weight approximation. RaBiT redefines the 2-bit accuracy-efficiency frontier: it achieves state-of-the-art performance, rivals even hardware-intensive Vector Quantization (VQ) methods, and delivers a $4.49\times$ inference speed-up over full-precision models on an RTX 4090. Code is available at https://github.com/SamsungLabs/RaBiT.

12.
arXiv (CS.LG) 2026-06-16

InfoNCE Induces Gaussian Distribution

arXiv:2602.24012v2 Announce Type: replace Abstract: Contrastive learning has become a cornerstone of modern representation learning, allowing training with massive unlabeled data for both task-specific and general (foundation) models. A prototypical loss in contrastive training is InfoNCE and its variants. In this work, we show that the InfoNCE objective induces Gaussian structure in representations that emerge from contrastive training. We establish this result in two complementary regimes. First, we show that under certain alignment and concentration assumptions, projections of the high-dimensional representation asymptotically approach a multivariate Gaussian distribution. Next, under less strict assumptions, we show that adding a small asymptotically vanishing regularization term that promotes low feature norm and high feature entropy leads to similar asymptotic results. We support our analysis with experiments on synthetic and CIFAR-10 datasets across multiple encoder architectures and sizes, demonstrating consistent Gaussian behavior. This perspective provides a principled explanation for commonly observed Gaussianity in contrastive representations. The resulting Gaussian model enables principled analytical treatment of learned representations and is expected to support a wide range of applications in contrastive learning.

13.
arXiv (CS.LG) 2026-06-17

Verified Detection and Prevention of Concurrency Anomalies in Multi-Agent Large Language Model Systems

Authors:

arXiv:2606.17182v1 Announce Type: new Abstract: Multi-agent LLM systems share state through memory stores, vector indices, and tool registries. We model such sharing as long-running read-generate-write operations under deterministic-generation semantics – the regime durable-execution engines enforce by deterministic replay – and formalize four concurrency anomalies in TLA+: stale-generation, phantom-tool, causal-cascade, and tool-effect reordering, structural analogues of classical isolation anomalies, each with a TLC counter-example. The exclusion lattice over these anomalies is trivial; the contribution is the mechanically verified realizability and strict separation of one maximal chain within it, $L_0 \subsetneq \cdots \subsetneq L_4$, to our knowledge the first machine-checked consistency hierarchy for such runtimes. A development of 274 Verus obligations (zero assume, zero admit; trust base: two structural axioms and a mutex correspondence) proves the detectors sound and complete against the specifications and each runtime its avoidance set. Three deployed Rust runtimes realize L0-L1 (pessimistic locking, serializable snapshot isolation, default-SI), each verified against stale-generation and refined to its state machine; L2-L4 are exec-mode-verified with dependency-free prevention twins (A3, A6, A2: 0/1000 versus 1000/1000), and L2 is run live across three model families (A3 prevented in all 120 retracted sessions). We reproduce a silent lost update in ByteDance's deer-flow, formalizing its fix as a verified $L_0 \to L_1$ refinement, and exhibit tool-effect reordering in LangGraph's ToolNode on unmodified output, removed by an L3 commit-order sequencer. The verified detector, refinements, and realizability artifacts are the contribution; the phenomena and lattice are classical.

14.
medRxiv (Medicine) 2026-06-22

Burden of Cardiovascular Disease in Brazil, 1996-2023: A Retrospective Descriptive Study of the Epidemiology and Impact on Public Healthcare with Emphasis on Acute Myocardial Infarction

Background Cardiovascular diseases (CVD) are the leading cause of death worldwide, and their epidemiology is correlated with genetic predisposition, exposure to risk factors, sex, age, access to medical care, and other sociodemographic characteristics. Brazil is a developing country with a vast territory, which leads to structural inequalities. Estimates of CVD in Brazil, in its regions, and in its population are poorly evaluated and analysed. Methods We obtained CVD-related data from the Brazilian Unified Health System (SUS) and analysed mortality and morbidity from 1996 to 2023 by sex, race/ethnicity, age, and region. We calculated the risk of death from the most prevalent diseases, the average length of hospital stay, and the costs associated with heart transplantation. Findings In Brazil, acute myocardial infarction was the pathology that led to the highest number of deaths across all variables analysed during the evaluated period. Other CVD were also related to causes of death and morbidity, such as hypertensive diseases and heart failure. Interpretation Brazil presents a serious challenge to the public health system due to the high number of deaths and the progressive mortality rate. This study represents a fundamental contribution to the basis for formulating public health policies aimed at reducing the growing impact associated with these diseases. Funding CNPq, CAPES, FAPEMIG, INCT

15.
arXiv (math.PR) 2026-06-16

A Concavity Theorem for the Parisi PDE

Authors:

arXiv:2606.15432v1 Announce Type: new Abstract: We prove that the map sending the diffusion profile to the solution of a time-changed Parisi PDE evaluated at time-space $(0,0)$ is concave. This result strengthens the raywise concavity result proven by Auffinger and Chen (2016). As an application, for the balanced multispecies Ising spin glasses, the lower bound of Bates and Sohn (2025) matches the Hopf-type upper bound given by the Hamilton–Jacobi framework developed by Mourrat, Chen and Xia.

16.
medRxiv (Medicine) 2026-06-15

GLLaucoMed: A Secure LLM-Powered Agentic Workflow for Automated Medication Extraction from Free-Text Glaucoma Clinical Notes

Purpose: To evaluate the efficacy of large language models (LLMs) in extracting medication-related information from glaucoma clinical notes in the electronic health record (EHR). Design: Cross-sectional. Subjects: 1,250 subjects in the Bascom Palmer Ophthalmic Repository. Methods: Extracted clinical notes from glaucoma-related encounters between 2014 and 2024 were labeled by two glaucoma specialists with a third serving as an adjudicator. Graders were asked to label current topical medications (CTM), proposed changes to topical medications ({Delta}TM), current oral medications (COM), and proposed changes to oral medications ({Delta}OM) in a structured fashion. The dataset was split into development (10%), validation (10%), and test (80%) sets stratified by clinician. Development and validation sets were used to engineer and refine prompts, and the held-out test set was used for model assessment. Five LLMs (Claude Opus 4.6, DeepSeek-V3.2, GPT 5.2, Grok 4.1, and Qwen3.6-35B-A3B) were accessed via Microsoft Azure AI Foundry within a HIPAA-compliant environment. Inter-grader agreement was assessed with Gwet AC1. LLM performance was initially assessed in a binary fashion with F1 scores, and the degree of text match among positive cases was evaluated using exact match accuracy and Jaccard Index (JI). Main Outcome Measures: F1 score, exact match accuracy, JI. Results: Gwet AC1 for intergrader agreement was 0.799, 0.888, 0.985, and 0.988 for CTM, {Delta}TM, COM, and {Delta}OM, respectively. F1 scores for CTM were 0.985, 0.971, 0.978, 0.968, and 0.970 for Claude, Deepseek, GPT, Grok, and Qwen, respectively; for {Delta}TM: 0.905, 0.826, 0.897, 0.842, 0.855, respectively; for COM: 0.923, 0.887, 0.899, 0.906, 0.894, respectively; for {Delta}OM: 0.958, 0.815, 0.937, 0.835, 0.940, respectively. Among positive cases, range of exact match accuracies for CTM (N=1354) was 0.730- 0.882 and range of JIs was 0.809-0.918. For {Delta}TM (N=404), exact match accuracy range was 0.619-0.780 and JI range was 0.668-0.827. For COM (N=47), exact match accuracy range was 0.766-0.872 and JI range was 0.765-0.870. For {Delta}OM (N=25), exact match accuracy range was 0.583-0.920 and JI range was 0.583-0.922. Conclusions: The GLLaucoMed pipeline demonstrated high performance in extracting and standardizing medication data from unstructured clinical notes, including both current medications and proposed changes. Claude and GPT exhibited the strongest performance.

17.
medRxiv (Medicine) 2026-06-16

Development and reliability and validity test of the Questionnaire on Knowledge, Attitude and Practice of ICU Nurses on Blood Oxygen Saturation Management in Mechanically Ventilated Patients

Objective: A questionnaire on the knowledge, attitude and practice of ICU nurses regarding the management of blood oxygen saturation in patients with mechanical ventilation was compiled, and its reliability and validity were tested. Method: Drawing upon the knowledge-attitude-practice theory, the initial questionnaire draft was developed through literature review and consultation with Delphi experts. Employing convenience sampling, 32 nurses from the General ICU of Wuxi Second People's Hospital were surveyed between 1 August 2025 and 27 September 2025, enabling item screening and assessment of reliability and validity.The full version of the developed questionnaire is provided as Supporting Information (S1 File). All items are published under a CC BY 4.0 license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Result: A questionnaire on the knowledge, attitude and practice of ICU nurses regarding the management of blood oxygen saturation in mechanically ventilated patients was finalised, comprising 26 items: 11 in the knowledge dimension, 6 in the attitude dimension and 9 in the behaviour dimension. The overall Cronbach's coefficient for the questionnaire was 0.88, with dimension-specific coefficients of 0.787, 0.722, and 0.781 respectively. The Spearman-Brown coefficient for the entire questionnaire was 0.967, while dimension-specific coefficients were 0.796, 0.666, and 0.728 respectively. The content validity index at the questionnaire level (S-CVI) was 0.886, and the item-level content validity index (I-CVI) ranged from 0.913 to 0.967. 0.728. The questionnaire's level content validity index (S-CVI) was 0.886, and the item level content validity index (I-CVI) ranged from 0.913 to 1.00. Conclusion: The questionnaire on knowledge, attitude and practice of blood oxygen saturation management in mechanically ventilated patients demonstrates good reliability and validity. It may serve as an assessment tool for intensive care unit nurses regarding their knowledge, attitude, and practices concerning blood oxygen saturation management in mechanically ventilated patients, thereby establishing a foundation for developing targeted intervention strategies in future practice.

18.
arXiv (CS.CL) 2026-06-16

REFLEX: Reflective Evolution from LLM Experience

Authors:

Large multimodal language models (LLMs) have emerged as powerful tools for guiding evolutionary search toward interpretable programmatic policies. However, existing frameworks rely on a monolithic model call to simultaneously interpret visual behavioral evidence and synthesize corrective code. This diagnosis-repair entanglement creates an opaque feedback loop, obscuring the rationale behind mutations and preventing the retention of algorithmic insights across independent runs. To achieve auditable and efficient policy search, we argue that visual diagnosis must be structurally decoupled from code generation. We present REFLEX, a train-free evolutionary framework that operationalizes this decoupling. In REFLEX, a vision-enabled Critic first distills task-specific behavioral evidence into structured, auditable diagnoses. Subsequently, a text-optimized Actor synthesizes child policies using these diagnoses alongside a persistent, self-evolving Skill Memory of reusable code snippets. This architecture not only provides transparent mutation traces but also enables cross-run programmatic knowledge transfer. Extensive evaluations across control benchmarks (Lunar Lander, Acrobot, Pendulum) and a 36-dimensional antenna array synthesis task demonstrate exceptional sample efficiency. Notably, REFLEX solves Acrobot and Pendulum in under 10 LLM calls and reaches a best Normalized Weighted Score of 1.092 on Lunar Lander, achieving highly competitive final performance while significantly accelerating the early-stage discovery of transparent policies.

19.
arXiv (CS.LG) 2026-06-15

Optimal Hidden-Target Learning for Online Inventory Optimization on General Convex Sets

arXiv:2606.14679v1 Announce Type: new Abstract: Online inventory optimization (OIO) is online convex optimization with physical memory: inventory carryover makes the feasible action set depend on the past. A natural principle, used in stochastic inventory learning and recently in OIO under a single linear capacity constraint, is to maintain a hidden target chosen by an online learner and implement its projection onto the currently feasible order-up-to set. We prove that this simple principle is optimal for OIO on arbitrary bounded convex capacity sets. With online gradient descent as the base learner, the method improves the best known regret guarantee for OIO on general convex sets from inverse to inverse-square-root dependence on the common-demand probability, and we prove a matching lower bound. The same principle gives the first polylogarithmic regret guarantee for strongly convex losses and the first dynamic regret guarantee adapting to Euclidean path variation on general convex capacity sets. The analysis introduces a norm alignment principle: the right state variable is the distance from the hidden target to the feasible set, measured in the same norm as the projection. Under norm alignment, this distance evolves pathwise as a scalar queue, with target movement as arrival and common demand as service. This reduction to one-dimensional queue control resolves the state dependence and extends the guarantees to general convex capacity sets, beyond the reach of prior productwise approaches. Experiments on synthetic and real-world inventory data corroborate the theory.

20.
arXiv (CS.CV) 2026-06-16

RQUL-UIE: Revitalizing Quality-Unstable Labels for Underwater Image Enhancement via In-Dataset Self-Supervision

Underwater Image Enhancement (UIE) is essential for mitigating degradations caused by water medium. Although learning-based methods have advanced significantly, most rely on paired datasets with unstable label quality, which bottlenecks model performance. This paper proposes a diffusion-based, in-dataset self-supervised learning strategy designed to exploit the quality distribution of training labels. Specifically, we evaluate label quality via semantic perception embeddings from a pre-trained diffusion model in a training-free manner. These quality scores are subsequently quantized into noise-level indices, guiding a multi-step denoising process for level-wise supervision. This mechanism prevents low-quality labels from degrading the model while maximizing their utility during training. Furthermore, a Fourier-based refinement network is incorporated to explicitly reconstruct high-frequency components. Extensive evaluations demonstrate that our method consistently outperforms SOTA approaches in restoration quality. The code and pre-trained model will be available once accepted in link.

21.
arXiv (CS.AI) 2026-06-16

AQ4SViT: An Automated Quantization Framework with Search Gating Policy for Compressing Spiking Vision Transformers

arXiv:2606.15523v1 Announce Type: cross Abstract: Spiking Vision Transformers (SViTs) have emerged as alternative low-power ViT models, but their large sizes hinder their deployments on resource-constrained embedded AI systems. To address this, state-of-the-art works proposed quantization techniques to compress SViT models, but their manual, human-guided approach needs a huge design time and power/energy consumption to find the appropriate quantization setting for each given network, making this approach not scalable for quantizing multiple networks. Toward this, we propose AQ4SViT, a novel automated quantization framework for SViTs that can provide quick quantization settings with good trade-offs between accuracy and memory. To achieve this, AQ4SViT employs the following key ideas: quantization search strategy that evaluates the quantization setting candidates while considering the accuracy constraint; and search gating policy that quickly evaluates and selects promising quantization candidates by leveraging membrane potential drift as a performance proxy. In the search gating policy, AQSViT employs two search algorithm variants to provide trade-off options: Greedy search, which performs fast but may lead to local optima; and Beam search, which performs slower but has better performance in finding global optima selection due to a wider search space. Experimental results show that AQ4SViT-Greedy quickly finds the appropriate quantization settings, achieving up to 6.6x faster search time and up to 82.5% memory saving compared to the state-of-the-art; while AQ4SViT-Beam further reduces the memory footprint by up to 90% compared to the state-of-the-art, but with 4.5x longer search time; all these results are obtained while maintaining high accuracy within 1.5% from the original/non-quantized models on the ImageNet dataset. These results highlight that AQ4SViT framework offers advancements toward SViT deployments on embedded AI systems.

22.
arXiv (CS.LG) 2026-06-11

Machine-learning-based multipoint optimization of fluidic injection parameters for improving nozzle performance

arXiv:2409.12707v2 Announce Type: replace-cross Abstract: Fluidic injection offers a promising solution to improve the performance of the overexpanded single expansion ramp nozzles (SERNs) during vehicle acceleration. However, determining the injection parameters that yield the best overall performance across multiple nozzle operating conditions remains a challenge. The gradient-based optimization method requires gradients of injection parameters at each design point, which can lead to high computational costs when using computational fluid dynamics (CFD) simulations. This paper uses a pretrained neural network to replace CFD during optimization, enabling quick calculation of the nozzle flow field at multiple design points. Considering the physical characteristics of the nozzle flow field, a prior-based prediction strategy is adopted to enhance the model's accuracy. In addition, the neural network's back-propagation algorithm computes gradients quickly by running the computation only once, thereby greatly reducing gradient computation time compared to the finite difference method. As a test case, the average nozzle thrust coefficient of an SERN at seven design points is optimized, resulting in a 1.14\% improvement. The time cost is greatly reduced compared with traditional optimization methods, even when the time required to establish the training database is included.

23.
arXiv (quant-ph) 2026-06-16

Reconstruction of detector error model for quantum error correction

arXiv:2606.16288v1 Announce Type: new Abstract: Fault-tolerant quantum computing fundamentally relies on the accurate characterization of circuit-level noise to optimize decoding algorithms. However, extracting complex multi-body error correlations remains challenging. Contemporary greedy inference algorithms can suffer from statistical distortion, discarding true physical mechanisms while introducing many unphysical false positives. Here, we introduce the Correlation-Analysis-based Hypergraph Reconstruction (CAHR) algorithm, a globally consistent framework to invert experimental syndrome statistics directly into discrete physical hypergraphs. By coupling exact algebraic correlation equations with a top-down concurrent-pruning strategy, CAHR recovers the fault topology without false positives for both $d=5$ rotated surface codes and dense 8-body 2D color codes in our benchmark settings. Furthermore, we show that exact continuous parameter extraction in dense codes is limited by a variance cascade, where absolute statistical variance accumulates linearly from high- to low-degree mechanisms. This motivates a two-stage inference paradigm: utilizing CAHR to extract the fault topology, followed by continuous probability optimization. This provides a practical approach for characterizing and decoding highly correlated noise in realistic quantum hardware.

24.
arXiv (CS.LG) 2026-06-16

Can Neural Networks Achieve Optimal Computational-statistical Tradeoff? An Analysis on Single-Index Model

arXiv:2606.15219v1 Announce Type: new Abstract: In this work, we tackle the following question: Can neural networks trained with gradient-based methods achieve the optimal computational-statistical tradeoff in learning Gaussian single-index models? Prior research has shown that any polynomial-time algorithm under the statistical query (SQ) framework requires $\Omega(d^{s^\star/2}\lor d)$ samples, where $s^\star$ is the generative exponent representing the intrinsic difficulty of learning the underlying model. However, it remains unknown whether neural networks can achieve this sample complexity. Inspired by prior techniques such as label transformation and landscape smoothing for learning single-index models, we propose a unified gradient-based algorithm for training a two-layer neural network in polynomial time. Our method is adaptable to a variety of loss and activation functions, covering a broad class of existing approaches. We show that our algorithm learns a feature representation that strongly aligns with the unknown signal $\theta^\star$, with sample complexity $\widetilde{O} (d^{s^\star/2} \lor d)$, matching the SQ lower bound up to a polylogarithmic factor for all generative exponents $s^\star\geq 1$. Furthermore, we extend our approach to the setting where $\theta^\star$ is $k$-sparse for $k = o(\sqrt{d})$ by introducing a novel weight perturbation technique that leverages the sparsity structure. We derive a corresponding SQ lower bound of order $\widetilde{\Omega}(k^{s^\star})$, matched by our method up to a polylogarithmic factor. Our framework, especially the weight perturbation technique, is of independent interest, and suggests potential gradient-based solutions to other problems such as sparse tensor PCA.

25.
medRxiv (Medicine) 2026-06-10

Longitudinal brain structural changes during clozapine treatment: associations with neuroreceptor architecture and clinical response

In treatment-resistant schizophrenia, clozapine treatment has been associated with longitudinal reductions in subcortical volumes, ventricular enlargement, and widespread cortical thinning. However, it is unknown how these structural changes relate to clozapines pharmacological profile and clinical efficacy. We combined five longitudinal datasets with MRI acquired before and on average 5 months after clozapine initiation in 143 individuals to quantify brain structural changes and their association with normative maps relating to neuroreceptor architecture and physiological systems, and improvement in symptom severity. Clozapine treatment was associated with grey matter volume reductions across multiple subcortical regions (including the amygdala, hippocampus, thalamus, caudate, putamen and nucleus accumbens), increases in pallidal volume, ventricular enlargement, and widespread cortical thinning. Cortical regions showing the greatest magnitude of thinning corresponded to areas with higher normative densities of serotonergic 5-HT1A, 5-HT2A and 5-HT4 receptors. Changes in subcortical volume or cortical thickness during clozapine treatment were not associated with changes in total or positive symptom severity. In addition, baseline subcortical volume, cortical thickness, or gyrification prior to starting clozapine did not predict subsequent symptom improvement. Cortical thinning may partly reflect clozapines activity at serotonergic receptors, which have been implicated in cortical network stabilisation and neuroplasticity, however structural remodelling during clozapine treatment may reflect a process independent from its clinical efficacy in improving core symptoms of psychosis.