Academic Intelligence · Curated Daily

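Explore the Frontiers of Global Academic Research

AcademicHub aggregates real-time literature from top journals and preprint platforms. Customize your own research radar and use large language models to automatically generate cross-disciplinary literature analysis briefings.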

01.
arXiv (CS.CV) 2026-06-16

Towards Next-Generation Healthcare: A Survey of Medical Embodied AI for Perception, Decision-Making, and Action

Foundation models have demonstrated impressive performance in enhancing healthcare efficiency across a wide range of medical applications. Nevertheless, their limited ability to perceive, understand, and interact with the physical world significantly constrains their effectiveness in real-world clinical workflows, where safety-critical decision-making and physical execution are tightly coupled. Recently, embodied artificial intelligence (AI) has emerged as a promising physical-interactive paradigm for intelligent healthcare, enabling agents to operate in complex medical environments. As research in this area rapidly expands, understanding how intelligent agents function as integrated, end-to-end systems in clinical environments becomes increasingly critical. However, existing surveys on medical embodied AI largely emphasize individual aspects or functional components, lacking a unified system-level organization of the field. To support and consolidate recent advances, we systematically survey the core components of medical embodied AI, with a particular emphasis on the coordinated integration of perception, decision-making, and action. We further review representative medical applications and relevant datasets, and we analyze the major challenges encountered in real-world clinical practice. Finally, we discuss key directions for future research in this rapidly evolving field. The associated project can be found at https://github.com/VMVLab/Medical_Embodied_AI_Paper_List.

02.
arXiv (CS.CV) 2026-06-12

YOLO-AMC: An Improved YOLO Architecture with Attention Mechanisms for Building Crack Detection

Crack detection plays an important role in infrastructure inspection and Structural Health Monitoring (SHM). However, cracks typically appear as thin, low-contrast structures and are easily affected by background noise, posing challenges for existing object detection models. This study proposes an improved YOLO-based architecture with integrated attention mechanisms, termed YOLO-AMC (YOLO with Attention Mechanisms for Crack Detection), to enhance automated crack detection performance. Based on YOLOv11, the original C2PSA module is removed, and multiple attention mechanisms, including Global Attention Mechanism (GAM), Residual Convolutional Block Attention Module (Res-CBAM), and Shuffle Attention (SA), are introduced into the multi-scale feature fusion layers of the Neck to strengthen cross-scale feature integration. Experimental results demonstrate that YOLO-AMC consistently outperforms baseline models YOLOv11n and YOLOv8n across multiple evaluation metrics. Among the evaluated attention modules, GAM achieves the best detection performance, obtaining mAP@0.5 = 0.9917 and mAP@0.5:0.95 = 0.9506 on the test dataset, which are higher than those of YOLOv11 (0.9833 / 0.9112) and YOLOv8 (0.9707 / 0.8921). Furthermore, while maintaining a computational complexity of 7.6 GFLOPs, the proposed model achieves 110.95 FPS on an NVIDIA RTX 4090 platform and approximately 5 FPS on a Raspberry Pi 5 edge device, demonstrating a favorable trade-off between accuracy and deployment efficiency. The implementation code for this study is available on GitHub at https://github.com/CY-Tsai24/YOLO-AMC.
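The mAP@0.5 figure above counts a predicted box as correct when its intersection-over-union (IoU) with a ground-truth box is at least 0.5, and mAP@0.5:0.95 averages over stricter thresholds. A minimal sketch of that overlap test follows, assuming axis-aligned boxes in (x1, y1, x2, y2) form. The function name and the sample boxes are illustrative and are not taken from the YOLO-AMC repository.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical thin, crack-like ground-truth box (100 wide, 10 tall).
truth = (0, 0, 100, 10)
shift_2 = (0, 2, 100, 12)  # prediction shifted down by 2 units
shift_4 = (0, 4, 100, 14)  # prediction shifted down by 4 units

print(iou(truth, shift_2) >= 0.5)  # still a match at the 0.5 threshold
print(iou(truth, shift_4) >= 0.5)  # no longer a match
```

For a box this thin, a shift of a few units is enough to fall below the 0.5 threshold, which is one way to see why thin structures are hard for box-based detectors.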

03.
arXiv (CS.LG) 2026-06-19

Statistical Properties of Training & Generalization

arXiv:2606.20299v1 (announce type: cross). Deep learning has managed to evade numerous intuitions from classical statistics to achieve unprecedented performance on a number of real-world tasks. In this article, we investigate the key features and surprises of deep learning from a physics-informed perspective, taking care to point out and justify where possible the many choices inherent in constructing a deep learning model. In particular, we review the phenomenon of neural scaling laws and discuss their interplay with the constraints and inductive biases which may be present when applying machine learning to problems in physics.

04.
arXiv (CS.AI) 2026-06-15

HarnessX: A Composable, Adaptive, and Evolvable Agent Harness Foundry

arXiv:2606.14249v1 (announce type: new). AI agent performance depends critically on the runtime harness, comprising the prompts, tools, memory, and control flow that mediate how a model observes, reasons, and acts. Yet today's harnesses remain largely hand-crafted and static: each new model or task still demands bespoke scaffolding, and the rich traces produced during execution are rarely distilled back into systematic improvement. We introduce HarnessX, a foundry for composable, adaptive, and evolvable agent harnesses. HarnessX assembles typed harness primitives via a substitution algebra, adapts them through AEGIS, a trace-driven multi-agent evolution engine grounded in an operational mirror between symbolic adaptation and reinforcement learning, and closes the harness-model loop by turning trajectories into both harness updates and model training signal. Across five benchmarks (ALFWorld, GAIA, WebShop, tau^3-Bench, and SWE-bench Verified), HarnessX yields an average gain of +14.5% (up to +44.0%), with gains largest where baselines are lowest. These results suggest that agent progress need not come from model scaling alone: composing and evolving runtime interfaces from execution feedback is an actionable and complementary lever. The complete codebase will be open-sourced in a future release.

05.
arXiv (quant-ph) 2026-06-16

Hyperinvariant Spin Network States – An AdS/CFT Model from First Principles

arXiv:2510.06602v2 (announce type: replace). We study the existence and limitations of hyperinvariant tensor networks incorporating a local SU(2) symmetry. As discrete implementations of the anti-de Sitter/conformal field theory (AdS/CFT) correspondence, such networks have created bridges between the fields of quantum information theory and quantum gravity. Adding SU(2) symmetry to the tensor network allows a direct connection to spin network states, a basis of the kinematic Hilbert space of loop quantum gravity (LQG). We consider a particular situation where the states can be interpreted as kinematic quantum states for three-dimensional quantum gravity. We show that important aspects of the AdS/CFT correspondence are realized in certain quantum states of the gravitational field in LQG, thus justifying, from first principles, a class of models introduced by [F. Pastawski et al., JHEP 06, 149 (2015)]. We provide examples of hyperinvariant tensor networks, but also prove constraints on their existence in the form of no-go theorems that exclude absolutely maximally entangled states as well as general holographic codes from local SU(2)-invariance. We calculate surface areas as expectation values of the LQG area operator and discuss further possible constraints as a consequence of a decay of correlations on the boundary.

06.
arXiv (CS.CL) 2026-06-16

SAMark: A Self-Anchored Text Watermarking with Paragraph-Level Paraphrase Robustness

Semantic-level watermarking (SWM) improves robustness against text modifications by treating sentences as the basic unit. However, robustness to paragraph-level paraphrasing remains difficult because such attacks globally disrupt watermark signals by changing sentence order. In this work, we propose SAMark, a self-anchored watermarking framework that removes the dependency on sentence order by establishing a step-independent green region in semantic space. To improve detectability, we introduce a multi-channel hyperbolic scoring mechanism that amplifies watermark signals while suppressing noise from weakly aligned candidates. We further propose a diversity-aware filtering strategy that combines hard filtering with soft regularization, extending beyond simple n-gram repetition filters to address semantic redundancy. Experimental results show that SAMark achieves up to 90.2% TP@FP1% under typical paragraph-level paraphrasing attacks, outperforming the strongest prior baseline by more than 30% on average, while maintaining generation quality competitive with unwatermarked text and breaking the robustness-quality trade-off that limits prior methods.
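The TP@FP1% metric quoted above is the true-positive rate of a detector once its threshold is set so that at most 1% of unwatermarked texts are flagged. A minimal sketch of that computation follows. The detector scores are made-up numbers and the function name is hypothetical; SAMark's actual scoring is not reproduced here.

```python
def tp_at_fp(null_scores, watermarked_scores, fp_rate=0.01):
    """True-positive rate at the loosest threshold whose false-positive rate is <= fp_rate."""
    ranked = sorted(null_scores, reverse=True)
    # Number of unwatermarked texts that may score above the threshold.
    allowed = int(fp_rate * len(ranked))
    # A text is flagged only if it scores strictly above this threshold,
    # so at most `allowed` unwatermarked texts are flagged even with ties.
    threshold = ranked[allowed] if allowed < len(ranked) else float("-inf")
    hits = sum(1 for s in watermarked_scores if s > threshold)
    return hits / len(watermarked_scores)


# Hypothetical detector scores: 100 unwatermarked texts and 4 watermarked texts.
null_scores = [i / 100 for i in range(100)]
watermarked_scores = [0.5, 0.985, 0.99, 1.2]

print(tp_at_fp(null_scores, watermarked_scores))
```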

07.
arXiv (CS.CL) 2026-06-12

Trait, Not State: The Durability of Reading Identity in Social Highlighting

Prior work on a social web highlighter located individuality in selection – which documents a person chooses to highlight – but measured it cross-sectionally. We ask the temporal question: is a reader's selection signature a trait or a state? We freeze each reader's first six months of highlighting as a profile and track its own-vs-other advantage on their later selections at growing gaps (to 24+ months), with negatives drawn from the same calendar era – so supply drift cannot masquerade as personal drift – at a coarse global level and at a fine level whose negatives and controls come from the reader's own interest neighborhood; the anchor cell reproduces the prior cross-sectional level (+0.188 vs +0.169), validating the harness. Four results. Within the same users, the fine-layer advantage shows no statistically detectable paired decline at any horizon (6-12 month retention R = 1.00 [0.85, 1.18], n = 212; the farthest bin is compatible with a modest decline; the only contrast whose interval excludes zero is the coarse layer at 12-24 months, about 13%). The signal is not reducible to repeated domains (~90% survives excluding all profile sources). Within-person drift is slow (a recent-half profile beats the old half by +0.042). Prospectively, personal profiles – even one built from a reader's earliest documents, median 20 months before evaluation – rank their next reads at roughly 3x the AP of every simple non-personal prior tested. We use "trait" operationally (a stable signature under continued engagement); the scope is heavy, long-tenured readers of one platform, and exposure is not separable from choice.

08.
arXiv (CS.LG) 2026-06-17

Price of metric universality in vector quantization is at most 0.11 bit

arXiv:2602.05790v2 (announce type: replace-cross). Fast computation of a matrix product $W^\top X$ is a workhorse of modern LLMs. To make their deployment more efficient, a popular approach is to use a low-precision approximation $\widehat W$ in place of the true $W$ ("weight-only quantization"). Information theory demonstrates that an optimal algorithm for reducing the precision of $W$ depends on the (second-order) statistics of $X$ and requires a careful alignment of the vector quantization codebook with the PCA directions of $X$ (a process known as "waterfilling allocation"). Dependence of the codebook on the statistics of $X$, however, is highly impractical. This paper proves that there exists a universal codebook that is simultaneously near-optimal for all possible statistics of $X$, in the sense of being at least as good as an $X$-adapted waterfilling codebook with rate reduced by 0.11 bit per dimension in the case when $W$ is Gaussian. Such a universal codebook would be an ideal candidate for the low-precision storage format, a topic of active modern research, but alas the existence proof is non-constructive. Equivalently, our result shows the existence of a net in $\mathbb{R}^n$ that is a nearly optimal covering of a sphere simultaneously with respect to all Hilbert norms.

09.
bioRxiv (Bioinfo) 2026-06-16

Programmatic access to ICTV virus taxonomy through a public ontology API

The International Committee on Taxonomy of Viruses (ICTV) is responsible for developing and maintaining a universal virus taxonomy. As the reference framework for organising the viral world, it is essential for virology and related fields. Despite its widespread use in research and public health, programmatic access to ICTV taxonomy has remained limited, posing challenges for integration, versioning, and interoperability across databases and bioinformatics resources requiring up-to-date virus taxonomy. To address this, we developed a public and sustainable solution leveraging ontology-based APIs. Successive ICTV Master Species List (MSL) releases were transformed into a structured ontology and deployed as a unified representation through the Ontology Lookup Service (OLS). The framework also provides ICTV-NCBI mappings and helper libraries for integration into downstream systems. This enables, for the first time, public programmatic retrieval of current and historical virological taxon names, taxonomic relationships, metadata, and persistent identifiers through stable endpoints. More broadly, this work illustrates a general strategy for transforming structured biological datasets into semantically enriched graph resources exposed through scalable public APIs. These developments enhance interoperability, reduce manual curation, and support FAIR-aligned taxonomic data management in virology and pandemic preparedness.

10.
arXiv (CS.AI) 2026-06-19

VCG: A Multimodal Retrieval Framework for E-Commerce Video Feeds under Extreme Cold-Start Conditions

arXiv:2606.19627v1 (announce type: cross). The digital commerce landscape is shifting from static, search-driven catalogs to dynamic, immersive video feeds. This transition introduces an "extreme cold-start" problem: unlike traditional items, new short-form videos lack the dense interaction history required for collaborative filtering. Furthermore, immersive feeds introduce strong position and duration biases that distort standard engagement signals. In this paper, we demonstrate the Video Candidate Generation (VCG) system, a scalable multimodal retrieval engine designed to solve these challenges in a large-scale e-commerce environment. By leveraging a domain-adapted vision-language model (based on CLIP), we map users and videos into a shared semantic space, enabling zero-shot retrieval based on visual content rather than behavioral history. We detail the system's architecture and present a rigorous evaluation comparing generative (LLM) vs. discriminative (CLIP) embeddings. Our results show that while generative models excel at attribute prediction, they suffer from embedding space collapse in retrieval tasks. Online A/B testing demonstrates that VCG effectively mitigates engagement biases, yielding a 50% uplift in deep video completion. To showcase the system's capabilities, we present an interactive demonstration featuring three bi-directional retrieval scenarios: Product-to-Video, Video-to-Product, and Zero-Shot Semantic Search.

11.
arXiv (CS.AI) 2026-06-15

STREAM: Multi-Tier LLM Inference Middleware with Dual-Channel HPC Token Streaming

arXiv:2606.13968v1 (announce type: cross). Researchers and practitioners working with large language models face a fragmented landscape: local models are free and private but hardware limits the model size and context windows a researcher can use; institutional HPC centers offer powerful GPU resources at no marginal cost and keep data within institutional boundaries, but operate behind firewalls and are designed for batch jobs rather than interactive use; commercial cloud APIs provide frontier-model quality on demand but impose significant cost and data retention policies unsuitable for sensitive research data. No existing system unifies all three. STREAM (Smart Tiered Routing Engine for AI Models) addresses this gap with four contributions: (1) a three-tier routing architecture combining local, HPC, and cloud inference with a local LLM-based complexity judge; (2) a dual-channel HPC streaming architecture that separates the Globus Compute control plane (authentication and job dispatch) from a WebSocket relay data plane (token delivery), enabling sub-second TTFT (0.54 s median, 21.1x over batch mode's 11.40 s) through institutional firewalls without VPN or firewall rule changes, with end-to-end AES-256-GCM encryption ensuring the relay operator cannot read token payloads; (3) tier-aware context summarization that prevents long conversations from forcing simple queries onto expensive tiers; and (4) an HPC-as-API proxy mode that exposes HPC inference as an OpenAI-compatible endpoint callable from any standard client with no HPC expertise, a deployment pattern made practical only by the sub-second TTFT of contribution (2). Llama 3.2 3B achieves 85.1% free-tier retention on a 1,200-query benchmark spanning ten domains. Measured TTFT: 0.26 s local, 0.54 s HPC (relay), 1.68 s cloud.
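The latency figures above can be cross-checked with a few lines of arithmetic. The sketch below only restates the numbers reported in the abstract; the variable names are illustrative and nothing here reflects STREAM's actual code.

```python
# Time-to-first-token (TTFT) values reported in the abstract, in seconds.
batch_ttft_s = 11.40
ttft_by_tier_s = {"local": 0.26, "hpc_relay": 0.54, "cloud": 1.68}

# Speedup of relay streaming over HPC batch mode.
speedup = batch_ttft_s / ttft_by_tier_s["hpc_relay"]

# Tiers ordered from fastest to slowest first token.
tiers_fastest_first = sorted(ttft_by_tier_s, key=ttft_by_tier_s.get)

print(f"relay speedup over batch: {speedup:.1f}x")
print("tiers by TTFT:", tiers_fastest_first)
```

The ratio reproduces the stated 21.1x, and the relay tier sits between local and cloud inference in time to first token.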

12.
arXiv (CS.LG) 2026-06-16

Multi-User Dueling Bandits: A Fair Approach using Nash Social Welfare

arXiv:2605.01961v2 (announce type: replace). Learning from human preference data is becoming a useful tool, from fine-tuning large language models to training reinforcement learning agents. However, in most scenarios, the model is trained on the average preference of all human evaluators, which, under large variations of preferences, can be unfair to minority groups. In this work, we consider fairness in dueling bandits, a standard framework for online learning from preference data. We assume that each user has a (potentially distinct) Condorcet winner, which is an arm preferred to every other arm. Using these user-specific Condorcet winners as reference points, we evaluate and score arms according to their performance relative to the corresponding winner. To promote fairness across heterogeneous users, we adopt the well-established Nash Social Welfare objective, which maximizes the product of user utilities, thereby inherently penalizing inequality and preventing the marginalization of any single user. Within this framework, we construct a hard instance to establish a regret lower bound of $\Omega(T^{2/3}\min(K,D)^{1/3})$ for a time horizon $T$, $K$ arms, and $D$ users, which, to the best of our knowledge, is the first result quantifying the cost of fairness in dueling bandits with heterogeneous preferences. We then present the Fair-Explore-Then-Commit and Fair-$\epsilon$-Greedy algorithms with a Condorcet winner identification phase. We further derive their regret upper bounds that match the lower-bound dependence on $T$ up to logarithmic factors.
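The contrast between average preference and Nash Social Welfare can be seen on a toy example. The sketch below uses made-up per-user utilities in (0, 1]. How the paper derives utilities from each user's Condorcet winner is not specified in the abstract, so these numbers are placeholders.

```python
import math


def mean_utility(utilities):
    """Average utility across users (the 'average preference' objective)."""
    return sum(utilities) / len(utilities)


def nash_social_welfare(utilities):
    """Product of user utilities; one badly served user drags the score down."""
    return math.prod(utilities)


# Hypothetical utilities of two arms for three users.
arms = {
    "A": [0.9, 0.9, 0.1],  # excellent for two users, poor for the third
    "B": [0.6, 0.6, 0.6],  # moderately good for everyone
}

best_by_mean = max(arms, key=lambda a: mean_utility(arms[a]))
best_by_nsw = max(arms, key=lambda a: nash_social_welfare(arms[a]))

print("average preference picks:", best_by_mean)
print("Nash Social Welfare picks:", best_by_nsw)
```

Arm A wins on the mean while arm B wins on the product, which is the sense in which the product objective penalizes leaving one user behind.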

13.
arXiv (quant-ph) 2026-06-16

High-dimensional coherence to entanglement transduction under canonical noise

arXiv:2606.16695v1 (announce type: new). We develop an analytical framework for coherence-to-entanglement conversion in bipartite high-dimensional quantum systems, so-called qunits. An arbitrary coherent input qunit is coupled to an incoherent ancilla through a generalized controlled-shift operation, producing a maximally correlated bipartite state. By analyzing the partial transpose of the output state, we establish an exact dimension-independent connection between the input coherence and the generated entanglement. We then study how this conversion is affected by three standard noise processes applied after the conversion step: phase damping, global depolarizing noise, and independent amplitude damping. The resulting expressions show that these channels degrade entanglement in qualitatively different ways. Phase damping leads to a uniform attenuation of the entanglement generated from coherence, depolarizing noise introduces pairwise thresholds associated with entanglement sudden death, and amplitude damping produces an asymmetric decay governed by relaxation toward the ground state. For maximally coherent inputs, the general results reduce to simple closed-form behavior, allowing direct comparison of the three noise mechanisms as the system dimension increases. In particular, global depolarizing noise exhibits a dimension-dependent sudden-death threshold, while amplitude damping leads to a smooth suppression in the maximally coherent case. These results provide useful analytical benchmarks for high-dimensional resource conversion and for assessing noisy entanglement generation in qudit-based quantum-information settings.

14.
PLOS Medicine 2026-05-29

Characterization of the VHH-Fc construct rimteravimab in healthy adults and patients hospitalized for mild-to-moderate COVID-19: Two Phase 1 randomized clinical trials

Authors: Ellen Jansen, Viki Bockstal, Florence Herschke, Per Olsson Gisleskog, Manuela Rinaldi, Angélique Boerboom, Salah Hadi, Natalia Gaibu, Michel Moutschen, Dominique Tersago

Background: Variable Heavy domain of Heavy chains (VHH) are innovative tools to target unique epitopes, yet few have been developed as heavy chain-only antibodies for clinical use. Rimteravimab (referred to here as XVR011) is a humanized antibody developed for the treatment of mild-to-moderate coronavirus disease 2019 (COVID-19), consisting of two identical VHHs targeting the receptor binding domain (RBD) of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike, with a human immunoglobulin (Ig) G1 fragment constant of antibody (Fc), silenced for Fc effector functions. We conducted two Phase 1 studies in healthy volunteers or hospitalized COVID-19 patients to evaluate its safety, tolerability, pharmacokinetics and immunogenicity.

Methods and findings: A randomized, double-blinded, single-center, placebo-controlled, single ascending dose study was performed in healthy volunteers (Phase 1a, EXEVIR0102, EudraCT 2021-003707-17), in parallel to an open-label, multi-center, single ascending dose study in patients hospitalized for mild to moderate COVID-19 (Phase 1b, EXEVIR0101, EudraCT 2020-005299-36, NCT04884295). Participants received a single intravenous infusion of 250, 500 or 1,000 mg of XVR011. The primary objective for both trials was the safety and tolerability of XVR011. Pharmacokinetics were evaluated as a secondary objective in Phase 1a and as an exploratory objective in Phase 1b. Efficacy (evaluated as respiratory parameters and COVID-19 clinical status) and antiviral activity in patients were evaluated as a secondary objective in Phase 1b. Immunogenicity was evaluated as an exploratory objective. Part 2 of the EXEVIR0101 study (initially a Phase 1b/2 study) was not conducted due to the loss of XVR011 potency against SARS-CoV-2 Omicron BA.2.

Demographics, safety, efficacy, and immunogenicity were analyzed using descriptive statistics, while pharmacokinetics were analyzed with noncompartmental pharmacokinetics (PK) modeling. In the Phase 1a study, there were no infusion-related reactions, serious treatment-emergent adverse events (TEAEs) or TEAEs grade ≥3. 22/30 volunteers (73.3%) reported 53 TEAEs (49 Grade 1, 4 Grade 2) with none being related to XVR011. The most common TEAE was headache (n = 8, 26.7%) in various treatment groups. In the Phase 1b study, 27 hospitalized patients were enrolled, and followed up to 30 days. Seven patients (25.9%) reported a total of 15 TEAEs, the majority (80%) being mild to moderate (Grade 1–2). There were no treatment-related serious TEAEs. All TEAEs resolved by the end of the study. Peak exposure (maximal concentration, Cmax) and systemic exposure (area under the curve, AUC0-t, and AUC0-inf) for XVR011 increased dose-proportionally. Geomean half-life ranged from 15.4 to 17.0 days in Phase 1a, while individual half-life ranged from 11.4 to 15.6 days in Phase 1b. SARS-CoV-2 viral load, as detected in nasopharyngeal samples by reverse transcription and quantitative polymerase chain reaction (RT-qPCR), decreased similarly in all cohorts compared to baseline. No treatment-induced anti-drug antibodies (ADA) were detected in Phase 1a. In Phase 1b, higher XVR011 concentrations increased the likelihood of ADA formation, without impacting pharmacokinetics and pharmacodynamics. No obvious dose-response in COVID-19 clinical status or respiratory parameters was observed. Technological limitations included study size, absence of placebo for the Phase 1b, absence of repeated dosing, evolving SARS-CoV-2 variants and standard-of-care.

Conclusions: XVR011 displayed a favourable safety, tolerability, pharmacokinetics, and immunogenicity profile, both in healthy volunteers and in patients hospitalized for mild to moderate COVID-19. These data pave the way for the design and clinical development of VHH-Fc constructs.

15.
arXiv (CS.CV) 2026-06-16

Projection and Quantisation: A Unifying View of Learning to Hash, from Random Projections to the RAG Era

Approximate nearest-neighbour search underpins large-scale retrieval and retrieval-augmented generation, yet its methods are studied in communities that seldom read one another. We argue that they form one field with three design choices. We develop the projection-quantisation-organisation lens: every method places its projections, places its quantisation thresholds, and organises the resulting codes for search. We test the lens with a reproducible measurement, released as the open BitBudget benchmark, and report three findings. First, the quantisation axis delivers the largest memory savings: a one-bit code with full-precision re-ranking matches uncompressed quality for six of seven embedders, the scanned code one thirty-second of the float's size. Second, the orderings the lens anticipates, including a learned-embedding regime where binary codes overtake an inverted-file product quantiser at a matched byte budget, recur as the embedding is enlarged. Third, given class labels, an eight-byte supervised code more than doubles the retrieval quality of the two-kilobyte task-agnostic float it replaces. We also recast the semantic identifiers of generative retrieval as quantisation codes. The main contribution is a single, tested account of compact-code search, from random projections to the retrieval-augmented era.
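The first finding, a one-bit code scanned cheaply and then re-ranked at full precision, can be illustrated with a small two-stage search. This is a sketch under stated assumptions: sign binarisation, Hamming-distance shortlisting, and dot-product re-ranking are stand-ins chosen for illustration, the data are random vectors, and none of the names come from the BitBudget benchmark. With 32-bit floats, one bit per dimension is one thirty-second of the original size.

```python
import random


def sign_code(vec):
    """Pack the sign of each coordinate into one integer (1 bit per dimension)."""
    code = 0
    for x in vec:
        code = (code << 1) | (1 if x >= 0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def search(query, database, codes, shortlist=10, k=3):
    q_code = sign_code(query)
    # Stage 1: scan only the compact codes to build a shortlist.
    candidates = sorted(range(len(database)),
                        key=lambda i: hamming(q_code, codes[i]))[:shortlist]
    # Stage 2: re-rank the shortlist with the full-precision vectors.
    return sorted(candidates, key=lambda i: dot(query, database[i]),
                  reverse=True)[:k]


rng = random.Random(0)
dim = 64
database = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(200)]
codes = [sign_code(v) for v in database]

# Query: a lightly perturbed copy of item 17 (hypothetical test case).
query = [x + 0.05 * rng.gauss(0, 1) for x in database[17]]
top = search(query, database, codes)
print(top)
```

Only the shortlist of 10 items is touched at full precision; the other 190 are compared through their 64-bit codes alone.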

16.
arXiv (CS.LG) 2026-06-15

Neither Parallel Nor Sequential: How DiffusionGemma Actually Commits Tokens

arXiv:2606.14620v1 (announce type: new). Open diffusion language models are marketed as parallel, non-autoregressive decoders, yet the order in which a shipped checkpoint actually commits its tokens is almost never measured. We instrument DiffusionGemma 26B, a masked discrete-diffusion mixture-of-experts model built on Gemma 4, hooking its sampler's accept step to record which canvas positions commit, when, and at what confidence. Across a 686-prompt, six-regime probe suite we find that its decoding is neither parallel nor block-autoregressive: it follows a partial left-to-right commit bias whose apparent strength depends almost entirely on the granularity at which you look. Order is weak token by token and strengthens smoothly as the analysis is coarsened, so the model's "block size" turns out to be an artifact of the measuring ruler rather than the architecture. The model commits in large simultaneous batches, leaving much of the within-batch order genuinely undefined rather than merely unobserved. The behaviour is regime-dependent: structured JSON is committed in essentially arbitrary order, and a position's commit confidence tracks correctness on mathematical reasoning but carries no signal on factual recall. Commitment is aggressive, finishing in a short late burst well inside the step budget, while task accuracy matches the model's autoregressive Gemma-4 sibling. Beyond these findings, our central contribution is methodological: measuring decoding order honestly demands handling trailing-EOS padding, within-regime confounding, commit non-monotonicity, block-size sensitivity, and large commit-batch ties, each of which can otherwise manufacture a decoding-order result that is not really there.

17.
arXiv (CS.LG) 2026-06-12

From geometry to dynamics: Learning overdamped Langevin dynamics from sparse observations with geometric constraints

arXiv:2512.23566v2 (announce type: replace-cross). How can we learn the laws underlying the dynamics of stochastic systems when their trajectories are sampled sparsely in time? Existing methods either require temporally resolved high-frequency observations, or rely on geometric arguments that apply only to conservative systems, limiting the range of dynamics they can recover. Here, we present a new framework that reconciles these two perspectives by reformulating inference as a stochastic control problem. Our method uses geometry-driven path augmentation, guided by the geometry in the system's invariant density, to reconstruct likely trajectories and infer the underlying dynamics without assuming specific parametric models. Applied to overdamped Langevin systems, our approach accurately recovers stochastic dynamics even from extremely undersampled data, outperforming existing methods in synthetic benchmarks. This work demonstrates the effectiveness of incorporating geometric inductive biases into stochastic system identification methods.

18.
arXiv (CS.LG) 2026-06-19

Light Interaction: Training-Free Inference Acceleration for Interactive Video World Models

arXiv:2605.31158v3 (announce type: replace-cross). Interactive video world models generate video chunk by chunk in response to user-controlled camera movements, enabling applications such as real-time game simulation, virtual scene navigation, and embodied AI training. However, scaling to long interactive trajectories is prohibitively expensive due to growing context memory, quadratic attention complexity, and repeated denoising steps. We present Light Interaction, a training-free inference acceleration framework for interactive video world models. Our key insight is that interaction naturally enables trajectory-dependent adaptive computation: retrieved spatial memory can be discarded during novel exploration, temporal context can be adjusted according to local latent dynamics, and early-step model outputs can be reused when the camera revisits familiar regions. Based on this insight, Light Interaction combines adaptive context management, denoising cache acceleration, and hardware-software co-designed 3D block sparse attention with fused Triton kernels. Evaluated on HY-WorldPlay and Matrix-Game-3.0, Light Interaction achieves up to 2.59x speedup without model retraining while maintaining competitive visual quality.

19.
arXiv (CS.AI) 2026-06-12

Mental-R1: Aligning LLM Reasoning for Mental Health Assessment

arXiv:2606.13176v1 (announce type: new). Mental health problems such as anxiety, depression, and suicide remain urgent global challenges, where timely and accurate assessment is critical for effective intervention. Recently, large language models have been explored for mental health assessment. However, existing general-purpose post-training methods do not align with the cognitive processes of human assessment, which may lead to unreliable reasoning outcomes. To bridge this gap, we propose Cognitive Relative Policy Optimization (CRPO), a reinforcement learning framework tailored for the mental health domain. CRPO extends group relative policy optimization by integrating stage-dependent uncertainty modeling into the policy optimization process. Specifically, we introduce a stage-wise entropy regularization mechanism that encourages broad exploration in early reasoning phases and progressively enforces confident decision-making in later stages, mimicking the human cognitive shift from uncertainty to certainty. In addition, inspired by cognitive appraisal theory, we formalize cognitive reasoning stages, thereby guiding theory-grounded interpretable inference. Experiments on 8 mental health datasets show that CRPO achieves an average improvement of 10.4 percentage points in weighted F1-score over the best reinforcement learning baseline. Furthermore, the CRPO-trained model Mental-R1 demonstrates clear advantages compared with existing large language models on reasoning-intensive cases, suggesting that CRPO enhances reasoning capabilities for mental health assessment.

20.
arXiv (CS.CV) 2026-06-18

Spatially Stratified Distillation for Heterogeneous Radar Place Recognition

Scalable, all-weather place recognition increasingly relies on heterogeneous radar place recognition to bridge diverse hardware platforms. A notable application is matching queries from cost-effective 4D automotive radars against high-fidelity reference maps built by dense spinning radars. This process is fundamentally limited by the extreme sparsity (and narrow field-of-view) of the 4D sensor, which captures only a fraction of the structural density present in the spinning radar database. Prior efforts address this issue by unifying different radar signals, that is, by projecting both signals into a common representational space. Yet they suffer performance degradation in multi-session environments. In this paper, we propose spatially stratified distillation (SSD), a strategy that replaces standard uniform distillation with an asymmetric spatial alignment derived directly from physical radar returns. In regions where both radars exhibit overlapping returns, SSD enforces strong feature alignment. Crucially, in sparse regions where the 4D student lacks returns but the teacher contains valid structure within the shared field of view, SSD applies heavily discounted distillation weights. Extensive evaluations on the recent HeRCULES dataset demonstrate that SSD significantly outperforms prior place recognition methods, achieving state-of-the-art results on its challenging dynamic sequences.
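The asymmetric weighting described above can be sketched as a per-cell weight on a feature-matching loss. Everything concrete here is an assumption made for illustration: the discount value of 0.1, the zero weight outside the two named cases, the squared-error loss, and the use of one scalar feature per cell. The abstract does not specify any of them.

```python
def ssd_weight(student_has_return, teacher_has_return, in_shared_fov, discount=0.1):
    """Per-cell distillation weight (hypothetical values)."""
    if teacher_has_return and student_has_return:
        return 1.0       # overlapping returns: strong alignment
    if teacher_has_return and in_shared_fov:
        return discount  # teacher-only structure in shared FOV: discounted
    return 0.0           # all other cells: ignored in this sketch


def stratified_distillation_loss(student_feats, teacher_feats, weights):
    """Weighted mean squared error between student and teacher features."""
    num = sum(w * (s - t) ** 2
              for s, t, w in zip(student_feats, teacher_feats, weights))
    den = sum(weights)
    return num / den if den > 0 else 0.0


# Three hypothetical cells: (student return, teacher return, in shared FOV).
cells = [(True, True, True), (False, True, True), (False, True, False)]
weights = [ssd_weight(*c) for c in cells]

student = [1.0, 0.0, 0.0]
teacher = [0.5, 1.0, 2.0]
loss = stratified_distillation_loss(student, teacher, weights)
print(weights, loss)
```

A uniform scheme would set every weight to 1.0 and push the student to reproduce structure it never observed; the discounted weight reduces that pressure without removing it.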

21.
arXiv (quant-ph) 2026-06-16

Compressed Qubit Noise Spectroscopy: Piecewise-Linear Modeling and Rademacher Measurements

arXiv:2601.02516v2. Random pulse sequences are a powerful method for qubit noise spectroscopy, enabling efficient reconstruction of sparse noise spectra. Here, we advance this method in two complementary directions. First, we extend the method using a regularizer based on the total generalized variation (TGV) norm, in order to reconstruct a larger class of noise spectra, namely piecewise-linear noise spectra, which more realistically model many physical systems. We show through numerical simulations that the new method resolves finer spectral features, while maintaining an order-of-magnitude speedup over conventional approaches to noise spectroscopy. Second, we simplify the experimental implementation of the method, by introducing Rademacher measurements for reconstructing sparse noise spectra. These measurements use pseudorandom pulse sequences that can be generated in real time from a short random seed, reducing experimental complexity without compromising reconstruction accuracy. Together, these developments broaden the reach of random pulse sequences for accurate and efficient noise characterization in realistic quantum systems.
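
As a rough illustration of generating a pseudorandom sequence from a short seed, the sketch below draws Rademacher values (+1 or -1 with equal probability) from a seeded generator. It is only a sketch under stated assumptions: Python's built-in generator stands in for whatever generator an experiment would use, and the helper `pulse_slots`, which places a pulse wherever the sign flips, is a hypothetical mapping that the abstract does not specify.

```python
import random


def rademacher_sequence(seed, length):
    """Return `length` values, each +1 or -1, reproducible from `seed`."""
    rng = random.Random(seed)  # stand-in generator (assumption)
    return [1 if rng.getrandbits(1) else -1 for _ in range(length)]


def pulse_slots(signs):
    """Hypothetical mapping: a pulse is placed at each slot where the sign flips."""
    return [k for k in range(1, len(signs)) if signs[k] != signs[k - 1]]
```

Because the sequence is fully determined by the seed, only the seed needs to be stored or transmitted to regenerate the same sequence later.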

22.
arXiv (CS.AI) 2026-06-18

SwitchBraidNet: Quantisation-Aware Lightweight Architecture for Hybrid Brain-Computer Interface

arXiv:2606.18816v1. Hybrid brain-computer interfaces (BCIs) that integrate motor imagery (MI) and steady-state visual evoked potentials (SSVEP) provide high-dimensional neural decoding but typically exceed the computational limits of embedded hardware. To address this, we propose SwitchBraidNet, a compact EEG classification architecture designed for low-power deployment. The model employs a dual-path temporal braid to extract multiscale oscillatory features, an adaptive squeeze-and-excitation spatial switch for electrode gating, and a log-variance readout layer for direct band-power encoding. Furthermore, through systematic quantisation-aware training on the OpenBMI dataset, we compared SwitchBraidNet against four established baselines across FP32, FP16, and INT8 precisions. Experimental results demonstrate superior efficiency and performance, achieving MI accuracy of 69.49% (FP16), SSVEP accuracy of 93.48% (FP32), and a hybrid information transfer rate of 64.82 bits/min (FP16). With an INT8 footprint of only 3.03 KB, SwitchBraidNet maintains high accuracy across varying numerical precisions, demonstrating its suitability for low-power embedded BCI deployment.
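
For readers unfamiliar with information transfer rate (ITR), the sketch below implements the Wolpaw definition that is widely used in BCI work. The abstract does not say which ITR definition, class count, or trial duration the authors used, so the formula choice and the function name `wolpaw_itr` are assumptions, and the sketch is not expected to reproduce the reported 64.82 bits/min.

```python
import math


def wolpaw_itr(n_classes, accuracy, trial_seconds):
    """Wolpaw information transfer rate in bits per minute.

    bits per trial = log2(N) + P*log2(P) + (1-P)*log2((1-P)/(N-1))
    """
    if n_classes < 2:
        raise ValueError("need at least two classes")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    if trial_seconds <= 0:
        raise ValueError("trial duration must be positive")
    bits = math.log2(n_classes)
    if accuracy > 0.0:
        bits += accuracy * math.log2(accuracy)
    if accuracy < 1.0:
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_classes - 1))
    return bits * 60.0 / trial_seconds
```

For example, a perfectly accurate four-class selection every 4 seconds carries 2 bits per trial, so `wolpaw_itr(4, 1.0, 4.0)` gives 30 bits/min, while a two-class system at chance accuracy carries no information.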

23.
arXiv (CS.CL) 2026-06-11

When Roleplaying, Do Models Believe What They Say?

Language models can state that "the Earth orbits the Sun" and, when role-playing Aristotle, assert the opposite. Recent work argues that persona adoption is fundamental to how language models operate, with models constantly selecting the most appropriate persona for a given context. Does such role-playing merely change the model's outputs, or does it also affect what the model internally represents as truthful? We study this question with linear truth probes, applying them to LLMs role-playing historical personas whose likely beliefs differ from modern consensus. For each persona, we compare false claims the persona would likely have endorsed (*era-believed*) with topic-matched false claims they would not have endorsed (*era-false*). Across prompting, in-context learning, and supervised fine-tuning, persona induction suppresses era-believed statements less than equally false alternatives, yet they remain classified as false overall. Role-play therefore shifts what these models say more than what they internally represent as true. We contrast this with models trained on harmful advice that exhibit Emergent Misalignment (EM). Across three model families (Qwen 2.5 14B, Qwen 3 8B, and Llama 3.3 70B), their false claims move substantially toward the true region of probe space, are defended under challenge roughly half the time versus about a sixth for role-play, and are used in downstream reasoning. Role-play and Emergent Misalignment thus are points on a spectrum of belief internalization, where role-play changes what a model says with little representational change, while Emergent Misalignment shifts the internal representation of false claims without fully marking them as true.

24.
medRxiv (Medicine) 2026-06-22

Study protocol: Feasibility and clinical implications of real-time cerebral autoregulation monitoring in major noncardiac surgery with the Medtronic Cotrending algorithm (AUTOREGULATE-NONCARDIAC-COTRENDING)

Background: Perioperative hypotension is associated with postoperative organ injury. However, trials of hypotension avoidance have not found meaningful improvements in postoperative cardiovascular, renal, neurological or functional outcomes. One possible explanation is that organ perfusion depends on patients' individual autoregulatory ranges. Hence, technology that monitors the autoregulatory status of vital organs, e.g. the brain, could provide a physiologic basis for personalising blood pressure targets. However, currently established methods for monitoring cerebral autoregulation in noncardiac surgery, e.g. the cerebral oximetry index (COx), are limited by performance and usability. The Medtronic Cotrending algorithm has been developed to provide automated, near real-time assessment of cerebral autoregulation. While feasibility was demonstrated in cardiac surgery, its applicability in major noncardiac surgery remains unknown. This study aims to evaluate the technical feasibility and clinical implications of Cotrending-based cerebral autoregulation monitoring in major noncardiac surgery. Objectives: Primary objective: To evaluate the technical feasibility of using the Medtronic Cotrending algorithm to monitor intraoperative cerebral autoregulation in real time during major noncardiac surgery, drawing comparisons to the COx algorithm. Secondary objectives: To investigate the potential clinical implications of Cotrending-based cerebral autoregulation monitoring. Design: Single-centre, prospective cohort study. Setting: Swiss tertiary care centre. Patients: Patients enrolled in AUTOREGULATE-NONCARDIAC who were monitored intraoperatively with the Medtronic INVOS(TM) 5100 near-infrared spectroscopy (NIRS) system. Outcomes: Technical feasibility outcomes include the success rate of determining the lower limit of cerebral autoregulation, intraoperative uptime, time to first estimate of the lower limit of cerebral autoregulation, sensitivity to external factors and to data artefacts, and agreement between the Cotrending-derived and COx-derived lower limits of cerebral autoregulation. Conclusions: N/A. Trial registration: ClinicalTrials.gov NCT07630129.

25.
arXiv (CS.CV) 2026-06-11

How Auxiliary Reasoning Unleashes GUI Grounding in VLMs

Graphical user interface (GUI) grounding is a fundamental task for building GUI agents. However, general vision-language models (VLMs) struggle with this task due to a lack of specific optimization. We identify a key gap in this paper: while VLMs exhibit significant latent grounding potential, as shown by their performance on the Pointing Game, they underperform when tasked with outputting explicit coordinates. To address this discrepancy and bypass the high data and annotation costs of current fine-tuning approaches, we propose three zero-shot auxiliary reasoning methods. By providing explicit spatial cues such as axes, grids and labeled intersections as part of the input image, these methods enable VLMs to better articulate their implicit spatial understanding capabilities. We evaluate these methods on four GUI grounding benchmarks across seven open-source and proprietary VLMs. Experimental results show substantial gains from auxiliary reasoning. Mark-Grid Scaffold boosts Gemini-3.1-Pro from 11.72% under direct inference to 95.20% on ScreenSpot-v2, achieves state-of-the-art performance on ScreenSpot, and approaches the strongest fine-tuned methods on ScreenSpot-v2 and UI-I2E-Bench. Our code is available at https://github.com/liweim/AuxiliaryReasoning.
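
The idea of labeled grid intersections as spatial cues can be sketched as follows. This is an illustrative assumption, not the paper's Mark-Grid Scaffold: the label scheme (a column letter plus a row index), the function names, and the step of converting an answer back to pixel coordinates are all hypothetical, and drawing the grid onto the image is left out.

```python
def grid_intersections(width, height, cols, rows):
    """Map labels such as 'C1' to pixel coordinates of grid intersections.

    The grid has cols x rows cells, so (cols + 1) x (rows + 1) intersections.
    Labels use letters A..Z for columns, hence cols must be at most 25.
    """
    if not 1 <= cols <= 25 or rows < 1:
        raise ValueError("cols must be in 1..25 and rows at least 1")
    points = {}
    for r in range(rows + 1):
        for c in range(cols + 1):
            label = f"{chr(ord('A') + c)}{r}"
            x = round(c * (width - 1) / cols)
            y = round(r * (height - 1) / rows)
            points[label] = (x, y)
    return points


def nearest_label(points, xy):
    """Return the label of the intersection closest to pixel `xy`."""
    px, py = xy
    return min(points, key=lambda k: (points[k][0] - px) ** 2 + (points[k][1] - py) ** 2)
```

Under this scheme a model could answer with a label such as `C1` instead of raw coordinates, and the caller would look the label up in the returned dictionary to recover a pixel position.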