Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-16

PolyKV: Heterogeneous Retention and Allocation for KV Cache Compression

arXiv:2606.15157v1 Announce Type: cross Abstract: KV cache compression is essential for reducing the memory cost of long-context large language model inference. Existing approaches, however, typically apply a single compression policy and a uniform cache budget across all transformer layers. This uniform design ignores the fact that different layers can play different roles during prefill and decoding, and may therefore require different eviction strategies and cache capacities. We present PolyKV, a layer-wise KV cache optimization framework that considers design space with method selection and budget allocation. PolyKV routes each layer to a suitable KV compression policy based on layer-level signals, while assigning non-uniform budgets under a fixed total budget. This formulation enables heterogeneous compositions of existing KV cache methods. Experiments on LLaMA-3.1-8B and Qwen3-8B show that, under the same 512-token average KV budget, PolyKV recovers 54.5% and 25.7% of the LongBench performance gap between the strongest single-policy baseline and FullKV, respectively. Across 128-1024 budget sweep, PolyKV consistently improves over the strongest baseline by 1.7%-6.4%, corresponding to 40.0%-54.5% recovery of the FullKV gap.

02.
medRxiv (Medicine) 2026-06-22

Associations of Chemical Exposures with Psychological Distress and Depression Diagnosis among Waste Pickers in Brasilia, Brazil: A Cross-Sectional Study

Introduction: Waste pickers face chemical exposures. We evaluated whether chemical exposure is associated with psychological distress and depression. Methods: A 2017 cross-sectional survey included 1,141 waste pickers working in the Estrutural open dump in Brasilia, Brazil. Participants self-reported occupational exposure to 11 chemical categories, 17 psychological distress symptoms, and depression diagnoses. Associations of chemical exposure with mean psychological distress scores and depression prevalence were assessed, adjusted for age, sex, marital status, and income. Results: Mean psychological distress score was higher among those exposed to any chemical (mean of 8.1 vs 6.1; adjusted mean difference [aMD]: 1.8 [0.9, 2.7]) and higher among those exposed to each of 11 chemical categories, for example, smoke (aMD: 1.2 [0.6, 1.7]), batteries (aMD: 1.5 [1.0, 1.9], and oils (aMD: 1.3 [0.9, 1.8]). Depression was more prevalent among those exposed to oils (16.6% vs 10.6%; adjusted prevalence difference [aPD]: 6.3% [95% CI: 2.3, 10.2]), cleaning products (aPD: 5.4% [1.2, 9.5]), medications (aPD: 4.7% [0.6, 8.8]), and aerosols (aPD: 5.3% [1.3, 9.3]) but, not smoke, batteries, greases, insecticides, solvents, paints, chemical containers, or any chemical. Conclusion: These associations highlight the need to consider policy level protections for waste pickers to reduce chemical exposure and guard against psychological distress. Further research is necessary to explore which specific chemicals, within broad chemical categories, are associated with psychological distress and depression.

03.
arXiv (quant-ph) 2026-06-16

Programmable Gauge-Field Textures with Ultracold Atoms in Momentum Space

arXiv:2606.15124v1 Announce Type: cross Abstract: Synthetic gauge fields with ultracold atoms offer a route to quantum matter in which electromagnetic environments can be designed rather than merely imposed. While the Harper-Hofstadter model has been realized in several cold-atom systems, existing implementations are largely limited to spatially uniform magnetic fluxes. Here we experimentally realize a highly programmable two-dimensional momentum-state lattice of ultracold atoms with local control over the Peierls phase pattern, enabling direct implementation of Harper-Hofstadter Hamiltonians with tunable and spatially structured synthetic gauge fields. We observe a crossover from ballistic to strongly flux-modified bulk dynamics with suppressed transport. By introducing a synthetic electric field through site-dependent energy gradients, we further demonstrate Hall-type transverse drift arising from the interplay between electric and magnetic fields. In addition, we engineer a synthetic flux domain wall separating regions with opposite magnetic fluxes and observe anisotropic propagation guided along the interface. These results move cold-atom gauge-field engineering from uniform magnetic backgrounds toward designer gauge textures, providing an experimental setting for transport across programmable topological interfaces.

04.
arXiv (quant-ph) 2026-06-25

Variants of the Quantum Phase Operator for the Harmonic Oscillator

arXiv:2606.24913v1 Announce Type: new Abstract: We introduce and study quantum phase operators associated with the Quantum Harmonic Oscillator (QHO). We show that these operators are trace-class perturbations of the Susskind-Glogower operators and examine their mathematical and physical properties. The construction is motivated by the physically relevant two-phase case.

05.
arXiv (quant-ph) 2026-06-16

Measuring Non-Stabilizerness in an SU(2) Lattice Gauge Theory

arXiv:2606.14842v1 Announce Type: new Abstract: One of the goals of quantum simulation is to provide novel insights into quantum systems, such as the gauge theories that are relevant for high-energy and nuclear physics. Recent years have seen rapid improvements in both the hardware and software necessary for these simulations. A central consideration in the design of such simulations is the quantum complexity of a given quantum state. This work takes a step towards studying a specific kind of complexity, namely the non-stabilizerness, in a simple yet non-trivial system: SU(2) lattice gauge theory of two plaquettes. The non-stabilizerness of low-energy eigenstates is studied and the implications for quantum simulations are discussed. The real-time evolution of this system is simulated on ibm_marrakesh and the non-stabilizerness is measured using a random measurement protocol. New techniques enhancing the efficiency of this protocol are developed, including both a new way to calculate the estimator for non-stabilizerness and a flexible error mitigation technique called Bit String Decoherence Renormalization. This mitigation method is central to accurately resolving the experimental time dependence of non-stabilizerness, and is anticipated to have broad applicability in digital quantum simulations.

06.
arXiv (CS.LG) 2026-06-18

Acceleration of an algebraic multigrid pressure solver using graph neural networks

arXiv:2606.19251v1 Announce Type: cross Abstract: Solving the pressure-Poisson equation remains the primary computational bottleneck in incompressible unstructured flow solvers primarily due to the inherent sensitivity of traditional linear solvers to mesh irregularities. This work introduces a data-driven algebraic multigrid (AMG) smoother that uses a modified graph convolutional isomorphism network (GCIN). The graph neural network predicts optimal polynomial coefficients to construct a sparse pseudo-inverse operator across diverse grid topologies. The coefficients are optimized to reduce the residual after each V-cycle iteration. By directly capturing the algebraic structure of the system from the sparse coefficient matrix, the proposed method maintains the solver's linearity while adapting to local anisotropies in unstructured grids. Our framework demonstrates significant performance gains by reducing the number of V-cycles required for a given tolerance and delivering wall-clock speedups from 4% to 37% across diverse benchmarks. Notably, the model exhibits robust generalization by maintaining efficiency on meshes up to 128 times larger than those seen in training, and by accelerating the solver's convergence on unseen industry-relevant problems such as the AirfRANS dataset.

07.
arXiv (CS.LG) 2026-06-11

JGRA: Jacobian Geometry Robustness Assessment in NISQ Noise-Aware Quantum Neural Networks

arXiv:2606.09964v2 Announce Type: replace-cross Abstract: The NISQ era places stringent constraints on quantum computation, where noise and decoherence fundamentally limit performance. In classical deep learning, model robustness and resilience to perturbations are well studied: deep neural networks (DNNs) maintain high performance despite pruning, noise injection, and structural perturbations due to inherent redundancy in their representations. A central challenge in quantum machine learning is to transfer this notion of robustness to quantum neural networks (QNNs) under realistic NISQ noise. While classical deep learning exhibits robustness through structural redundancy, analogous principles for QNNs remain underdeveloped. We propose JGRA: a framework for assessing robustness in noise-aware QNNs via Jacobian geometry, capturing model sensitivity to parameter perturbations induced by noise. Our method includes entropy-matched noise calibration, noise-aware training, and noise-conditioned Jacobian extraction, yielding geometric descriptors that link clean-regime structure to noisy inference behaviour. We also empirically demonstrate that these descriptors encode predictive information about robustness under unseen noise.

08.
arXiv (CS.CL) 2026-06-15

The Linguistics Olympiads: Towards a New Corpus for Linguistics Research?

Linguistics olympiad problems (LOPs) are a category of self-sufficient puzzles consisting of a scaled-down corpus representative of certain linguistic phenomena, from which the solver must deduce a primitive set of rules of the language and then translate a new set of elements. The linguistics olympiads (LOs) have become a worldwide phenomenon with 43 different territories taking part in the International Linguistics Olympiad (IOL) 2025. While the typology and solving strategies of LOPs have been analysed, their scientific facet and connections to academic linguistics have yet to be explored. LOPs are directly connected to many linguistic fields, e.g., linguistic typology, linguistic relativity, and linguistics fieldwork. Recently, LOPs have become a research focus as benchmarks for large language models, thus highlighting their usefulness in computational linguistics. Nevertheless, they have not yet been integrated into mainstream linguistics research. This paper attempts to open new directions of including this particular type of puzzle in academic research by offering a structured evaluation of LOPs as linguistic data sources and proposes criteria for their responsible use in academic research. Starting from a set of over 1800 LOPs, this study critically examines the potential of LOPs as a novel corpus for linguistics research by discussing their strengths and limitations as tools, as well as the areas of linguistics into which these problems could fit. This work forms the foundation for a broader initiative aimed at bridging the gap between LOs and academic linguistics, by establishing a robust theoretical framework for LOPs.

09.
arXiv (CS.CV) 2026-06-16

MVM-IOD: An Industrial Object-Centric Benchmark Dataset for the Evaluation of 3D Reconstruction Methods

3D object reconstruction, and camera pose estimation in industrial applications are challenging tasks, as errors are costly while the computation time is often limited. The complexity of typical industrial objects further complicates these tasks. Most of the existing datasets in this context do not depict realistic industrial scenarios. Therefore, we introduce the Machine Vision Metrology Industrial Object Dataset (MVM-IOD). Images of typical industrial objects are captured systematically, by moving a camera, mounted at the end effector of an industrial robot arm, on a hemisphere around the objects. MVM-IOD contains reference camera poses and reference 3D point clouds, the acquired RGB images of 9 objects and 2 background choices resulting in 18 scenes, which allows evaluation of all image based methods that compute a 3D reconstruction, camera poses, or novel views of a scene. Based on MVM-IOD, we extensively evaluate current SOTA 3D reconstruction and camera pose estimation methods, such as Structure from Motion, Multi-View Stereo, recent feed forward methods (Visual Geometry Grounded Transformer, {\pi}3), and 2D Gaussian Splatting and report our findings as a baseline for future research. The experiments show that capture setups like ours generate out-of distribution images for feed forward methods, leading to suboptimal point clouds and camera poses. However, these out-of-distribution images can be shifted closer to the training distribution by applying simple preprocessing steps. Consequently, in certain industrial applications, feed forward methods should be used with caution.

10.
arXiv (quant-ph) 2026-06-17

A matching decomposition algorithm for simulating quantum walk Hamiltonians

arXiv:2601.11418v3 Announce Type: replace Abstract: In this work, we present a new algorithm for generating quantum circuits that efficiently implement continuous time quantum walks on arbitrary simple sparse graphs. The algorithm, called matching decomposition, works by decomposing a continuous-time quantum walk Hamiltonian into a collection of exactly implementable Hamiltonians corresponding to matchings in the underlying graph followed by a novel graph compression algorithm that merges edges in the graph. We develop a greedy matching heuristic and a compression-aware matching heuristic, both of which can be used in the quantum circuit algorithm. Lastly, we convert the walks to a circuit and Trotterize over these components. The dynamics of the walker on each edge in the matching can be implemented in the circuit model as sequences of CX and CRx gates. We do not use Pauli decomposition when implementing walks along each matching. Furthermore, we compare greedy (compression-aware) matching decomposition to a standard Pauli-based simulation pipeline and find that greedy (compression-aware) matching decomposition consistently yields substantial resource reductions, requiring up to 43$\%$ (70\%) fewer controlled gates and up to 54$\%$ (75\%) shallower circuits than Pauli decomposition across multiple graph families. Finally, we also present examples and theoretical results for when matching decomposition can exactly simulate a continuous-time quantum walk on a graph.

11.
arXiv (CS.AI) 2026-06-19

OnDeFog: Online Decision Transformer under Frame Dropping

arXiv:2606.19721v1 Announce Type: cross Abstract: In challenging real-world reinforcement learning applications, communication delays or sensor failures often cause frame dropping, in which the agent cannot receive the dropped states and associated rewards. To address the performance degradation caused by frame dropping, the Decision Transformer under Random Frame Dropping (DeFog) was developed by incorporating additional mechanisms into the decision transformer to tackle frame dropping. Although DeFog can mitigate performance degradation in frame-dropping environments, since DeFog is an offline learning method, it struggles to effectively generalize to novel states not adequately represented in the training dataset. In this study, we propose OnDeFog, which integrates the mechanisms in DeFog with the online decision transformer (ODT), an online reinforcement learning method that learns policies through direct environmental interaction. Comprehensive experimental evaluation demonstrates that our proposed OnDeFog achieves superior performance compared to ODT in environments characterized by high dropping frame rate and outperforms DeFog on datasets containing a large amount of low-reward data.

12.
arXiv (CS.LG) 2026-06-16

Task-Error Residual Learning for Real-Robot Five-Ball Juggling

arXiv:2606.16978v1 Announce Type: cross Abstract: For residual learning that refines existing behavior, sample efficiency depends on two things: how much information each rollout returns, and how efficiently the learner uses that information. Reinforcement learning's standard scalar reward carries far less information than the directional task error that defines the task. Random exploration further discards whatever information each rollout returns. Through residual learning with directional task-error supervision and a task error model that drives sample selection, we achieve stable three-, four-, and five-ball juggling on anthropomorphic Barrett WAM arms. Despite planning and controlling through a simple, idealized stack, the system converges from the second attempt. The first attempt drops, after which task error decreases monotonically without further failures. In comparison, five-ball juggling typically takes humans years of practice. We compare residual learners across two ternary axes, the directional information in the learning feedback and the commitment of the analytic prior, spanning Newton-style Jacobian updates, Composite Bayesian Optimization, and stochastic search methods. Both axes prove necessary: neither directional feedback nor an informative prior suffices alone, and the simplest method that combines them, a fixed-Jacobian Newton update, is the most reliable. The learned residual tolerates substantial prior misalignment and degraded joint tracking, affecting mainly convergence speed. The bottleneck for residual learning on real robots is therefore the information content of the supervision signal and how the learner uses it, not the accuracy of the surrounding stack. Video documentation of all experiments is available at https://kai-ploeger.com/residual-juggling.

13.
arXiv (CS.CV) 2026-06-25

In-Context World Modeling for Robotic Control

Modern Vision-Language-Action (VLA) models often fail to generalize to novel setups, such as altered camera viewpoints or robot morphologies, because they are typically conditioned only on current observations and language instructions. By ignoring the underlying system configuration as a variable, these models implicitly assume a fixed execution context encountered during training, necessitating data-intensive fine-tuning for any new environment. In this work, we introduce In-Context World Modeling (ICWM), a framework that treats system identification as an in-context adaptation problem. ICWM enables robot policies to autonomously infer essential system variables from a short history of self-generated, task-agnostic interactions. Unlike traditional In-Context Learning that uses demonstrations to specify what task to perform, ICWM leverages the context window to understand how the system operates. By processing these interactions before task execution, the model implicitly captures the world dynamics of the current system, enabling adaptation to novel configurations without parameter updates. Extensive experiments in simulation and on real-world robot platforms demonstrate that ICWM significantly outperforms standard VLA baselines on novel camera viewpoints.

14.
arXiv (quant-ph) 2026-06-16

How Many Shots Are Enough for a Quantum Circuit?

arXiv:2606.16965v1 Announce Type: new Abstract: Quantum algorithms require repeated circuit executions, known as shots, to estimate output distributions accurately. Determining the minimal number of shots needed to meet a target accuracy is crucial to reduce costs and resource usage, especially on today's noisy and expensive quantum hardware. In this paper, we address the shot optimisation problem in a black-box setting, where no assumptions are made about the structure of the quantum circuit or the noise model of the backend. We introduce IncrementalExecution, a novel online framework that dynamically determines when to stop executing shots based on the principle of point of diminishing returns: the point at which additional shots no longer significantly alter the empirical distribution of a fixed circuit. The framework supports customisable policies for shot management, enabling flexible trade-offs between execution cost and result fidelity within static execution scenarios. We assess our proposal through an extensive experimental evaluation spanning 33,750 framework configurations across 180 unique static quantum circuit-backend combinations, for a total of 7.3M independent experiments. Unlike prior work that relies on problem-specific knowledge or algorithm-dependent assumptions (e.g., variational or adaptive workflows), our approach is applicable to a large set of static circuits and immediately deployable on current quantum cloud platforms.

15.
arXiv (CS.CL) 2026-06-25

Error-Aware TF-IDF Retrieval-Augmented Generation for ASR Error Correction

End-to-end automatic speech recognition systems frequently hallucinate rare entities and domain-specific terms, especially in low-resource languages. While retrieval-augmented generation frameworks can mitigate these errors using large language models, current architectures face significant challenges. They either rely on standard sparse retrieval that ignores phonetic misrecognitions or utilize heavyweight cross-modal embeddings that introduce high latency. This letter proposes a highly efficient, purely lexical error-aware framework designed to explicitly resolve phonetic and loop hallucinations. Our approach integrates a symmetric text normalization module with a novel error-aware term frequency-inverse document frequency algorithm. By constructing a sparse diagonal penalty matrix based on historical errors, the retriever mathematically prioritizes corrective documents containing specific high-risk misrecognitions. Evaluated on the Persian subset of the FLEURS dataset, our method increased the error-aware hit rate from 53.7% to 90.9%. In end-to-end evaluations, the integrated framework reduced the final word error rate from 23.06% to 18.83%, achieving significant accuracy gains with near-zero inference latency.

16.
medRxiv (Medicine) 2026-06-24

TSPO PET binding in vivo reflects increased phagocytic microglia at post mortem in people with frontotemporal dementia

Brain inflammation is a key feature of frontotemporal dementia (FTD). TSPO PET is widely used as an in vivo proxy for neuroinflammation, but whether the elevated signal reflects microglial, astrocytic, or vascular pathology is controversial. We paired ante mortem [11C]PK11195 TSPO PET with post mortem neuropathology in 10 individuals with FTD (5 FTLD-tau, 5 FTLD-TDP) and 5 controls, combining CD68 immunohistochemistry across 17 regions, multiplex immunofluorescence pairing TSPO with microglial/macrophagic (IBA1, CD68), astrocytic (GFAP) and endothelial (CD31) markers, and three-dimensional single-cell reconstruction. CD68 burden was elevated in FTD, concentrated in white matter, and correlated with regional TSPO PET binding across pathologies ({beta} = 8.40, P < 0.001). Only the CD68-TSPO co-localised fraction tracked the PET signal, with no TSPO upregulation per-cell. The elevated TSPO PET signal in FTD likely reflects an increased burden of lysosome-enriched CD68+ microglia, supporting TSPO PET as a microglial-burden biomarker in both FTLD-tau and FTLD-TDP.

17.
arXiv (CS.LG) 2026-06-16

HawkesNest: A Multi-Axis Synthetic Benchmark for Spatiotemporal Pattern Complexity

arXiv:2606.16863v1 Announce Type: new Abstract: Evaluation of spatiotemporal point process (STPP) models relies heavily on opaque real-world datasets, where latent generative structure is unknown and model failures are difficult to attribute. We introduce HawkesNest, a generator-aligned benchmark for controlled spatiotemporal pattern complexity built on a multivariate Hawkes backbone. HawkesNest defines four complexity axes: space–time entanglement, background heterogeneity, cross-type interaction, and domain topology. Each axis is associated with a deterministic index computed from the latent data-generating mechanism. By varying these axes while holding global rate, stability, and simulation budget fixed, HawkesNest enables diagnostic stress tests of STPP models under known structural difficulty. We verify that the indices are monotone and nearly orthogonal under controlled sweeps. We illustrate its use by showing that Hawkes-family baselines degrade under joint heterogeneity–entanglement complexity, even though they are structurally aligned with the Hawkes data-generating backbone. We further show that HawkesNest exposes neural-model sensitivity: AutoSTPP remains vulnerable under isolated increases in space–time entanglement. Code. Available at https://github.com/YahyaAalaila/HawkesNest

18.
arXiv (CS.AI) 2026-06-16

Mojo: A Promising Tool for Scalable Financial AI Efficiency

作者:

arXiv:2606.16059v1 Announce Type: cross Abstract: For thirty years, quantitative finance has paid a costly two-language tax: models researched in Python are rewritten in C++ for production, often introducing numerical discrepancies. GPU-accelerated deep learning exacerbates this problem, as nondeterministic floating-point reductions can produce drift in long backtests, challenging regulatory reproducibility and auditability expectations. This article surveys Mojo, Modular's 2026 Python-like systems language, as a structural response for capital markets engineering. While closing the Python-to-C++ performance gap, Mojo uniquely combines native interoperability with the low-level systems control required to construct bit-exact deterministic kernels. Its MLIR compilation infrastructure further allows a single codebase to target scalar, SIMD, multicore, and GPU execution, reducing the translation bottleneck between research and production. We benchmark four core financial AI workloads: Monte Carlo option pricing, LLM sentiment inference, multi-asset backtesting, and portfolio Value at Risk. On Apple Silicon, Mojo demonstrates 20x to 180x speedups over pure Python on directly measured kernels; larger-scale GPU workload results are projections calibrated from published benchmarks. Alongside transparent performance data, we introduce mojo-deterministic, an open-source library of reproducible reduction kernels, and provide a candid assessment of the problems Mojo does and does not yet solve.

19.
arXiv (CS.CV) 2026-06-16

GroupToM-Bench: Benchmarking Group Theory of Mind and Nonlinear Social Emergence in MLLMs

True general intelligence requires not only a model of the physical world but also a social world model: the capacity to infer how individual mental states interact and crystallize into group-level outcomes. Despite notable progress in individual-level Theory of Mind (ToM) reasoning, existing multimodal large language models fail at this broader task. Collective behavior emerges non-linearly from social tensions, conformity dynamics, and structural constraints, meaning it cannot be recovered by merely summing individual intentions. We present GroupToM-Bench, the first multimodal benchmark for group-level ToM, built around a causal chain spanning micro-level BDI states (belief, desire, intention), meso-level group tension and structural constraints, and macro-level outcome prediction and mechanistic attribution. To probe this full arc, we develop a seven-level cognitive audit framework. Experiments reveal a gap between current models and human baselines, highlighting a failure to process social structures and non-linear collective dynamics.

20.
medRxiv (Medicine) 2026-06-24

ADVISE: A Machine Learning Framework for Early Recognition of a Surrogate Marker for Ventilator-Associated Pneumonia Using Routinely Collected Critical Care Data

Background Ventilator-associated pneumonia (VAP) is the most frequent nosocomial infection in critical care, affecting 20-36% of mechanically ventilated patients. Early prediction is hampered by the absence of a reliable, objective diagnostic standard. We developed ADVISE (Automated Dudley Ventilation Infection Series Evaluation), a machine learning model to predict physiological deterioration consistent with developing VAP using routinely collected electronic health record data from a UK NHS intensive care unit. Methods Retrospective observational study of admissions at Russell's Hall Hospital ICU (2008-2026). Following National Data Opt-Out exclusion (158 admissions, 4.2%), 3,566 admissions generated 33,208 candidate 48-hour observation blocks. Six temporal variables - FiO2, ventilator mode, P:F ratio, procalcitonin (PCT), secretion amount, and secretion description - were extracted across the baseline window (hours 1-24). A composite VAP-surrogate outcome required concurrent P:F ratio decline (>=5%) and PCT rise (>=0.5 ng/mL) across the outcome window (hours 25-48). After sequential quality filters, 2,134 blocks (18 positive, 0.84% prevalence) were retained. An XGBoost classifier was trained using nested 5-fold cross-validation with scale_pos_weight=114.0 and ROC-based hyperparameter optimisation on 1,495 training blocks, evaluated on 639 held-out test blocks. Performance was assessed via AUROC, AUPRC, and calibration (Brier score). Bootstrap resampling (1,000 iterations) generated 95% confidence intervals. Results On the held-out test set (n=639, 5 positive outcomes), ADVISE achieved AUROC 0.874 [95% CI: 0.771-0.939] and AUPRC 0.031 [0.008-0.069], representing a 4.0-fold improvement over the no-skill baseline. Nested cross-validation mean AUROC was 0.844 +/- 0.078 (range 0.716-0.915). At the Youden-optimal threshold, sensitivity was 0% with specificity 97.8%, reflecting extreme class imbalance (0.78% test prevalence). A threshold targeting 80% sensitivity achieved sensitivity 80.0% [33.3-100.0%], specificity 87.4% [84.8-89.9%], positive predictive value 4.8% [1.1-9.9%], and negative predictive value 99.8% [99.4-100.0%], detecting 4 of 5 VAP cases with approximately 80 false alarms (12.6% false positive rate). Brier score was 0.0078. Feature importance identified baseline P:F ratio as the dominant predictor (41.3% total gain), followed by ventilator mode (26.1%), secretion amount (13.2%), secretion description (9.1%), procalcitonin (5.9%), and FiO2; (4.5%). Conclusions ADVISE demonstrates that baseline oxygenation trajectory and ventilatory support patterns - derived exclusively from routinely charted ICCA variables - can identify admissions at risk of VAP-related physiological deterioration with meaningful discrimination (AUROC 0.874) despite severe class imbalance. The 80% sensitivity operating point offers a clinically actionable alert rate (12.6% FPR), supporting integration into existing ICU workflows. This proof-of-concept study establishes feasibility; multi-site prospective validation is required before clinical deployment.

21.
arXiv (CS.CL) 2026-06-11

Models That Know How Evaluations Are Designed Score Safer

The validity of AI safety evaluations depends on models behaving consistently across controlled and deployment settings. Prior work has identified test-time contextual cues, such as hypothetical scenarios, as a source of verbalized evaluation awareness and subsequent behavioral shift. In this paper, we investigate a potential explanation of this phenomenon: evaluation meta-knowledge, defined as parametric knowledge about the structural traits that characterize evaluations. Similar to dataset contamination, where benchmark exposure leads to higher performance through memorization, we hypothesize that models trained on texts describing evaluation practices may implicitly learn to recognize and respond to evaluation-like contexts, for instance, through exposure to scientific articles or social media posts about AI benchmarking. To test this, we fine-tune models on synthetic documents describing evaluation traits such as verifiable structures or moral dilemmas. Evaluating this fine-tuned model on six safety benchmarks, we find that it is significantly safer than the base model and control model. This behavioral shift persists even when restricting the analysis to responses lacking explicit verbalization of evaluation awareness. Our results demonstrate that evaluation meta-knowledge may inflate safety benchmark performance, introducing a novel confounder that is independent of explicit memorization or verbalized evaluation awareness, thus, challenging to detect. These findings have important implications for the design and interpretation of AI safety evaluations. Our code and models are available at https://github.com/compass-group-tue/arxiv2026_evaluation_meta_knowledge.

22.
arXiv (CS.LG) 2026-06-25

Bridging Spherical Black-Box Optimizers

arXiv:2606.25761v1 Announce Type: new Abstract: When gradient information is unavailable, black-box optimization (BBO) methods provide a practical alternative. While Evolution Strategies (ES), Consensus-Based Optimization (CBO), Optimization via Integration (OVI), and related methods have each been studied independently, their connections remain underexplored. We unify these approaches within a common theoretical framework, revealing that they differ primarily in two design choices: fitness aggregation (controlling sharpness preference) and consensus scope (controlling modality). Leveraging these insights, we introduce hybrid optimizers that interpolate between existing methods. Our ES-OVI hybrid allows explicit control over the preference for flat minima, enabling a trade-off between performance and robustness in continuous control tasks. Our CBO-OVI hybrids combine the higher-dimensional efficiency of parametric methods with the multimodal capabilities of particle-based approaches, achieving competitive results on language model merging under limited evaluation budgets. We validate our methods on standard BBO benchmarks and higher-dimensional locomotion tasks, demonstrating that the hybrid methods can outperform their constituent algorithms.

23.
arXiv (CS.CV) 2026-06-11

Task-Aware Structured Memory for Dynamic Multi-modal In-Context Learning

Multi-modal large language models (MLLMs) depend on in-context learning (ICL) for rapid task adaptation, but their scalability is severely limited by finite context windows and the growing cost of key-value (KV) caches in long multi-modal sequences. Existing memory compression approaches typically rely on rigid token removal or sample-dependent importance estimation, which introduces bias, disrupts semantic structure, particularly for visual representations, and yields static memories that cannot adapt to new queries. We introduce TASM (Task-Aware Structured Memory), a training-free framework that addresses these limitations through task-aware, structure-preserving, and dynamically accessible memory construction. TASM employs task-vector guided compression to replace sample-specific signals with a task-level direction that captures shared relevance across demonstrations. To preserve the underlying manifold, it applies semantics-aware token merging via bipartite graph matching, aggregating tokens without destructive pruning. Finally, TASM structures memory into a hierarchy comprising a compact Core Memory and a Latent Bank, facilitating query-adaptive dynamic retrieval. Evaluations confirm TASM maintains high performance under heavy compression, effectively balancing efficiency with adaptability.

24.
arXiv (CS.CV) 2026-06-16

KeepLoRA++: Continual Learning with Layer-Scaled Residual Gradient Adaptation

Continual learning for pre-trained vision-language models requires balancing three competing objectives: retaining pre-trained knowledge, preserving knowledge from a sequence of learned tasks, and maintaining the plasticity to acquire new knowledge. This paper presents KeepLoRA++, balancing these objectives through a unified dual-dimensional knowledge retention mechanism. We analyze knowledge distribution of Transformer architecture from both inter-layer and intra-layer perspectives. The inter-layer perspective examines how retention is distributed across layers, while the intra-layer perspective focuses on the parameter space within each layer. Our analysis reveals a structural property: general transferable knowledge is mainly encoded in the shallow layers and the principal subspace of the parameters, while task-specific adaptations are localized in the deep layers and the residual subspace. Motivated by this insight, KeepLoRA++ introduces a layer-scaled residual gradient adaptation method. New tasks are learned by restricting LoRA parameter updates to the residual subspace, combined with a shallow-to-deep layer scaling, to prevent interference with previously acquired capabilities. Specifically, the gradient of a new task is projected onto a subspace orthogonal to both the principal subspace of the pre-trained model and the dominant directions of previous task features, while simultaneously assigning smaller update magnitudes to shallow layers and larger ones to deeper layers. Our theoretical analysis and empirical evaluations confirm that KeepLoRA++ successfully balances these three competing objectives, consistently outperforming representative baselines across image classification, visual question answering, and video understanding tasks.

25.
arXiv (quant-ph) 2026-06-19

Anomalous magneto-optical response at $\mathrm{RuO_2 / WSe_2}$ van der Waals interface

arXiv:2606.20262v1 Announce Type: cross Abstract: Ruthenium dioxide ($\mathrm{RuO_2}$) has been proposed as an altermagnetic candidate, although its magnetic ground state remains controversial. Here, we probe weak interfacial magnetic states at the surface of (001)-oriented $\mathrm{RuO_2}$ films using the magnetic proximity effect (MPE) in a van der Waals heterostructure consisting of monolayer tungsten diselenide ($\mathrm{WSe_2}$) atop $\mathrm{RuO_2}$. Temperature-dependent magneto-optical spectroscopy reveals an anomalous excitonic energy shift and a deviation from conventional Varshni behavior below 55 K that are absent in an encapsulated $\mathrm{WSe_2}$ control sample. The anomalous shift reverses sign upon field cooling with opposite magnetic field polarity, indicating a magnetic origin. Polarization-resolved measurements further show a nearly field-independent and fluctuating valley splitting in $\mathrm{WSe_2 / RuO_2}$ in strong contrast to the conventional linear Zeeman splitting observed in the control bare $\mathrm{WSe_2}$ sample. These results suggest that the valley states are governed predominantly by interfacial exchange fields associated with weak surface magnetic states in $\mathrm{RuO_2}$, which do not produce a conventional linear Zeeman response within the applied magnetic field range. Importantly, this approach enables direct optical probing of emergent surface magnetism without introducing an additional ferromagnetic layer, positioning MPE-based optical probing as a tool for investigating weak surface magnetism and offering new possibilities for studying magnetic materials with controversial magnetic states.