Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.CV) 2026-06-16

DifferAD-R1: A Difference-Guided IndustrialAnomaly Localization with Multimodal LargeLanguage Models

Industrial anomaly localization aims to accurately identify and localize abnormal regions in industrial products, addressing the critical challenge of detecting unseen defect categories in real-world scenarios. Traditional closed-set methods often suffer from poor cross-scenario generalization, while existingMultimodal Large Language Model (MLLM)-based approachesface two core limitations: they either adopt QA-style paradigmsmisaligned with the practical demands of localization, or relyon standard optimization techniques such as Group RelativePolicy Optimization (GRPO), which fails to deliver effectivelearning signals for subtle defects. To tackle these issues, thispaper proposes DifferAD-R1, an MLLM-augmented reinforcement learning framework tailored for industrial anomaly localization. We design a Difference-Guided dual-image paradigm,which reformulates the localization task as a one-shot difference grounding problem to effectively explore cross-scenarioanomalies. A Dual-Consistency Localization Reward is developedfor hard-to-detect anomalies, enhancing optimization stabilityand robustness. Additionally, we integrate a difficulty-awarestrategy with adaptive reweighting and group-wise resamplingto prioritize learning on challenging instances. To facilitateevaluations in real-world industrial settings, we construct theAD-DualDiff dataset, comprising 13K paired images across 20categories. Experimental results demonstrate that DifferADR1 significantly outperforms existing baselines and achievescompetitive performance compared to large-scale models likeQwen3-VL (235B parameters). Our code is publicly availableat: https://github.com/Rong2026/work-1.

02.
arXiv (CS.LG) 2026-06-11

Renewable Lasso without Batch-Number Constraints: A Gradient-Enhanced Approach

arXiv:2606.11738v1 Announce Type: cross Abstract: We study online estimation for high-dimensional generalized linear models with streaming data. First, for the non-distributed setting, we propose a gradient-enhanced surrogate loss that approximates the cumulative loss using only historical summaries, which modifies and improves upon the existing renewable estimation approach for the same model in the high-dimensional setting, and removes the batch-number constraint in previous studies. We then extend the method to distributed streaming data under the master-client architecture, where batches are partitioned across sites and only summaries (gradient vectors) are exchanged. Instead of directing applying the popular method of Jordan et al. (2019) to the surrogate quadratic loss, our adjusted approach does not require the clients to compute the full surrogate loss. We derive non-asymptotic error bounds under the high-dimensional scaling, without the stringent constraint on the number of batches in the previous studies. Simulation results under linear and logistic models, together with a real-data application, show improved accuracy over existing renewable estimators.

03.
arXiv (CS.AI) 2026-06-19

Secure Coding Drift in LLM-Assisted Post-Quantum Cryptography Development: A Gamified Fix

arXiv:2606.19474v1 Announce Type: cross Abstract: The transition to Post Quantum Cryptography (PQC) introduces considerable implementation complexity, requiring strict adherence to constant-time execution, side channel resistance, and precise parametrisation. Simultaneously, large language models (LLMs) are heavily embedded in software development workflows, including cryptographic engineering. While LLMs improve productivity, evidence shows that they frequently generate insecure or suboptimal code, particularly in security critical domains. This paper introduces Secure Coding Drift in PQC, a novel socio technical vulnerability model capturing the gradual degradation of secure coding practices due to sustained reliance on LLM-generated code. Unlike prior work that focuses on static vulnerabilities, we conceptualise security risk as a longitudinal behavioural phenomenon rising from human AI interaction. To mitigate this, we propose a gamified, LLM augmented secure coding framework that embeds adversarial evaluation, behavioural feedback, and security scoring into development workflows. Our approach reframes LLMs from passive assistants into active security co-pilots, contributing toward safer PQC implementation in AI mediated environments.

04.
arXiv (CS.CL) 2026-06-12

PiDA: Phonetically-Informed Data Augmentation for Robust Vietnamese Speech Translation

Cascaded speech translation (ST) systems suffer from error propagation when Automatic Speech Recognition (ASR) outputs incorrect transcripts. We present the first systematic categorization of ASR errors for Vietnamese ST, classifying substitution errors by phonetic cause and quantifying their impact on downstream Neural Machine Translation (NMT) performance using Linear Mixed-Effects Modelling. We confirm that most ASR substitution errors arise from phonetic confusions rather than random noise, and that these phonetic errors significantly degrade ST quality. Motivated by this finding, we propose Phonetically-Informed Data Augmentation (PiDA), which generates ASR-like corruptions by substituting words with phonetically similar alternatives using phonetic word embeddings. Fine-tuning on a PiDA-augmented version of FLEURS Vietnamese-English improves translation of erroneous ASR outputs (up to +2.04 BLEU over standard fine-tuning) while also slightly improving clean-text performance.

05.
arXiv (CS.CV) 2026-06-15

HiST: A Hierarchical Sparse Transformer for Cross-Modal Spatial Transcriptomics Modeling

Spatial transcriptomics (ST) links gene expression with tissue morphology but remains expensive and low-throughput, motivating surrogates that infer expression from routine histology. Whole-slide H&E-to-ST inference pairs a gigapixel image with gene measurements at a sparse, irregular set of locations, making multiscale modeling challenging without incurring dense-grid overhead or quadratic token mixing. We propose HiST, a hierarchical sparse transformer that treats measured locations as a lattice-indexed sparse field and builds a dyadic encoder–decoder directly on the active tissue footprint. HiST combines sparse window attention for local geometric correspondence with resolution-changing operators for rapid multiscale context integration. For a fixed window size, the dominant runtime and memory scale with the number of observed locations rather than the dense slide area. To mitigate slide-specific acquisition variation, HiST adds a bottlenecked global conditioning pathway via a slide calibration token that summarizes slide-level context and conditions local representations. On a multi-organ benchmark spanning diverse tissues and acquisition sources, HiST improves predictive performance over recent baselines while reducing runtime and peak memory.

06.
arXiv (CS.AI) 2026-06-19

FM-Agent: Scaling Formal Methods to Large Systems via LLM-Based Hoare-Style Reasoning

arXiv:2604.11556v2 Announce Type: replace-cross Abstract: LLM-assisted software development has become increasingly prevalent, and can generate large-scale systems, such as compilers. It becomes crucial to strengthen the correctness of the generated code. However, automated reasoning for large-scale systems remains challenging due to code complexity. Hoare logic offers an approach to decomposing a large system into smaller components and reasoning about them separately (i.e., compositional reasoning). However, existing works still struggle to scale, because Hoare logic requires writing formal specifications for each function, imposing a heavy human burden. The problem is exacerbated when code is generated by LLMs, as developers lack a deep understanding of each function's expected behavior. This paper presents FM-Agent, the first framework that realizes automated compositional reasoning for large-scale systems. Leveraging LLMs, FM-Agent introduces a top-down paradigm to automatically generate function-level specifications. Specifically, FM-Agent derives the specification of a function from how its callers expect the function to behave, so the generated specifications can reflect the developer's intent of a function even if the implementation is buggy. Developers' intent is usually expressed in natural language, while existing verifiers only support formulas. Therefore, FM-Agent generalizes Hoare-style inference to reason about functions against natural-language specifications. Finally, to confirm bug existence and explain bug causes, FM-Agent automatically generates test cases to trigger potential bugs. In our evaluation, FM-Agent successfully reasons about large-scale systems within 2 days, each of which has up to 143k LoC. These systems have already been tested by their developers, but FM-Agent still finds 522 newly discovered bugs. These bugs can cause serious consequences, including system crashes and incorrect execution results.

07.
arXiv (quant-ph) 2026-06-16

Microscopic exceptional points in the post-selected open Jaynes–Cummings model

arXiv:2606.14982v1 Announce Type: new Abstract: Phenomenological non-Hermitian Hamiltonians track selected signatures of complex reservoir dynamics, while post-selected no-jump effective Hamiltonians derived from microscopic open-system theory reveal the underlying system–reservoir physics. We derive such a Hamiltonian for the open Jaynes–Cummings model using a Moore–Penrose normalized $\mathrm{su}(2)$ representation that removes the vacuum-sector singularity and diagonalizes the full Hamiltonian by one operator rotation. Starting from a zero-temperature bosonic reservoir, we obtain a Gorini–Kossakowski–Sudarshan–Lindblad master equation under the Born–Markov approximation with full Bohr-frequency resolution. We use partial Bohr-frequency resolution to build a consistent post-selected no-jump Hamiltonian near exceptional points, where decay rates become comparable to Rabi frequencies and remove the scale separation behind full resolution. The normalized $\mathrm{su}(2)$ form of the resulting non-Hermitian Jaynes–Cummings Hamiltonian reveals the effects of Lamb-shifted detuning, diagonal loss imbalance, and reservoir-modified coupling. Our microscopic exceptional-point analysis recovers the experimentally reported single-excitation exceptional point for unequal independent losses and identifies regimes absent from the standard phenomenological model; for example, equal correlated losses with orthogonal channel phase produce a second-order exceptional point at the same loss-to-coupling ratio in every excitation sector.

08.
arXiv (quant-ph) 2026-06-12

Path integral control of open quantum systems

arXiv:2410.18635v4 Announce Type: replace Abstract: We investigate open-loop quantum state preparation for a class of open quantum systems whose dynamics follow a Gorini-Kossakowski-Lindblad-Sudarshan (GKLS) master equation that admits a trajectory-based stochastic representation. The deterministic control objective is reformulated as a stochastic optimal control problem – interpreting stochasticity as a methodological tool akin to stochastic Schrödinger equation unravelings – which situates the problem within the path integral control framework. For the class of GKLS generators under consideration, this reformulation leads to an explicit expression for the optimal control as a weighted average over stochastic quantum trajectories, thereby eliminating the need for gradient evaluations. Building on this theoretical result, we derive a control update rule for piecewise-constant control pulses and demonstrate that adaptive importance sampling progressively enhances the control estimator during optimization, culminating in the algorithm we term Path integral Quantum Control (PiQC). We further introduce an annealed variant of PiQC, wherein a synthetic noise schedule gradually steers open-system trajectories toward closed-system dynamics, enabling high-fidelity unitary state preparation. Numerical studies on a dissipative single-qubit system and a multi-qubit Nuclear Magnetic Resonance model verify that PiQC yields precise open-loop controls and displays robustness to Hamiltonian perturbations. We propose PiQC as a trajectory-based alternative to gradient-based approaches, which might offer a viable solution in quantum control problems where gradient computation is infeasible or computationally demanding.

09.
arXiv (CS.CL) 2026-06-24

ErrorLLM: Modeling SQL Errors for Text-to-SQL Refinement

Despite the remarkable performance of large language models (LLMs) in text-to-SQL (SQL generation), correctly producing SQL queries remains challenging during initial generation. The SQL refinement task is subsequently introduced to correct syntactic and semantic errors in generated SQL queries. However, existing paradigms face two major limitations: (i) self-debugging becomes increasingly ineffective as modern LLMs rarely produce explicit execution errors that can trigger debugging signals; (ii) self-correction exhibits low detection precision due to the lack of explicit error modeling grounded in the question and schema, and suffers from severe hallucination that frequently corrupts correct SQLs. In this paper, we propose ErrorLLM, a framework that explicitly models text-to-SQL Errors within a dedicated LLM for text-to-SQL refinement. Specifically, we represent the user question and database schema as structural features, employ static detection to identify execution failures and surface mismatches, and extend ErrorLLM's semantic space with dedicated error tokens that capture categorized implicit semantic error types. Through a well-designed training strategy, we explicitly model these errors with structural representations, enabling the LLM to detect complex implicit errors by predicting dedicated error tokens. Guided by the detected errors, we perform error-guided refinement on the SQL structure by prompting LLMs. Extensive experiments demonstrate that ErrorLLM achieves the most significant improvements over backbone initial generation. Further analysis reveals that detection quality directly determines refinement effectiveness, and ErrorLLM addresses both sides by high detection F1 score while maintain refinement effectiveness.

11.
arXiv (quant-ph) 2026-06-11

Sharing quantum indistinguishability with multiple parties

arXiv:2512.15199v3 Announce Type: replace Abstract: Quantum indistinguishability of non-orthogonal quantum states is a valuable resource in quantum information applications such as cryptography and randomness generation. In this article, we present a sequential state-discrimination scheme that enables multiple parties to share quantum uncertainty, in terms of the max relative entropy, generated by a single party. Our scheme is based upon maximum-confidence measurements and takes advantages of weak measurements to allow a number of parties to perform state discrimination on a single quantum system. We review known sequential state discrimination and show how our scheme would work through a number of examples where ensembles may or may not contain symmetries. Our results will have a role to play in understanding the ultimate limits of sequential information extraction and guide the development of quantum resource sharing in sequential settings.

12.
PLOS Medicine 2026-06-04

Comparative impacts and cost-effectiveness of tuberculosis systematic screening strategies in prisons in Brazil, Colombia, and Peru: A mathematical modeling study

Authors:

by Yiran E. Liu, José Victor Bortolotto Bampi, Ronan F. Arthur, Argita D. Salindri, Caroline Busatto, Pedro Avedillo Jiménez, Daniele Maria Pelissari, Fernanda Dockhorn Costa Johansen, Robert Arana-Narvaez, Alvaro Fernando Moreno Roca, Wilfredo Santos Solís Tupes, Esther Mori Jiu, Christian Alfredo Moreno Roca, Erika Albertina Abregú Contreras, Valentina Antonieta Alarcón Guizado, Julián Trujillo Trujillo, Belkys Marcelino, Mónica Alonso Gonzalez, Mayra Cecilia Córdova Ayllon, Ted Cohen, Moises A. Huaman, Jeremy D. Goldhaber-Fiebert, Julio Croda, Jason R. Andrews Background Incarceration is a leading driver of tuberculosis in Latin America. Systematic screening in prisons may reduce tuberculosis burden, but optimal strategies and cost-effectiveness remain uncertain. We examined the population-wide health impacts and cost-effectiveness of systematic screening in prisons in Brazil, Colombia, and Peru, comparing different timepoints, frequencies, and screening algorithms. Methods and findings Using dynamic transmission models calibrated to Brazil, Colombia, and Peru, we simulated annual or biannual (twice-yearly) prison-wide screening, alone or combined with entry and exit screening from 2026 to 2035. We evaluated four algorithms: (1) symptom screening, (2) chest X-ray with computer-aided detection (CXR-CAD), (3) symptoms and CXR-CAD (follow-up testing if either is positive), and (4) GeneXpert Ultra (Xpert) with pooled sputum. Individuals screening positive then received individual Xpert. We projected impacts on within-prison and population-level tuberculosis incidence in 2035, along with discounted costs (2023 US dollars) and disability-adjusted life years (DALYs). Model projections showed that combined entry, exit, and biannual screening with CXR-CAD was highly impactful and cost-effective across countries, reducing tuberculosis incidence by 61%–87% in prisons and 18%–28% population-wide. Compared to only biannual CXR-CAD (the next best strategy), the incremental cost per DALY averted of adding entry and exit screening was $2,984 (Brazil), $2,925 (Colombia), and $645 (Peru). Adding symptom screening to CXR-CAD marginally increased benefit and was only cost-effective in Peru’s higher-incidence prisons. Biannual screening alone remained cost-effective at prison incidence levels well below national averages, as well as at far lower willingness-to-pay thresholds. In settings without CXR-CAD, pooled Xpert was an impactful, cost-effective alternative. Key limitations include the model’s simplified representation of tuberculosis disease states and lack of stratification by age, gender/sex, HIV, or drug resistance. Conclusions These modeling results support immediate national-level adoption of prison-wide tuberculosis screening twice-yearly and at entry and exit, using CXR-CAD or pooled Xpert.

13.
arXiv (CS.AI) 2026-06-16

ArtNet: A JEPA-Like Articulatory Predictive Framework for Robust Zero-Shot Phoneme Recognition

arXiv:2606.16595v1 Announce Type: cross Abstract: Zero-shot cross-lingual phoneme recognition is often hindered by the fragility of direct acoustic-to-symbol mapping, which is susceptible to language-specific variations. Echoing joint-embedding predictive architecture (JEPA) work in vision, we propose ArtNet, a framework that explores a structured feature prediction task based on articulatory features to enhance acoustic robustness. Specifically, ArtNet integrates an articulatory predictor, designed to extract universal articulatory representations from self-supervised learning (SSL) features, with a variational information bottleneck (VIB) to suppress language-specific variations. Experiments on seven unseen languages demonstrate that ArtNet, particularly when synergized with the proposed vector-space inventory alignment (VSIA) strategy, significantly outperforms competitive baselines, achieving a 20.56\% relative reduction in phoneme error rate (PER) and 7.01\% in phoneme feature error rate (PFER).

14.
arXiv (CS.CL) 2026-06-16

Beyond English: Uncovering the Multilingual Gap in Vision-Language-Action Models

Vision-Language-Action models have recently demonstrated promising capabilities in learning generalist robot policies from large-scale multimodal data. However, most existing VLA systems are trained and evaluated primarily with English instructions, leaving their ability to understand and execute instructions in other languages largely unexplored. While the underlying large language models often possess multilingual capabilities, it remains unclear whether these multilingual capabilities transfer to VLAs during training. In this work, we present the first systematic study of multilingual instruction following in VLA models. We first construct multilingual instructions by extending existing benchmarks with translations of their instructions. Using these instructions, we evaluate several representative VLA models across a range of tasks in simulation settings. Our experiments reveal a significant multilingual gap: models trained primarily on English instructions exhibit substantial performance degradation when evaluated on other languages, even when the underlying language backbone is multilingual. We provide several findings and analyses to understand the multilingual gap. Cross-lingual transfer behavior analysis shows that performance drops correlate with both instruction understanding and action execution. Representation analyses suggest that multilingual instruction-caused representation shifts may contribute to the multilingual gap. Motivated by these findings, we further explore strategies to improve multilingual performance in VLAs. We propose a simple yet effective multilingual fine-tuning approach, Multilingual Principal Component Alignment, which leverages Principal Component Analysis to get the principal component subspace and align projected multilingual representations, effectively reducing the multilingual performance gap.

15.
arXiv (CS.AI) 2026-06-24

Graph Alignment for Benchmarking Graph Neural Networks and Learning Positional Encodings

arXiv:2505.13087v2 Announce Type: replace-cross Abstract: We propose a novel benchmarking methodology for graph neural networks (GNNs) based on the graph alignment problem, a combinatorial optimization task that generalizes graph isomorphism by aligning two unlabeled graphs to maximize overlapping edges. We frame this problem as a self-supervised learning task and present several methods to generate graph alignment datasets using synthetic random graphs and real-world graph datasets from multiple domains. For a given graph dataset, we generate a family of graph alignment datasets with increasing difficulty, allowing us to rank the performance of various architectures. Our experiments prove that there is an optimal task difficulty for having a statistically relevant ranking of different models and that, even on a structure-only task, anisotropic models perform better compared to isotropic ones. To further prove that our synthetic task capture meaningful information, we show its effectiveness for self-supervised GNN pre-training: the learned node embeddings can be leveraged as positional encodings by transformers for graph regression or can be used to reconstruct the full structure of the graph with $98\%$ accuracy. To support reproducibility and further research, we provide an open-source Python package to generate graph alignment datasets and benchmark new GNN architectures. The source code is available at https://github.com/adrien-lagesse/graph-alignment-benchmark.

16.
arXiv (CS.LG) 2026-06-11

AsFT: Anchoring Safety During LLM Fine-Tuning Within Narrow Safety Basin

arXiv:2506.08473v4 Announce Type: replace Abstract: Fine-tuning large language models (LLMs) improves performance but introduces critical safety vulnerabilities: even minimal harmful data can severely compromise safety measures. We observe that perturbations orthogonal to the alignment direction - defined by weight differences between aligned (safe) and unaligned models - rapidly compromise model safety. In contrast, updates along the alignment direction largely preserve it, revealing the parameter space as a "narrow safety basin". To address this, we propose AsFT (Anchoring Safety in Fine-Tuning) to maintain safety by explicitly constraining update directions during fine-tuning. By penalizing updates orthogonal to the alignment direction, AsFT effectively constrains the model within the "narrow safety basin," thus preserving its inherent safety. Extensive experiments on multiple datasets and models show that AsFT reduces harmful behaviors by up to 7.60%, improves task performance by 3.44%, and consistently outperforms existing methods across multiple tasks.

17.
arXiv (CS.CV) 2026-06-15

FBSDiff++: Improved Frequency Band Substitution of Diffusion Features for Efficient and Highly Controllable Text-Driven Image-to-Image Translation

With large-scale text-to-image (T2I) diffusion models achieving significant advancements in open-domain image creation, increasing attention has been focused on their natural extension to the realm of text-driven image-to-image (I2I) translation, where a source image acts as visual guidance to the generated image in addition to the textual guidance provided by the text prompt. We propose FBSDiff, a novel framework adapting off-the-shelf T2I diffusion model into the I2I paradigm from a fresh frequency-domain perspective. Through dynamic frequency band substitution of diffusion features, FBSDiff realizes versatile and highly controllable text-driven I2I in a plug-and-play manner (without need for model training, fine-tuning, or online optimization), allowing appearance-guided, layout-guided, and contour-guided I2I translation by progressively substituting low-frequency band, mid-frequency band, and high-frequency band of latent diffusion features, respectively. In addition, FBSDiff flexibly enables continuous control over I2I correlation intensity simply by tuning the bandwidth of the substituted frequency band. To further promote image translation efficiency, flexibility, and functionality, we propose FBSDiff++ which improves upon FBSDiff mainly in three aspects: (1) accelerate inference speed by a large margin (8.9$\times$ speedup in inference) with refined model architecture; (2) improve the Frequency Band Substitution module to allow for input source images of arbitrary resolution and aspect ratio; (3) extend model functionality to enable localized image manipulation and style-specific content creation with only subtle adjustments to the core method. Extensive qualitative and quantitative experiments verify superiority of FBSDiff++ in I2I translation visual quality, efficiency, versatility, and controllability compared to related advanced approaches.

18.
arXiv (CS.LG) 2026-06-18

MetaboNet-Bench: A Multi-modal Benchmark for Glucose Forecasting in Type 1 Diabetes

arXiv:2606.18640v1 Announce Type: new Abstract: Glucose forecasting algorithms are an important aspect of glycemic control management in type 1 diabetes. So far, the research community has developed numerous algorithms and models for forecasting. However, it is well-recognized that the lack of standardized model performance evaluation benchmarks makes fair comparison difficult and hinders further innovation, and thus benchmark standardization is in urgent need. Furthermore, many published glucose forecasting algorithms are limited to CGM data alone, ignoring other multimodal signals such as insulin dosing and carbohydrate intake. Here, we introduce MetaboNet-Bench, a benchmark for multimodal glucose forecasting for patients with type 1 diabetes that provides an extensible open-source evaluation framework for comparison of glucose forecasting algorithms that leverage glucose, insulin, and carbohydrate data. We then demonstrate its utility by benchmarking several recently published glucose forecasting models and a custom multimodal time-series model, representing different model architectures. The results show that the benefit of adding data modalities is conditioned on the complexity of the model and that incorporating more clinical metrics helps identify meaningful gaps to fill for future research.

19.
arXiv (CS.CV) 2026-06-16

Efficient Flow Matching using Latent Variables

Flow matching models have shown great potential in image generation tasks among probabilistic generative models. However, most flow matching models in the literature do not explicitly utilize the underlying clustering structure in the target data when learning the flow from a simple source distribution like the standard Gaussian. This leads to inefficient learning, especially for many high-dimensional real-world datasets, which often reside in a low-dimensional manifold. To this end, we present $\texttt{Latent-CFM}$, which provides efficient training strategies by conditioning on the features extracted from data using pretrained deep latent variable models. Through experiments on synthetic data from multi-modal distributions and widely used image benchmark datasets, we show that $\texttt{Latent-CFM}$ exhibits improved generation quality with significantly less training and computation than state-of-the-art flow matching models by adopting pretrained lightweight latent variable models. Beyond natural images, we consider generative modeling of spatial fields stemming from physical processes. Using a 2d Darcy flow dataset, we demonstrate that our approach generates more physically accurate samples than competing approaches. In addition, through latent space analysis, we demonstrate that our approach can be used for conditional image generation conditioned on latent features, which adds interpretability to the generation process.

20.
arXiv (CS.LG) 2026-06-16

Anomaly Detection via Mean Shift Density Enhancement

arXiv:2602.03293v2 Announce Type: replace Abstract: Unsupervised anomaly detection stands as an important problem in machine learning. Existing unsupervised anomaly detection algorithms rarely perform well across different anomaly types, often excelling only under specific structural assumptions. This lack of robustness also becomes particularly evident under noisy settings. We propose Mean Shift Density Enhancement (MSDE), a fully unsupervised framework that detects anomalies through their geometric response to density-driven manifold evolution. MSDE is designed as a general purpose anomaly detection framework, based on the principle that normal samples, being well supported by local density, remain stable under iterative density enhancement, whereas anomalous samples undergo large cumulative displacements as they are attracted toward nearby density modes. To operationalize this idea, MSDE employs a weighted mean-shift procedure with adaptive, sample-specific density weights derived from a manifold learning-based fuzzy neighborhood graph. We evaluate MSDE on an anomaly detection benchmark comprising 46 real-world tabular datasets, four realistic anomaly generation mechanisms, and six noise levels. Compared to 13 established unsupervised baselines, MSDE achieves consistently strong, balanced and robust performance for several standard classification metrics, at several noise levels and on average over several types of anomalies. These results demonstrate that displacement-based scoring provides a robust alternative to the existing state-of-the-art for unsupervised anomaly detection.

21.
arXiv (CS.AI) 2026-06-16

Beyond Scalars: Evaluating and Understanding LLM Reasoning via Geometric Progress and Stability

arXiv:2603.10384v3 Announce Type: replace Abstract: Evaluating LLM reliability via scalar probabilities often fails to capture the structural dynamics of reasoning. We introduce TRACED, a framework that assesses reasoning quality through theoretically grounded geometric kinematics. By decomposing reasoning traces into Progress (displacement) and Stability (curvature), we reveal a distinct topological divergence: correct reasoning manifests as high-progress, stable trajectories, whereas hallucinations are characterized by low-progress, unstable patterns (stalled displacement with high curvature fluctuations). Leveraging these signatures, our probabilistic framework achieves competitive performance and superior robustness across diverse benchmarks. Crucially, TRACED bridges geometry and cognition by mapping high curvature to ''Hesitation Loops'' and displacement to ''Certainty Accumulation'', offering a physical lens to decode the internal dynamics of machine thought.

22.
arXiv (CS.CV) 2026-06-16

FUSE: Quantifying Uncertainty in Vision-Language Models by Bayesian Fusing Epistemic and Aleatoric Uncertainty

Vision-language models (VLMs) are playing an increasingly important role across multiple domains. In many applications, such as robotics, it is crucial to quantify the uncertainty in the output of these models. } We develop FUSE, a probabilistic framework for capturing two complementary sources of uncertainty in vision-language modeling: (i) aleatoric embedding-level uncertainty derived from input data vision-language ambiguity, and (ii) epistemic model-level uncertainty estimated from the semantic response diversity of VLMs. Our approach formulates a Bayesian fusion mechanism that analytically combines these uncertainty sources to produce a scalar measure of uncertainty. This measure can be used to reliably predict the model's output correctness for downstream applications. We demonstrate that our method outperforms baselines and achieves SOTA uncertainty calibration.

23.
arXiv (quant-ph) 2026-06-24

Fractional squeezing: spectra and dynamics from generalized squeezing Hamiltonian with fractional orders

Authors:

arXiv:2601.15693v2 Announce Type: replace Abstract: We generalize the generalized-squeezing problem to include fractional values of the squeezing order $n$. This approach allows us to determine the locations of critical points at which qualitative changes in behaviour occur and accurately predict the behaviour at these critical points, which are challenging for conventional computational methods. Based on our numerical calculations, we identify with a high degree of confidence the point at which the spectrum turns from continuous to discrete and the point at which oscillations turn from having asymptotically infinite amplitudes to having finite amplitudes. Furthermore, we numerically investigate the behaviour in the large $n$ regime and provide an intuitive explanation for the numerical results.

24.
medRxiv (Medicine) 2026-06-16

The biological clock of multimorbidity: temporal dynamics of disease co-occurrence in primary care

Multimorbidity is the dominant clinical reality of primary care, yet the temporal dynamics governing when and how persistent comorbidity associations emerge remain poorly characterised. Most large-scale comorbidity studies adopt a single observation window after an index diagnosis, implicitly assuming that associations detectable at one year are equally detectable at five. Using 11 years of electronic health records from 5,821,197 individuals in Catalan primary care, we applied a matched cohort design across nine complementary follow-up windows, five cumulative (0-1 to 0-5 years) and four conditional (1-2 to 4-5 years), to 1,315 index diseases, identifying 144,030 significant directed comorbidity associations in the five-year network. We found that 60.1% of these associations required at least three years of follow-up and were undetectable in shorter-window analyses, demonstrating that observation window length is a primary determinant of which comorbidities can be observed. To organise this temporal heterogeneity, we introduce the biological clock of multimorbidity: a two-dimensional framework that positions ICD-10 disease categories according to their rates of cumulative signal attenuation and the persistence of conditional risk. This framework identifies four reproducible temporal patterns (episodic, chronic stable, chronic progressive, and transient-persistent) that are robust under bootstrap resampling, leave-one-disease-out sensitivity analysis, and alternative clustering approaches. The biological clock is systematically modulated by sex, with Blood/Immune and Musculoskeletal disorders showing the largest sex differences in temporal dynamics. Network analysis identified 19 disease "initiators" that generate broad downstream comorbidity burdens and 21 "sinks" representing convergent endpoints of multiple disease trajectories. Comparison with hospital-based Danish data from 6,909,676 individuals showed that shared associations were 2.7-fold enriched over chance expectation (hypergeometric test, p

25.
arXiv (CS.AI) 2026-06-24

Evaluating the Interpretability of Sparse Autoencoders with Concept Annotations

arXiv:2606.24716v1 Announce Type: cross Abstract: Sparse autoencoders (SAEs) are increasingly used to extract interpretable concepts from vision and vision language models, yet existing evaluation methods largely rely on proxy metrics or qualitative inspection rather than measuring semantic correspondence. We present a human-grounded evaluation framework that quantifies alignment between SAE latents and human-annotated concepts, without requiring user studies, and validate this matching through targeted attribute perturbations. To enable this intervention-style evaluation in vision, we construct synCUB and synCOCO, synthetic benchmarks of paired images that differ in exactly one attribute. We introduce Fully-Binary Matching Pursuit (FBMP), a coalition-based matching procedure that supports many-to-one mappings between SAE latents and annotated concepts, and consistently outperforms one-to-one baselines. For functional validation, we propose a Targeted Attribute Perturbation Alignment Score (TAPAScore), which tests whether matched concepts respond selectively and in the expected direction under targeted image-level attribute perturbations. Under sanity checks, our matching and TAPAScore are the only evaluated metrics that reliably distinguish trained SAEs from untrained ones. Across SAEs trained on CLIP and DINOv2 embeddings, we find that increased overcompleteness can reduce perturbation alignment, indicating a reduction in interpretability. Our evaluation framework suggests that moderate dictionary sizes provide the best trade-off, yielding the most interpretable SAEs. Code and datasets are available at https://github.com/JonasKlotz/sae-concept-eval.