Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (quant-ph) 2026-06-12

Symmetry and Topology of Monitored Quantum Dynamics

arXiv:2412.06133v4 Announce Type: replace-cross Abstract: The interplay between unitary dynamics and quantum measurements induces diverse phenomena in open quantum systems with no counterparts in closed quantum systems at equilibrium. Here, we generally classify Kraus operators and their effective non-Hermitian dynamical generators, thereby establishing the tenfold classification for symmetry and topology of monitored free fermions. Our classification elucidates the role of topology in measurement-induced phase transitions and identifies potential topological terms in the corresponding nonlinear sigma models. Furthermore, we establish the bulk-boundary correspondence in monitored quantum dynamics: nontrivial topology in spacetime manifests itself as topologically nontrivial steady states and gapless boundary states in Lyapunov spectra, such as Lyapunov zero modes and chiral edge modes, leading to the topologically protected slowdown of dynamical purification.

02.
arXiv (quant-ph) 2026-06-15

Emission of time-ordered photon pairs from a coherently-driven Kerr microcavity

arXiv:2601.06468v2 Announce Type: replace-cross Abstract: Weakly-interacting many-body systems possess remarkable quantum properties that are essential components of quantum technologies, and constitute a topic of fundamental interest. Here we show that in a solid-state nonlinear microcavity embedding discrete modes of exciton-dressed photons, we can isolate a single eigenmode of quantum fluctuations from the much brighter coherent fraction of the field. In this regime, we perform frequency- and time-resolved correlations measurements between photons on the red and blue side of the fluctuations spectrum. When the average number of fluctuation quanta is smaller than one, we observe the formation of large pairwise time-ordered correlations: red photon first and blue photon second. We show that this peculiar time-ordering correlation emerges spontaneously from the interplay between frequency-resolved detection, and the non-trivial internal quantum structure of the elementary fluctuations.

03.
bioRxiv (Bioinfo) 2026-06-20

SAbDab2: The structural antibody database in the age of machine learning

The Structural Antibody Database (SAbDab) is a publicly available repository of experimentally determined antibody structures, first released in 2013. Explicit support for single-domain antibodies was added in 2021, with SAbDab-nano. Recently, increasing interest in antibodies has led to a proliferation of novel antibody formats, while simultaneous advances in machine learning have increased demand for standardised, high-quality structure data. Here, we present SAbDab2, re-engineered for the machine-learning age. It introduces support for a variety of new formats, and makes it easy to retrieve and compare all known structures of a given antibody. In addition, SAbDab2 provides ready access to ML-grade structures of antibody and antibody–antigen-complexes, with standardised, versioned train/test splits. These will be updated every six months going forward, and are available at https://zenodo.org/records/20083995. SAbDab2 itself is updated weekly and is freely available at https://sabdab2.opig.stats.ox.ac.uk.

04.
arXiv (CS.CV) 2026-06-18

LandslideAgent with Multimodal LandslideBench: A Domain-Rule-Augmented Agent for Autonomous Landslide Identification and Analysis

Intelligent landslide hazard interpretation is critical for disaster prevention, yet current paradigms struggle to simultaneously extract visual features and high-level geoscientific semantics, while general-purpose vision-language models (VLMs) suffer from perceptual limitations and domain hallucinations in complex geological scenarios. To address these challenges, we propose an instruction-driven agentic framework comprising three components. First, LandslideBench, a multimodal fine-grained dataset with seven subtype labels, high-resolution imagery, pixel-level masks, and high-quality textual descriptions, is constructed via multi-VLM cross-validation and interactive annotation. Then, LandslideVLM, a landslide-oriented VLM, is fine-tuned via LoRA on LandslideBench to enhance geological semantic understanding. Finally, LandslideAgent, a domain rule-enhanced agent taking LandslideVLM as its cognitive backbone, employs a dual-rule controller incorporating structured report metadata constraints and cross-validation identification constraints to regulate automated tool invocation. Experiments demonstrate that LandslideBench provides effective baselines across five mainstream models on fine-grained classification and semantic segmentation. LandslideVLM achieves accuracy improvements of 10.96%, 32.87%, and 15.91% on landslide discrimination, fine-grained classification, and semantic description quality, respectively. LandslideAgent further enables autonomous multi-source spatial data inference, realizing full-process intelligence for landslide identification and analysis.

05.
arXiv (CS.AI) 2026-06-12

PolyFlow: Safe and Efficient Polytope-Constrained Flow Matching with Constraint Embedding and Projection-free Update

arXiv:2606.13400v1 Announce Type: cross Abstract: While flow-based generative models have demonstrated strong performance across a wide range of domains, deploying them in safety-critical physical systems remains challenging due to strict constraint requirements. Existing approaches typically enforce safety through post-hoc corrections, which incur substantial computational overhead and may distort the learned distribution. We propose PolyFlow, a polytope-constrained flow matching framework that embeds constraints directly into the model and flow dynamics. PolyFlow introduces a discrete-time flow formulation and a projection-free architecture, which eliminate the discretization error and guarantee strict satisfaction of arbitrary polyhedral constraints, without the need for expensive iterative solvers. Experimental results show that PolyFlow achieves zero constraint violation while maintaining high distributional fidelity across a range of planning and control tasks. Compared to state-of-the-art constrained generation baselines, PolyFlow significantly reduces inference latency and demonstrates a favorable trade-off between safety, efficiency, and generative quality. Code is available on https://github.com/MJianM/PolyFlow.

06.
arXiv (CS.CV) 2026-06-16

LIBERO-Occ: Evaluating and Improving Vision-Language-Action Models under Scene-Induced Occlusion via Viewpoint Imagination

Vision-Language-Action (VLA) models achieve strong performance on standard manipulation benchmarks, but most evaluations assume that task-relevant objects are fully visible. This assumption often fails in realistic settings, where occlusion makes manipulation partially observable. In this paper, we study scene-induced occlusion as a fundamental challenge for VLA models and introduce LIBERO-Occ, an occlusion-oriented extension of LIBERO. Experiments show that state-of-the-art VLAs suffer substantial performance degradation under occlusion. To address this issue, we propose Viewpoint Imagination (VIM), which generates a complementary view from an occluded primary observation and conditions action prediction on both observed and imagined evidence. VIM improves robustness across task suites, occlusion types, and severity levels without requiring additional cameras at deployment time, suggesting that viewpoint imagination is an promising mechanism for perception completion in partially observable manipulation. Our benchmark and corresponding code are available at: \href{https://github.com/litsh/Libero-Occ}{https://github.com/litsh/Libero-Occ}.

07.
arXiv (CS.AI) 2026-06-11

When Context Returns: Toward Robust Internalization in On-Policy Distillation

arXiv:2606.11627v1 Announce Type: cross Abstract: Recent work has shown that on-policy distillation can internalize privileged context, such as system prompts or task hints, into a student model so that the context is no longer needed at inference time. Although this approach successfully improves the student's no-context performance, we identify an interesting and previously unstudied phenomenon: in many settings, reintroducing the original privileged context to the distilled student actually degrades its performance, even on instances it already solves correctly without context. We term this context-induced degradation and argue that robust internalization demands not only matching the teacher's context-conditioned behavior, but also remaining stable when the context is reintroduced, a property we call context removability. Motivated by this observation, we propose a lightweight consistency regularizer that first anchors the student's no-context output via stop-gradient, then penalizes the context-conditioned output for deviating from it via forward KL divergence. This simple addition requires only one extra forward pass per training step, yet it effectively mitigates context-induced degradation and, in many cases, even improves no-context performance. Across 12 configurations spanning diverse domains and model families, our method improves context-conditioned accuracy in the majority of settings, reduces context-induced harm in 11 out of 12 settings, and effectively eliminates response-length inflation. A mechanistic case study further confirms that context removability is achieved at the representation level, with hidden states remaining nearly identical regardless of whether the context is present.

08.
arXiv (CS.AI) 2026-06-17

Statistical Foundations of LLM-based A/B Testing: A Surrogacy Framework for Human Causal Inference

arXiv:2606.17165v1 Announce Type: cross Abstract: Organizations and researchers show increasing interest in using large language models (LLMs) in place of human participants in A/B tests, in the hope of experimenting faster and at lower cost. We study when a treatment effect estimated on LLM outcomes recovers the effect that would have been measured on the human population of interest. Distributional equivalence between LLM and human outcomes would make any standard estimator valid but is unrealistic. We therefore develop a statistical framework that adapts surrogate endpoint theory to LLMs. The framework shows that calibrating LLM outcomes to human outcomes identifies the average treatment effect under surrogacy and comparability conditions that are jointly weaker than distributional equivalence. When these conditions fail, the effect of interest is only partially identified, and we provide diagnostics that can falsify surrogacy on historical experiments together with a bound on the worst-case bias from limited overlap. We further show that the stochasticity inherent to LLMs introduces both bias and variance, but using an average of multiple draws as the surrogate mitigates both. We illustrate the methods and theory in simulations and an application to A/B tests on Upworthy headlines. A central takeaway from our work is that the validity of LLM outcomes as surrogates can only be falsified for past treatments and never verified for new ones, so human experiments remain indispensable for novel interventions. We discuss the role of LLM choice, prompting, and temperature as design variables, and how to size human experiments for validation.

09.
arXiv (CS.AI) 2026-06-16

Thinking with Visual Grounding

arXiv:2606.16122v1 Announce Type: new Abstract: Visual thinking should not only sound right; it should show its evidence. While recent vision-language models (VLMs) can produce natural-language reasoning traces, these traces often leave the supporting image regions implicit, making them hard to verify and difficult to supervise. We introduce visually grounded thinking, a reasoning process in which models interleave natural-language thoughts with explicit point or box groundings of the visual evidence used at each step. This lets the model express intermediate reasoning in language while grounding key objects in the image regions they refer to. To train this behavior, we construct a scalable synthesis pipeline that distills correct visual reasoning traces, extracts the visual objects required by the traces, grounds them with a SAM3-based agent, and derives aligned point and box supervision from the resulting masks. We further propose grounding-aware reinforcement learning, which combines answer correctness rewards with dense grounding rewards that score whether generated object references match the correct image evidence. Across two counting benchmarks and four spatial reasoning benchmarks, adding visually grounded thinking to Gemma3-4B-IT consistently improves performance over the original model and the non-grounded thinking baseline. On spatial reasoning, the visually grounded thinking 4B models match, and in some cases surpass, Gemma3-27B-IT from the same model family. Our analysis shows that point grounding is well suited to counting, while box grounding benefits most from explicit grounding rewards on spatial tasks. Overall, our results show that VLMs think better when their intermediate thoughts are tied to the image regions that make them true.

10.
arXiv (CS.AI) 2026-06-16

AI-Driven Framework for Adaptive Water Network Management with Proof-of-Concept Implementation: Addressing Non-Revenue Water in Jordan

arXiv:2606.15709v1 Announce Type: new Abstract: Jordan faces severe water scarcity with 50\% of water produced is lost to leakage, theft and metering issues also known as non-revenue water (NRW). Traditional reactive approaches have proven insufficient for sustained NRW reduction. This paper proposes an intelligent framework integrating EPANET hydraulic modeling, digital twin technology, SCADA systems, and large language model (LLM)-based AI agents for continuous network monitoring and adaptive decision-making. The system combines real-time data streams with physics-based simulation to detect anomalies, employing retrieval-augmented generation (RAG) for policy interpretation and function calling for network control. A proof-of-concept implementation validates technical feasibility using EPYT with offline LLMs (llama3.1:8b via Ollama) on a 1,164-junction Amman district network. The system demonstrates automated hydraulic simulation, flow-based anomaly detection aligned with water distribution zone (DZ) practice, and AI-generated health reports with response times under 2 minutes and zero API costs. Burst detection relies on local flow anomaly analysis: a 30.1~L/s simulated leak produces measurable flow redistribution in 15 pipes, flagging a 15-junction cluster that localises the burst – confirming alignment with water distribution zone (DZ) monitoring practice. The framework accommodates Jordan's intermittent supply patterns and limited automation through phased implementation, offering a scalable pathway for water-scarce regions to leverage intelligent automation for NRW reduction and operational efficiency.

11.
arXiv (CS.CV) 2026-06-24

TrOCR for Medieval HTR: A Systematic Ablation Study with Cross-Dataset Validation

Fine-tuning transformer-based handwritten text recognition (HTR) models on medieval manuscripts is challenging because these models are pre-trained on modern text and must adapt to a very different visual domain. This paper studies how three controllable fine-tuning choices (contrast normalization, data augmentation, and layer freezing) affect recognition accuracy when adapting TrOCR to small historical datasets. We run controlled experiments on a 13th-century Italian manuscript (I-CT 91 "Cortonese") and replicate the same experimental grid on the public READ-16 benchmark as robustness evidence. On Cortonese, our best configuration achieves 8.03% character error rate (CER). Statistical comparisons across 13 configurations show that freezing up to three encoder layers or six decoder layers does not significantly harm accuracy, while deeper freezing becomes progressively detrimental. Removing contrast normalization (CLAHE) yields 7.84% CER, comparable to a domain-specialized baseline, suggesting strong optimization can reduce reliance on image preprocessing. Cross-dataset validation on READ-16 shows that decoder freezing thresholds transfer more robustly than encoder thresholds, and combined freezing strategies require dataset-specific re-validation. Finally, we use Grad-CAM gradient attributions and decoder cross-attention maps to diagnose error patterns and failure modes revealed by the ablations. Source code is available at https://github.com/LaudareProject/TrOCR-analysis

12.
arXiv (CS.LG) 2026-06-12

$\alpha$-fair heterogeneous agent reinforcement learning

arXiv:2606.13076v1 Announce Type: cross Abstract: Cooperation in multi-agent systems is typically optimized through utilitarian objectives that maximize overall efficiency but fail to account for reward distribution, often resulting in inequitable "leader-follower" dynamics. While fairness-based approaches encourage pro-social behaviors where every agent benefits from cooperation, many current algorithms - including those utilizing reward shaping - break the stationarity of Markov Games or lack rigorous theoretical guarantees. This creates a critical gap between fair objective methods and theoretically safe learning frameworks. We propose a novel framework that bridges $\alpha$-fairness with Heterogeneous-Agent Trust Region Learning (HATRL), ensuring monotonic improvement and convergence toward Nash Equilibria. Our approach leverages a fair advantage function that dynamically weights agent utilities based on their expected returns, allowing the global objective to transition from purely utilitarian efficiency to $\alpha$-fairness welfare based on the parameter $\alpha$. We introduce two practical algorithms, $\alpha$-fair HATRPO and $\alpha$-fair HAPPO, and demonstrate through experiments in sequential social dilemmas like CleanUp and CommonHarvest that they perform better than HATRL's algorithms from a utilitarian point of view while achieving socially higher outcomes.

13.
arXiv (CS.CL) 2026-06-12

Entropy-Gradient Inversion: Moving Toward Internal Mechanism of Large Reasoning Models

The advancement of Large Reasoning Models (LRMs) has catalyzed a paradigm shift from reactive ``fast thinking'' text generation to systematic, step-by-step ``slow thinking'' reasoning, unlocking state-of-the-art performance in complex mathematical and logical tasks. However, the field faces the fundamental gap between token-level behavioral analysis and internal reasoning mechanisms, and the instability of reinforcement learning (RL) for reasoning optimization relying on costly external verifiers. We identify and formally define Entropy-Gradient Inversion, a robust negative correlation between token entropy and logit gradients that acts as a definitive geometric fingerprint for LRM reasoning capability. Building on this, we propose Correlation-Regularized Group Policy Optimization (CorR-PO), which embeds this inversion signature into RL reward regularization. Extensive experiments on various reasoning benchmarks across multiple model scales show CorR-PO consistently outperforms state-of-the-art baselines, confirming that stronger inversion directly correlates with superior reasoning performance.

14.
arXiv (CS.CL) 2026-06-12

Layer-Resolved Optimal Transport for Hallucination Detection in NMT and Abstractive Summarization

Optimal transport (OT) has been shown to detect hallucinations in neural machine translation (NMT) by measuring the geometric distance between cross-attention distributions and a reference distribution, without any supervision. We extend this analysis to all six decoder layers of the Fairseq DE-EN model ($N=3{,}414$), showing that Wass-to-Unif and Wass-to-Data are complementary detectors specialised across hallucination types, that detection is concentrated in layers L1–L4 with L5 anti-predictive for subtler types, and that hallucinated translations lack the exploratory attention phase present in correct translations from the first decoding step. We further evaluate whether the geometric signal transfers to abstractive summarization faithfulness detection: our unsupervised OT detector on AggreFact ($N=1{,}116$) achieves $57.2\%$/$57.6\%$ balanced accuracy on CNN/XSum – above chance but substantially below supervised MiniCheck-Flan-T5-L($69.9\%$/$74.3\%$). This gap is principled: unlike NMT hallucinations, unfaithful summaries can attend correctly to source tokens while misrepresenting their content, a failure mode invisible to concentration-based OT metrics by construction. Structural experiments on T5-base confirm consistent decoder organisation across depth, with Layer~3 showing peak concentration and Layer~12 being most critical for generation quality. Together, the results establish OT on cross-attention as a reliable detector when the failure mode is source disengagement, a principled interpretability tool regardless of task, and fundamentally limited when faithfulness failures occur downstream of attention.

15.
arXiv (CS.AI) 2026-06-19

Exploring Feature Extraction Technique Parameters for Acoustic Gunshot Classification

arXiv:2606.19568v1 Announce Type: cross Abstract: Acoustic gunshot detection is a problem with applications across civilian public safety, military operations, and wildlife conservation, yet the field lacks a rigorous exploration of feature extraction techniques with a focus on generalization to realistic data. The mixed effectiveness of commercial gunshot detection and classification systems indicates an open problem that is not adequately addressed by the current literature. In this paper, we present a systematic investigation of common feature extraction techniques using a dataset of 23,000 gunshot recordings across 85 firearms and 21 calibers. We benchmark three feature extraction techniques with 12 total unique parameter sets using ResNet-18. Our results demonstrate that using the correct feature extraction technique can improve top-1 accuracy by up to 20%, and utilizing the correct parameters for a given feature extraction technique can improve that value by up to 4.7%.

16.
arXiv (CS.AI) 2026-06-16

CRC-Screen: Certified DNA-Synthesis Hazard Screening Under Taxonomic Shift

作者:

arXiv:2605.00074v2 Announce Type: replace-cross Abstract: DNA-synthesis providers screen incoming orders by searching the requested sequence against curated hazard lists. We show that this baseline collapses to a 100% false-flag rate when the hazardous sequence comes from a taxonomic family absent from the reference set: under Conformal Risk Control's certified miss-rate constraint, a low-discrimination signal forces the threshold below the entire test-benign mass. We compose three signals derived from a synthesis order's public annotation: $k$-mer Jaccard similarity to known toxins, the trimmed-mean score of a five-LLM judge panel, and cosine similarity to clustered embedding centroids. Fused under a monotone logistic aggregator and calibrated by Conformal Risk Control, the resulting screener certifies $\mathbb{E}[\mathrm{FNR}] \le \alpha + \mathrm{TV}$, where the additive term is the calibration-to-test distribution shift under family holdout (a certified ceiling of 24-49% across folds). Across ten leave-one-taxonomic-family-out folds at $\alpha=0.05$ on UniProt KW-0800 reviewed toxins, the calibrated screener achieves 0% empirical test miss rate on every fold and 0% test false-flag rate on nine of ten folds. The bound's finite-sample slack $1/(n_{\mathrm{cal}}+1)$ caps the certifiable miss rate at 1.77% on our 200-hazard subsample; reaching procurement-grade $\alpha=10^{-3}$ requires an $18\times$ larger calibration set, which the full reviewed UniProt KW-0800 corpus is large enough to deliver. The binding constraint on certifiable DNA-synthesis screening is calibration data, not algorithms. Code: https://github.com/najmulhasan-code/crc-screen

17.
Nature Medicine 2026-06-12

General-purpose large language models outperform specialized clinical AI tools on medical benchmarks

Specialized clinical artificial intelligence (AI) tools are entering medical practice despite scarce independent evaluation. We quantitatively evaluate two clinical AI tools, OpenEvidence and UpToDate Expert AI, built on large language models (LLMs) against three frontier LLMs: GPT-5.2, Gemini 3.1 Pro and Claude Opus 4.6. Our evaluation has three stages: (1) 500 MedQA questions testing medical knowledge, (2) 500 HealthBench items measuring alignment with clinicians and (3) the real clinical queries (RCQ) benchmark, built from 100 de-identified queries from physicians to a general-purpose language model in a live clinical environment. For the RCQ benchmark, 12 US clinicians performed randomized, blinded review of model outputs, producing 1,800 model–question annotations. Frontier LLMs outperformed clinical AI tools in all three evaluations. Clinical AI tools performed comparably to auto-enabled Google Search AI Overview on the RCQ. These findings highlight the need for independent, real-world evaluation of AI tools before they enter clinical settings. In an independent evaluation, frontier large language models outperformed specialized clinical artificial intelligence tools on medical knowledge, clinician alignment and real-world clinical queries.

18.
arXiv (CS.LG) 2026-06-16

Dual-Network PINNs for Optimal Control: A Reproducible Benchmark on the Mass-Spring-Damper System

arXiv:2606.15271v1 Announce Type: cross Abstract: This work presents a transparent and reproducible benchmark study of a direct dual-network Physics-Informed Neural Network (PINN) formulation for the optimal control of a mass-spring-damper system. The classical linear-quadratic optimal control problem is solved by two independent classical methods – Pontryagin's Minimum Principle with single shooting, and direct transcription through trapezoidal collocation – and recast as a constrained optimization problem solved by two feedforward neural networks: a state network whose boundary conditions are enforced exactly through a composite cubic-and-mask ansatz, and an unconstrained control network. The composite loss combines the physics residual at the collocation points with a trapezoidal approximation of the cost functional, weighted by a single scalar hyperparameter. On the benchmark considered, the PINN reproduces the classical optimal cost to four significant digits, satisfies the terminal state constraints exactly by construction, and produces pointwise state and control errors that fall within the spread of the two classical references. Training is approximately two orders of magnitude slower than classical shooting on this benchmark, which is honestly reported. The contribution is methodological clarity rather than methodological novelty: the formulation and the accompanying Google Colab implementation are intended to lower the barrier to entry for practitioners exploring PINN-based optimal control without prior exposure to adjoint methods or two-point boundary value problems.

19.
arXiv (CS.LG) 2026-06-16

Beyond the Smile: A Hybrid Convolutional VAE for Crypto Volatility Surfaces

arXiv:2606.16961v1 Announce Type: new Abstract: We present a convolutional variational autoencoder for cryptocurrency implied-volatility surfaces, together with a deployable predictor that combines it with a quadratic smile re-fit through a deterministic per-tenor routing rule. Trained on 6,034 fully-filled hourly Binance Options surfaces of BTC and ETH spanning May-October 2023 and parameterised on a common $6 \times 7$ tenor-delta grid, the model attains a hidden-cell surface-completion RMSE in the 0.94-1.56 vol-point range across both markets and mask rates 10-50%. The hybrid predictor attains 0.83 vol points at 50% masking against 7.00 for the smile re-fit alone, an eightfold reduction obtained at no additional inference cost. Under structurally-correlated hole patterns that emulate the withdrawal of an entire tenor of strikes, the smile re-fit incurs 9.6-13.1 vol points of error while the learned model remains at 1.5-1.9, isolating a regime in which the generative model is the only viable predictor. Joint training on BTC and ETH improves the in-distribution model on both markets by 9-27% relative to the better-performing single-symbol counterpart, indicating a substantially shared vol-surface manifold across the two largest cryptocurrencies over the observation window. The hybrid is calendar- and butterfly-arbitrage-free at the listed strikes, a property that the parametric smile re-fit alone fails at high mask rates. The per-snapshot reconstruction error of the trained model flags the late-October ETF-anticipation rally and the August $17$, $2023$ flash crash as elevated-error periods without supervision. All training and evaluation infrastructure is released to support reproducible follow-on work.

21.
bioRxiv (Bioinfo) 2026-06-19

FeatureMSEA: Metabolic Feature-based Metabolite Set Enrichment Analysis

Liquid chromatography-mass spectrometry (LC-MS) untargeted metabolomics detects thousands of metabolic features, but converting these chemical signals into metabolite set-level biological knowledge remains challenging. This is because most features lack unambiguous metabolite identities. Conventional metabolite set enrichment analysis (MSEA) generally requires identified metabolites and metabolite-level ranked inputs, leaving much of the untargeted feature space unused. Here, we present FeatureMSEA, a feature rank-based framework for metabolite set enrichment directly from metabolic features with ambiguous annotations. FeatureMSEA integrates multi-evidence feature-to-metabolite annotation, feature rank-based enrichment scoring, permutation-based inference, and iterative leading-edge-guided annotation refinement, with an optional LLM-assisted module for post-enrichment interpretation. In null comparisons of randomly split healthy samples, FeatureMSEA detected no significant metabolite sets, whereas metabolite-set spike-in simulations showed recovery of implanted signals. In a cerebrospinal fluid metabolomics study of Huntington's disease, FeatureMSEA identified dysregulated metabolite sets related to amino acid metabolism, mitochondrial energy metabolism, and neuroactive signaling. MS/MS-based annotation analysis further showed that FeatureMSEA refinement reduced annotation ambiguity and prioritized chemically consistent candidate metabolites. In summary, FeatureMSEA provides a general framework for extracting metabolite set-level biological insights from LC-MS untargeted metabolomics in which confident metabolite identification remains incomplete.

22.
arXiv (CS.LG) 2026-06-24

A Private Approximation of the 2nd-Moment Matrix of Any Subsamplable Input

arXiv:2505.14251v2 Announce Type: replace Abstract: We study the problem of differentially private second moment estimation and present a new algorithm that achieve strong privacy-utility trade-offs even for worst-case inputs under subsamplability assumptions on the data. We call an input $(m,\alpha,\beta)$-subsamplable if a random subsample of size $m$ (or larger) preserves w.p $\geq 1-\beta$ the spectral structure of the original second moment matrix up to a multiplicative factor of $1\pm \alpha$. Building upon subsamplability, we give a recursive algorithmic framework similar to Kamath et al 2019, that abides zero-Concentrated Differential Privacy (zCDP) while preserving w.h.p. the accuracy of the second moment estimation upto an arbitrary factor of $(1\pm\gamma)$. We then show how to apply our algorithm to approximate the second moment matrix of a distribution $\mathcal{D}$, even when a noticeable fraction of the input are outliers.

23.
arXiv (CS.LG) 2026-06-11

Re-evaluating Confidence Remasking in Masked Diffusion Language Models

arXiv:2606.12232v1 Announce Type: new Abstract: Masked diffusion language models (dLLMs) have recently emerged as a competitive alternative to autoregressive language models, with the promise of faster inference via parallel token generation. A notable limitation of the masked formulation, however, is that once a token has been unmasked it can no longer be revised, leaving dLLMs vulnerable to early sampling mistakes. To address this, a growing body of work has sought to extend masked dLLMs with self-correcting (remasking) capabilities. One appealing subset of these methods does so in a training-free, post-hoc manner based on token confidences, with encouraging early reported results. In this work, we revisit the empirical evaluation of a representative post-hoc remasking method, WINO [Hong et al., 2026], and find that under standard decoding settings (shorter block lengths) it brings little-to-no benefit over confidence-based unmasking alone [Wu et al., 2025]. Extending the evaluation to non-greedy decoding, we find that while confidence-based remasking can mitigate errors introduced by increased stochasticity to some extent, it also exacerbates the diversity collapse previously reported for confidence-based unmasking. Overall, our results show that the benefits of post-hoc confidence-based remasking are highly setting-dependent, underscoring the need for a more comprehensive evaluation framework.

24.
medRxiv (Medicine) 2026-06-10

Developing a Unified Criminal Justice Pathway into Drug and Alcohol Treatment from Police Custody: A Public Health Service Evaluation and Pathway-Design Project in Blackpool, United Kingdom

Introduction: Blackpool, England's most deprived local authority, has the highest drug-related death rate in the country. People in police custody with problem substance use are a key Core20PLUS5 inclusion-health group, yet referral from the police into structured drug and alcohol treatment is fragmented and relies heavily on self-report. We evaluated the current police-to-treatment route in Blackpool and designed an evidence-informed unified pathway. Materials and Methods: A mixed-methods service evaluation and pathway-design project was conducted during a six-month General Practice / Public Health rotation. Routinely collected referral data from Horizon (the local specialist drug and alcohol service) covering the 47-month period from December 2019 to October 2023 were analysed. Findings were triangulated with national policy, the Project ADDER and Liaison and Diversion evaluations, and the international evidence on police-led pre-arrest diversion. Results: Of 5,900 total referrals into Horizon over 47 months, only 269 (4.56%) originated from the police. Police referrals accounted for fewer than 5% of monthly referrals in 30 of 47 months, for 5 to 9.9% in 16 months, and for >/= 10% in only one month (10.8%, December 2022). Blackpool recorded 76 drug-misuse deaths in 2019-21 (19.4 per 100,000, approximately four times the England rate). A six-step unified pathway is proposed: Initiate Referral (opt-out, from ADDER Police and Liaison and Diversion); Initial Assessment; Tailored Treatment Plan; Continuous Support; Collaboration and Monitoring; and Evaluation and Adjustment. Conclusions: Police contact is markedly under-used as a gateway to treatment despite Blackpool having the highest drug-related mortality in England. An opt-out, multi-agency pathway anchored in Core20PLUS5 has the potential to narrow the treatment gap, reduce re-offending, and address the structural health inequalities that drive premature mortality.

25.
arXiv (CS.CL) 2026-06-11

Debiasing Without Protected Attributes: Latent Concept Erasure from Textual Profiles

Most fairness research in NLP assumes direct access to protected attributes such as gender, race, or nationality. In practice, however, such information is often unavailable due to privacy constraints, missing metadata, or legal restrictions, even though models may infer it from indirect textual cues. This raises a key question: can debiasing succeed without direct access to sensitive attributes? We propose H-SAL, which performs post-hoc concept and attribute erasure using self-description text as an implicit debiasing signal. To support this setting, we introduce a multi-domain Stack Exchange-based fairness benchmark for helpfulness prediction that includes both explicit and implicit signals, enabling comparison between standard debiasing with protected labels and debiasing without access to sensitive information. Across encoder and decoder-only language models, we find that implicit self-description often matches or outperforms explicit-label-based debiasing. Our results broaden representation-level fairness research and provide a new benchmark for studying debiasing under realistic data constraints.