Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
Nature Medicine 2026-06-11

Microglia at a key inflection point in Alzheimer’s disease

作者: 未知作者

We analyzed brains from octogenarians and cognitively resilient centenarians to understand why some individuals with substantial Alzheimer’s disease pathology develop dementia whereas others remain cognitively intact. Spatial transcriptomics revealed gene expression changes in discrete tissue domains surrounding amyloid plaques and tau pathology that distinguish early, clinically silent, disease from later stages associated with cognitive decline.

02.
arXiv (CS.AI) 2026-06-18

ARIADNE: Agnostic Routing for Inference-time Adapter DyNamic sElection

arXiv:2606.19079v1 Announce Type: new Abstract: The increasing deployment of parameter-efficient fine-tuning (PEFT) has led to model ecosystems in which a single backbone is paired with many task-specialized adapters. In this setting, inference-time queries often arrive without task labels, requiring the system to automatically select the most appropriate adapter from a growing and heterogeneous adapter pool. Existing routing methods either depend on access to adapter internals, such as weight decompositions or gradient-based statistics, or require additional router training, which limits scalability and portability as new adapters are added. We introduce ARIADNE, a training-free, adapter-agnostic routing framework for dynamic adapter selection at inference time. ARIADNE represents each adapter through a set of centroids computed from embeddings of its training set, capturing the data distribution associated with that adapter. Given an unlabeled input, it selects an adapter by measuring proximity to these centroids in latent space. Because routing is performed entirely in the input embedding space, ARIADNE is compatible with arbitrary PEFT methods and requires no modification to the adapters or training procedures. Primarily evaluated with Llama 3.2 1B Instruct on 23 diverse NLP tasks, ARIADNE recovers 97.44% of the upper bound performance. Scaling to 44 tasks, it achieves 89.7% average selection accuracy, without additional training or access to adapter internals.

03.
arXiv (math.PR) 2026-06-18

Geometric obstructions to Lipschitz transport between weighted Hessian $\mathrm{CD}(\kappa,\infty)$ manifolds

arXiv:2606.11085v2 Announce Type: replace Abstract: We construct a weighted Riemannian manifold $(\mathbb R^2,g,\mu)$ satisfying $\mathrm{CD}(1/2,\infty)$, the curvature-dimension condition, with the following property: if $\gamma$ denotes a centered Gaussian measure on $\mathbb R^2$, then there is no Lipschitz map $T:(\mathbb R^2,\|\cdot\|) \to (\mathbb R^2,g)$ satisfying $T_\#\gamma=\mu$. Building on this, we prove a Weyl-type asymptotic law for the eigenvalues of the weighted Laplacian $-\Delta_{g,\mu}$ and show that they are asymptotically negligible when compared to the eigenvalues of $-\Delta_{\gamma}$. These results give strong counterexamples to two questions of E. Milman and complement the recent counterexample of Aryan.

04.
arXiv (CS.AI) 2026-06-19

Bistable by Construction: Wall-Clock-Calibrated State Monitors Have No Moment-Detection Regime at Agent Cadence

arXiv:2606.19386v1 Announce Type: cross Abstract: Runtime monitors for autonomous agents commonly threshold an accumulated internal state - a behavioural baseline, a drift statistic, or, in our prior work, a modelled affective state. We previously reported a State Saturation Trap: threshold-on-state triggers over a continuous affect engine become near-constant alarms on SWE-bench debugging agents (Modgil 2026). A post-release audit found the engine received dt=0 between actions, so its exponential decay never operated: the published trap is a pure-accumulator result. We correct the record (erratum, v2) and treat the flaw as an experiment. The key variable it exposes is whether a monitor's dynamics are calibrated in sample time (per observation, as in CUSUM) or wall-clock time (half-lives in seconds, as in affect models and EMA baselines). On fixed-rate streams these coincide; on agent streams, where inter-action time varies by orders of magnitude, they do not. A pre-registered sweep over uniform intervals (dt in {0..600}s) on 20 trajectories shows the wall-clock level trigger has two regimes: at dt=60s silent. Every critical dt lies in (1,30]s. Real agent runs measure latency at median 1.53s (p90 2.33s); real coding cadence sits inside the trap regime, vindicating the empirical finding under a corrected mechanism. The structure is a property of the calibration class, not the engine: a minimal wall-clock accumulator over the raw error stream reproduces the same cliff, while a sample-time CUSUM over the identical stream is exactly dt-invariant (20/20). A rising-edge trigger with hysteresis fires 0-3 times per trajectory in every condition. We conclude that wall-clock-calibrated leaky-integrator monitors admit no regime in which they act as moment detectors on agent streams; transition detection escapes the trap at every cadence, but does not recover human intervention timing.

05.
arXiv (CS.CL) 2026-06-15

OLaPh: Optimal Language Phonemizer

Phonemization is a critical component in text-to-speech synthesis. Traditional approaches rely on deterministic transformations and lexica, while neural methods offer potential for higher generalization on out-of-vocabulary (OOV) terms. We introduce OLaPh (Optimal Language Phonemizer), a hybrid framework that integrates extensive multilingual lexica with advanced NLP techniques and a statistical subword segmentation function. Evaluations on the WikiPron benchmark show OLaPh significantly outperforms established baselines in overall accuracy and maintains robustness on OOV data through advanced fallback mechanisms. To further explore neural generalization, we utilize the framework to synthesize a high-consistency training corpus for an instruction-tuned Large Language Model (LLM). While the deterministic framework remains more accurate overall, the LLM demonstrates strong generalization, matching or partly exceeding the framework's performance. This suggests that the LLM successfully internalized phonetic intuitions from the synthetic data that transcend the framework's capabilities. Together, these tools provide a comprehensive, open-source resource for multilingual grapheme-to-phoneme conversion (G2P) research.

07.
arXiv (CS.CV) 2026-06-16

Near–Real-Time Conflict-Related Fire Detection in Sudan Using Unsupervised Deep Learning

Ongoing armed conflict in Sudan highlights the need for rapid monitoring of conflict-related fire-affected areas. Recent advances in deep learning and high-frequency satellite imagery enable near–real-time assessment of active fires and burn scars in war zones. This study presents a near–real-time monitoring approach using a lightweight Variational Auto-Encoder (VAE)–based model integrated with 4-band Planet Labs imagery at 3 m spatial resolution. We demonstrate that these impacted regions can be detected within approximately 24 to 30 hours under favorable observational conditions using accessible, commercially available satellite data. To achieve this, we adapt a VAE–based model, originally designed for 10-band imagery, to operate effectively on high-resolution 4-band inputs. The model is trained in an unsupervised manner to learn compact latent representations of nominal land-surface conditions and identify burn signatures by quantifying changes between temporally paired latent embeddings. Performance is evaluated across five case studies in Sudan and compared against cosine distance, CVA, and IR-MAD using precision, recall, F1-score, and the area under the precision-recall curve (AUPRC) computed between temporally paired image tiles. Results show that the proposed approach consistently outperforms the other methods, achieving higher recall and F1-scores while maintaining viable precision in highly imbalanced fire-detection scenarios. Experiments with 8-band imagery and temporal image sequences yield only marginal performance gains over single 4-band inputs, underscoring the effectiveness of the proposed lightweight approach for scalable, near–real-time conflict monitoring.

08.
arXiv (CS.CL) 2026-06-17

When English Isn't the Best Teacher: Source Language Effects in Cross-Lingual In-Context Learning

Cross-lingual transfer in multilingual NLP has been widely explored in supervised fine-tuning contexts, where factors like data availability and linguistic similarity largely determine transfer quality. As the field shifts toward few-shot In-Context Learning (ICL), it is often presumed that insights from fine-tuning carry over unchanged. Yet this assumption has not been rigorously evaluated, leaving open the question of how to choose source languages for cross-lingual ICL. We conduct a broad empirical study of cross-lingual transfer in ICL spanning seven tasks, six models, and a typologically diverse set of languages. We further analyze language confusion, a key obstacle for generative tasks in cross-lingual ICL. Our results show that conventional fine-tuning-based expectations do not consistently apply in the ICL regime and point to alternative heuristics for selecting source languages effectively.

09.
arXiv (CS.AI) 2026-06-19

IHBench: Evaluating Post-Interruption Recovery in Voice Agents with Structured Workflows

arXiv:2606.19595v1 Announce Type: cross Abstract: Voice agents deployed in structured workflows (customer service, healthcare scheduling, account management) must handle frequent user interruptions while maintaining progress through multi-step procedures. Existing benchmarks for speech-capable models focus on the timing of interruptions: barge-in detection, endpointing, and turn-taking dynamics. They leave unmeasured what happens after the interruption: does the agent resume the workflow at the correct step? Does it address the user's interjection? Does it avoid re-delivering content the user already heard? We introduce IHBench (Interruption Handling Benchmark), a benchmark that evaluates post-interruption recovery in voice agents executing state-machine-driven workflows across 10 enterprise domains. Six interruption types are injected at controlled points mid-utterance, with per-interruption evaluation rubrics generated alongside the data. Each interruption is scored on two axes: task fulfillment and recovery quality. We evaluate 27 audio-language model configurations from OpenAI, Google, and the open-weight community. Models vary widely, and recovery quality depends strongly on the interruption type. Across our experiments, closed-weight models are consistently more robust to interruptions than open-weight ones: they win far more often on task fulfillment, degrade roughly 3.3x more slowly as conversations grow longer, and show no audio-versus-text modality gap, whereas the open-weight models lose ground on all three. A human study validates the LLM judge against human annotators, and a cross-benchmark analysis against AudioMultiChallenge indicates that recovery quality is a largely distinct capability axis.

10.
arXiv (CS.CV) 2026-06-17

BusterX++: Towards Unified Cross-Modal AI-Generated Content Detection and Explanation with MLLM

The rapid advancement of generative AI has substantially improved image and video synthesis, amplifying the risk of multimodal visual misinformation. Recent MLLMs have shown promise for transparent AI-generated content detection through reasoning and explanation, yet existing approaches largely treat image and video forensics as isolated tasks, leaving cross-modal synergies underexplored. To address this, we present BusterX++, a unified MLLM for joint image and video detection with interpretable reasoning. We also introduce GenBuster-Bench++, a meticulously curated, difficulty-aligned benchmark containing balanced image and video samples spanning recent generation models and diverse real-world scenarios. Using this controlled setting, we revisit the widely adopted $SFT \rightarrow RL$ post-training paradigm. Notably, our findings demonstrate that a single-stage, pure RL strategy driven strictly by sparse outcome rewards consistently matches or surpasses a strong SFT+RL baseline across both unified and single-modality settings. Our key insight reveals that SFT imposes lower policy entropy, which restricts the policy search space and dampens exploratory freedom. In contrast, single-stage pure RL maintains higher policy entropy throughout training, effectively unlocking the spontaneous emergence of cross-modal capability transfer between image and video forensics. Extensive experiments demonstrate that BusterX++ achieves state-of-the-art performance, highlighting the powerful potential of RL for unified cross-modal visual reasoning.

11.
arXiv (CS.AI) 2026-06-24

RetiSEM: Generalising Causal Models for Fragmented Biomedical Data

arXiv:2606.24488v1 Announce Type: cross Abstract: Learning causal models from fragmented biomedical data is challenging because clinical, molecular, and imaging variables are often incomplete or not jointly observed. We propose RetiSEM, a domain-constrained structural equation modelling (SEM) framework for causal graph recovery and mediation analysis under limited multimodal resources. This proposed work organises variables into biologically informed blocks, applies forbidden-edge constraints, and decomposes pathway-level effects into TE, NDE, and NIE components. We evaluate RetiSEM across ten synthetic benchmark scenarios that vary in dimensionality, nonlinearity, causal depth, and pathway structure, together with a fragmented real-world setting that combines NHANES clinical variables with externally derived retinal representations. This approach achieves lower structural error and higher causal accuracy than unconstrained baselines across the synthetic benchmarks. In the real-data analysis, retinal variables behave mainly as downstream biomarker-like indicators, with smaller but detectable indirect effects. These findings support our strategy as an interpretable framework for testing structured causal hypotheses in limited-resource biomedical AI. The code and resources for this work are publicly available at: https://github.com/Inamullah-Colab/ReitSEM.

12.
arXiv (CS.LG) 2026-06-17

Bounded Difference Concentration for Infinitely Exchangeable Sequences with Applications to AI Benchmark Uncertainty

arXiv:2606.17426v1 Announce Type: cross Abstract: We consider the concentration properties of functions of infinitely exchangeable random variables. By conditioning on the de Finetti directing measure, we show that the deviation of any function with bounded-difference constants $c_1, \dots, c_n$ decomposes into a conditional sampling fluctuation and a latent mixture fluctuation. When this latent mixture is $\sigma_{\mathrm{mix}}^2$-subgaussian, we establish a concentration inequality with an effective variance proxy of $\frac{1}{4}\sum_i c_i^2 + \sigma_{\mathrm{mix}}^2$. Crucially, we demonstrate that for zero-sum linear contrasts, such as the difference between a subsample mean and a full population mean, the latent mixture term cancels exactly. This cancellation yields a tight, mixture-free Hoeffding-type bound that provides a direct de Finetti mechanism for the infinite-extendibility limit of recent finite-exchangeable concentration results. We apply this framework to quantify uncertainty in composite AI benchmarks, such as MMLU, where question items naturally exhibit exchangeable dependence across domains. Our results provide both a domain-stratified hierarchical model for bounding the uncertainty of accuracy scores, and a distribution-free, cost-saving statistical guarantee for accurately estimating full benchmark scores from random subsets.

13.
arXiv (quant-ph) 2026-06-11

Observable signatures of exceptional points from left-right eigenstate distinction

arXiv:2606.11333v1 Announce Type: new Abstract: Non-Hermitian quantum systems exhibit qualitatively distinct physical behavior compared to Hermitian systems, a prime example being spectral singularities known as exceptional points. Their relevance in, e.g., quantum sensing, unidirectional transport, and robust lasing makes it important to be able to identify exceptional points through observable features of a many-body system. Here, using as an example a one-dimensional complex XY spin chain realizing both rotation-time RT- and parity-time PT-symmetric regimes, we develop a framework for detecting exceptional points based on the distinction between left and right eigenvectors of the Hamiltonian, which in a non-Hermitian system are no longer the adjoint of each other. We first show that a global measure constructed from the difference between the Hamiltonian and its adjoint locates exceptional points via distinct non-analytic behavior. At the level of observables, differences in local spin correlations evaluated on the right and left eigenstates provide a reliable static detection scheme. In contrast, static bipartite entanglement measures fail to capture this distinction, urging us to study the quantum dynamics of the model. Following a sudden quench, we demonstrate that the time-averaged right-left entanglement entropy difference directly encodes signatures of the exceptional point. In the RT-symmetric regime, it exhibits a pronounced peak at the exceptional point, whereas in the PT-symmetric regime it behaves as an order-parameter-like quantity, remaining finite in one phase and vanishing at the transition. Our results establish a direct link between the structure of non-Hermitian eigenstates and observable signatures of exceptional points, providing a practical route to identify them in existing quantum simulators.

14.
arXiv (CS.AI) 2026-06-12

Representing Time Series as Structured Programs for LLM Reasoning

arXiv:2606.12481v1 Announce Type: cross Abstract: Large language models (LLMs) have demonstrated strong reasoning and instruction-following capabilities, making them potentially powerful tools for time-series analysis. However, time series lie outside their native textual modality, raising a fundamental question: how should time series be represented so that LLMs can reason about them effectively? Existing work typically serializes raw numerical sequences or fine-tunes pre-trained LLMs on time-series data. These approaches place the burden of extracting temporal structure directly on the LLM, creating a modality mismatch that often degrades performance on long sequences and introduces substantial computational overhead. In this work, we introduce Time-Series-to-Structured-Program representation (T2SP), a deterministic, training-free method that represents a time series as a structured symbolic program. T2SP decomposes time series into trends, periods, and salient events, expressing them in a program-friendly format aligned with the textual and code-like modalities on which LLMs are natively trained. By shifting temporal-structure extraction from the model to the representation itself, T2SP enables off-the-shelf LLMs to leverage their existing reasoning capabilities for time-series understanding. We evaluate T2SP on three reasoning tasks – editing, captioning, and question answering – where it consistently improves performance, reduces reasoning time, and lowers failure rates compared with raw-string representations. Our results demonstrate that T2SP provides an effective interface between time series and LLMs.

15.
arXiv (quant-ph) 2026-06-19

Hybrid VQE-CVQE algorithm using diabatic state preparation

arXiv:2512.04801v2 Announce Type: replace Abstract: We propose a hybrid variational quantum algorithm that has variational parameters used by both the quantum circuit and the subsequent classical optimization. Similar to the Variational Quantum Eigensolver (VQE), this algorithm applies a parameterized unitary operator to the qubit register. We generate this operator using diabatic state preparation. The quantum measurement results then inform the classical optimization procedure used by the Cascaded Variational Quantum Eigensolver (CVQE). We demonstrate the algorithm on a system of interacting electrons and show how it can be used on long-term error-corrected as well as short-term intermediate-scale quantum computers. Our simulations performed on IBM Brisbane produced energies well within chemical accuracy.

16.
arXiv (CS.AI) 2026-06-11

When Poison Fails After Retrieval: Revisiting Corpus Poisoning under Chunking and Reranking Pipelines

arXiv:2606.11265v1 Announce Type: cross Abstract: Retrieval-Augmented Generation (RAG) systems are vulnerable to corpus poisoning attacks that manipulate downstream model outputs through malicious knowledge injection. Existing studies mainly evaluate poisoning under simplified retrieval settings, overlooking practical RAG pipelines involving document chunking, dense retrieval, reranking, and grounded generation. In this paper, we revisit corpus poisoning under realistic multi-stage retrieval pipelines and show that many existing attacks substantially degrade after reranking despite achieving high retrieval-stage relevance. We identify retrieval granularity mismatch as a key reason for this failure: document-level adversarial signals are often fragmented during chunking, while rerankers favor locally coherent and answer-bearing passages rather than globally optimized semantic similarity. Based on this observation, we propose Chunk-aware and Rerank-Consistent Poisoning (CRCP), a poisoning framework that jointly optimizes retrieval relevance, reranker consistency, and chunk-boundary robustness. CRCP explicitly models chunking transformations during optimization to generate locally self-contained adversarial passages that remain effective under varying chunking configurations. Experiments on standard RAG benchmarks with multiple retrievers and rerankers show that existing poisoning methods are highly sensitive to chunk size and reranking strategies, whereas CRCP achieves substantially higher attack success rates and stronger robustness across realistic retrieval pipelines. Our findings highlight an important realism gap in current RAG security evaluation and suggest that poisoning in modern RAG systems should be studied as a multi-stage retrieval consistency problem rather than a retrieval-only problem.

17.
medRxiv (Medicine) 2026-06-15

Toward a National Registry for Inborn Errors of Immunity in Peru: A Qualitative Implementation Study

Background: Peru lacks an integrated information system for patients with Inborn Errors of Immunity (IEI). Although disease registries are essential tools for data management and health planning, their success depends on implementation science approaches that account for local contextual factors. This study reports Phase I of a three-phase mixed-methods implementation project to design and develop a national IEI registry. Methods: Phase I consisted of a phenomenological qualitative study exploring stakeholder perspectives. Semi-structured focus groups and in-depth interviews were conducted with 29 key stakeholders across four groups: policy-makers, clinical experts, end-users (immunologists, residents, allied health personnel), and patient organization representatives. Interviews followed a guide structured around four a priori domains (structure, navigation, feasibility, and perception of existing systems). Discussions were conducted in Spanish, audio-recorded, transcribed verbatim, and coded using ATLAS.ti. A hybrid thematic analysis combining deductive and inductive coding was performed. Data elements proposed for the registry were triangulated with qualitative findings. Results: Thirty-six initial codes were consolidated into 15 categories, which were further integrated into four overarching themes conceptualized as pathways toward intention to use: (1) Environment, where governance, regulatory backing, and sustainable financing were identified as key enablers, while limited interoperability emerged as a structural barrier; (2) Technical Dimension, emphasizing usability, alignment with clinical workflow, and a hierarchical data architecture (demographic, clinical, therapeutic); (3) Users, highlighting clinical leadership, protected time, digital readiness, and perceived usefulness as stronger motivators than financial incentives; and (4) Patients, underscoring data protection, transparency, trust, and advocacy as essential for legitimacy and sustainability. Conclusions: A national IEI registry in Peru is perceived as necessary and feasible if implemented with strong regulatory foundations, interoperable design, robust data security, and user-centered architecture. These findings informed the development of an initial functional prototype and the operational plan for Phase II, focused on usability evaluation.

18.
arXiv (CS.LG) 2026-06-16

Bayesian Tensor Decomposition with Diffusion Model Prior

arXiv:2606.03212v2 Announce Type: replace Abstract: Low-rank tensor decomposition (TD) is usually effective on clean, fully observed data, but it often degrades under severe missingness or noise. Low-rankness is itself a useful but limited structural prior, and additional handcrafted priors (e.g., sparsity or smoothness) still fall short of capturing the rich statistics of real-world data. To compensate for this weak inductive bias under heavy corruption, one would like to inject a learned, data-driven prior; however, the state-of-the-art diffusion models are not readily compatible with current TD and tractable posterior inference. To address these challenges, we introduce DiffBCP, a hybrid-prior Bayesian CP decomposition framework that couples a cumulative shrinkage process prior over the CP factors for automatic rank selection with an off-the-shelf pre-trained diffusion model as an implicit data prior on the reconstructed tensor. To make posterior inference tractable despite the coupling among the likelihood, low-rank constraint, and diffusion prior, we develop a split Gibbs sampler: CP factors admit conjugate updates, while the diffusion block is sampled via low-rank-guided denoising. A noise-adaptive coupling schedule further reduces sensitivity to hand-tuned annealing. Experiments on image inpainting and denoising, including high-resolution out-of-distribution images, show consistent gains over Bayesian, nonlinear, and plug-and-play TD baselines.

19.
arXiv (math.PR) 2026-06-18

On two overlooked stick-breaking constructions of the normalized inverse Gaussian process

arXiv:2606.19306v1 Announce Type: new Abstract: We shed light on two alternative stick-breaking constructions of the normalized inverse Gaussian (NIG) random discrete distribution which appear to have been overlooked so far in the Bayesian nonparametric setting. The first is derived from a result in Aldous and Pitman (1998) for the conditional Brownian excursion partition, mixing over the local time at zero up to time one. The second arises as a particular case of a result in James (2013) for priors obtained by a random spatial and temporal change of the normalized generalized Gamma subordinator. Both constructions are in terms of straightforward transformations of standard random variables and can be easily generalized to provide the stick-breaking construction of any element, respectively, in a) the family of mixed Poisson-Kingman models driven by the $1/2$ stable Lévy measure and b) the family of Poisson-Gamma processes driven by the Inverse Gaussian subordinator.

20.
arXiv (CS.CV) 2026-06-18

Revealing Hidden Vulnerabilities in Autoencoders through Gradient Signal Restoration

Adversarial robustness of deep autoencoders (AEs) has received less attention than that of discriminative models, although their compressed latent representations induce ill-conditioned mappings that can amplify small input perturbations and destabilize reconstructions. Existing white-box attacks for AEs, which optimize norm-bounded adversarial perturbations to maximize reconstruction damage, often converge to suboptimal perturbations, thereby potentially overstating AE robustness. We show that this limitation is linked to vanishing adversarial loss gradients during backpropagation through ill-conditioned layers, associated with near-zero singular values in their intermediate weight matrices. To address this, we propose GRILL (Gradient Signal Restoration in Ill-Conditioned Layers), a framework designed to mitigate gradient degradation and improve the reliability of adversarial robustness evaluation in encoder-decoder architectures. GRILL is designed to mitigate adversarial gradient degradation during optimization, enabling attacks to better approximate high-distortion perturbations under fixed norm constraints. Through extensive experiments across multiple AE architectures, under both sample-specific and universal attacks, as well as standard and adaptive attack settings, we show that GRILL significantly increases attack effectiveness, thereby exposing vulnerabilities hidden by existing attack limitations. Beyond AEs, we provide preliminary evidence that modern multimodal encoder-decoder architectures exhibit similar vulnerabilities.

21.
arXiv (CS.AI) 2026-06-12

HD-Prot: A Protein Language Model for Joint Sequence-Structure Modeling with Continuous Structure Tokens

arXiv:2512.15133v3 Announce Type: replace-cross Abstract: Proteins inherently possess a consistent sequence-structure duality. The abundance of protein sequence data, which can be readily represented as discrete tokens, has driven fruitful developments in protein language models (pLMs). A key remaining challenge, however, is how to effectively integrate continuous structural knowledge into pLMs. Current methods often discretize protein structures to accommodate the language modeling framework, which inevitably results in the loss of fine-grained information and limits the performance potential of multimodal pLMs. In this paper, we argue that such concerns can be circumvented: a sequence-based pLM can be extended to incorporate the structure modality through continuous tokens, i.e., high-fidelity protein structure latents that avoid vector quantization. Specifically, we propose a hybrid diffusion protein language model, HD-Prot, which embeds a continuous-valued diffusion head atop a discrete pLM, enabling seamless operation with both discrete and continuous tokens for joint sequence-structure modeling. It captures inter-token dependencies across modalities through a unified absorbing diffusion process, and estimates per-token distributions via categorical prediction for sequences and continuous diffusion for structures. Extensive results demonstrate that HD-Prot achieves competitive performance in unconditional sequence-structure co-generation, motif-scaffolding, protein structure prediction, and inverse folding tasks. Furthermore, our method can perform on par with state-of-the-art multimodal pLMs, despite being developed under limited computational resources (i.e., less than one-tenth the budget for modality extension fine-tuning). It highlights the viability of simultaneously estimating categorical and continuous distributions within a unified language model architecture, offering a promising alternative direction for multimodal pLMs.

22.
arXiv (quant-ph) 2026-06-17

Kinematic properties of the Pauli equation

arXiv:2606.17548v1 Announce Type: new Abstract: Based on the Wigner-Vlasov formalism, this paper investigates the kinematic properties of the Pauli equation. It is shown that the probability current associated with the Pauli equation can be represented as a superposition of two currents with certain expansion coefficients. Each of these currents corresponds to a particular component of the spinor. The expansion coefficients effectively serve as weighting functions that determine the probability contribution of the corresponding spinor component. Therefore, each spin projection corresponds to its own probability flux. A new system of the Hamilton-Jacobi equations and also a system of motion equations in electromagnetic fields are obtained, taking into account the interaction between the spin and the magnetic field. To illustrate how these equations can be applied we have investigated the quantum system kinematics in detail using an exact solution of the Pauli equation in the presence of a uniform magnetic field and an asymmetric quadratic potential.

23.
arXiv (CS.CL) 2026-06-19

Creating Multilingual Mental Health Dialogue Datasets: Limits of Persona-Based Localization via Nationality and Language

AI and large language models (LLMs) have emerged as promising tools to address global mental health challenges. Despite the global nature of these challenges, there remains a critical shortage of high-quality datasets for training and evaluating such systems. To mitigate this gap, researchers increasingly generate synthetic clinical personas to simulate user data and test digital mental health support systems. However, most validated personas rely on English-centric contexts. This paper investigates whether similar persona-based methods can be used to generate multilingual mental health datasets. We modified nationality and language parameters in personas to generate clinical dialogues in Mandarin, Bengali, and Hindi. We then examined how different LLMs perform when evaluating the depression severity of these generated multilingual datasets against the baseline in English. Our findings indicate that just adding nationality and language parameters in personas might not be adequate, as it can introduce clinical inconsistency across languages. LLM judge models often exhibit inaccuracies in assessing depression severity in non-English texts, with performance varying across different models. This exposes the systemic limitations of applying English-centric personas to multilingual contexts. Ultimately, our work highlights the urgent need for culturally responsive data generation to ensure equitable mental health systems globally.

24.
arXiv (CS.AI) 2026-06-15

Aligning Quantum Operators with Large Language Models

arXiv:2606.13811v1 Announce Type: cross Abstract: Can Large Language Models (LLMs) understand and reason about quantum operators? Despite their remarkable capabilities in mathematics and symbolic reasoning, LLMs remain inherently blind to quantum representations such as unitary matrices. In this work, we take a step toward bridging this gap by introducing an approach that maps unitary operators into the latent space of an LLM, enabling unified modeling over quantum and linguistic inputs. We instantiate this idea on Clifford+T circuit synthesis over a Pauli rotation gate set, where our model achieves results competitive with state-of-the-art methods and scales consistently with training data, with no signs of saturation. Our approach further enables language-conditioned synthesis, allowing gate constraints unseen during training to be specified directly in natural language. This work suggests a path toward quantum–aware foundation models that can natively interpret and reason about quantum operations, which could have broader implications reaching across quantum compilation and algorithm discovery.

25.
arXiv (CS.LG) 2026-06-19

EQPO: Equitable Group Relative Policy Optimization for Clinical Reasoning

arXiv:2510.19893v2 Announce Type: replace Abstract: Medical AI systems demonstrated impressive diagnostic performance, yet they routinely show uneven accuracy across demographic groups, disadvantaging underrepresented populations. Although multimodal reasoning foundation models have pushed clinical diagnosis forward, reinforcement learning-based post-training tends to absorb and magnify the biases present in majority-dominated training corpora. We propose Equitable Group Relative Policy Optimization (EQPO), a hierarchical reinforcement learning method that encourages balanced learning across heterogeneous clinical populations by adaptively reweighting samples according to subgroup representation, task difficulty, and data source. As demographic annotations are frequently missing in real-world clinical data, EQPO additionally applies unsupervised clustering to recover latent subpopulations when they are unavailable. On 7 diagnostic benchmarks covering 5 modalities (X-ray, CT, dermoscopy, mammography, ultrasound), EQPO reduces F1 standard deviation by 43.9% and the maximum cross-group F1 gap by 42.7% on QoQ-Med3-8B over vanilla GRPO, and narrows predictive parity gaps by 27.2% on MedGemma-4B over bias-mitigated RL baselines while raising F1 by 12.5% even without any demographic labels. Examining the training trajectory shows that EQPO steadily improves fairness over the course of optimization, in contrast to baseline methods whose fairness degrades as training proceeds, and the discovered implicit groups remain stable and align with masked demographic attributes. We further release EquiMedGemma-4B and EquiQoQ-Med3-8B, equitability-aware clinical VLLMs that attain state-of-the-art accuracy with markedly smaller demographic gaps.