Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
Nature (Science) 2026-06-17

Rock weathering can counteract river CO<sub>2</sub> emissions induced by permafrost thaw

Authors:

Climate-induced permafrost thaw unlocks large stores of organic carbon that are mineralized and emitted as carbon dioxide (CO2) from rivers to the atmosphere1. Concurrently, warming and permafrost thaw can increase mineral weathering rates, thus affecting the release and sequestration of inorganic carbon2–4. Yet how these biological and geological carbon cycles interact and jointly affect CO2 dynamics (emission compared with drawdown) in permafrost rivers remains unknown5. Here we combine CO2 emissions, organic and inorganic solute concentrations, dual carbon isotopes (δ13C–Δ14C) and geochemical modelling to infer how permafrost thaw may affect river biogeochemistry over decades to centuries across the Qinghai–Tibet Plateau. Leveraging a gradient of thermal permafrost degradation, we find that river CO2 emissions decline, whereas solute fluxes from rock weathering increase with decreasing permafrost cover. Across this region, net CO2 drawdown fluxes from rock weathering are about 35% of river CO2 emissions, varying from around 15% in catchments with continuous permafrost to more than 100% in catchments with discontinuous or isolated permafrost. Thus, carbon fluxes from chemical weathering may become increasingly important with ongoing permafrost thaw, potentially even outpacing river CO2 emissions. Our findings disentangle the interplay between biological and geological carbon fluxes that are important for the cryosphere and the global carbon cycle. Permafrost thaw on the Qinghai–Tibet Plateau increases rock-weathering rates while reducing river CO2 emissions, suggesting geological carbon fluxes may eventually outpace thaw-driven emissions.

02.
arXiv (CS.CL) 2026-06-16

XAI-Grounded Explanation Generation for Speech Deepfake Detection with Training-Free Multimodal Large Language Models

Speech deepfake detection (SDD) systems require trustworthy explanations for reliable decision-making. Existing explanation ways mainly fall into two categories. Traditional explainable AI (XAI), such as gradient-based attribution, produces low-level attribution signals tightly coupled with model decisions, and harder to be understood by human than natural language explanations. Meanwhile, large language model (LLM)-based explanation generation often produces generic and ungrounded descriptions due to the lack of heuristic evidence and task-specific supervision, stemming from limited grounded explanation datasets for SDD. We therefore propose a training-free explanation framework that integrates XAI evidence with multimodal LLMs to generate grounded and specific explanations. Using the PartialSpoof dataset, we construct a grounded explanation dataset and show that methods with XAI increase inside accuracy by over 45\%, verified through human evaluation and faithfulness checks.

03.
arXiv (CS.CV) 2026-06-16

Light Forcing: Accelerating Autoregressive Video Diffusion via Sparse Attention

Advanced autoregressive (AR) video generation models have improved visual fidelity and interactivity, but the quadratic complexity of attention remains a primary bottleneck for efficient deployment. While existing sparse attention solutions have shown promise on bidirectional models, we identify that applying these solutions to AR models leads to considerable performance degradation for two reasons: isolated consideration of chunk generation and insufficient utilization of past informative context. Motivated by these observations, we propose \textsc{Light Forcing}, the first sparse attention solution tailored for AR video generation models. It incorporates a Chunk-Aware Growth mechanism to quantitatively estimate the contribution of each chunk, which determines their sparsity allocation. This progressive sparsity increase strategy enables the current chunk to inherit prior knowledge in earlier chunks during generation. Additionally, we introduce a Hierarchical Sparse Attention to capture informative historical and local context in a coarse-to-fine manner. Such two-level mask selection strategy (i.e., frame and block level) can adaptively handle diverse attention patterns. Extensive experiments demonstrate that our method outperforms existing sparse attention in quality (e.g., 84.5 on VBench) and efficiency (e.g., $1.2{\sim}1.3\times$ end-to-end speedup). Combined with other efficient solutions, \textsc{Light Forcing} further achieves a $2.0{\sim}3.0\times$ end-to-end speedup across diverse GPUs (e.g., 27.4\,FPS on RTX 5090 and 33.9\,FPS on H100). Code is released via this \href{https://github.com/chengtao-lv/LightForcing}{link}.

04.
bioRxiv (Bioinfo) 2026-06-18

A unified smoothing framework for protein domain bigram model

Biomolecular sequences can be represented as strings over an alphabet, an analogy that has motivated many applications of computational linguistic techniques to biological problems. However, such methods must be adapted to the characteristic scale and organization of biomolecular data. Here, we consider the problem of bigram smoothing for multidomain protein architectures, where domain bigram frequency data is extremely sparse and differs from textual data in alphabet size, string length distribution, the relationship between bigram and unigram frequencies, tandem repeat lengths, and the distribution of domain adjacencies. Moreover, some domain combinations are unobserved because they are biologically incompatible, others because the data are incomplete. A smoothing method that distinguishes these two cases is required. We propose a unified smoothing framework based on interpolation that can be tuned to accommodate different bigram data characteristics. Within this framework, we design specific model variants suited to protein domain bigram data: these assign low adjusted counts to pairs that are likely incompatible, while making appropriate adjustments for undersampled pairs. We demonstrate empirically that this approach distinguishes the two cases while preserving the characteristic signatures of multidomain data.

05.
arXiv (math.PR) 2026-06-16

Balanced affine Motzkin paths: Pearson geometry and global endpoint asymptotics

arXiv:2601.17634v2 Announce Type: replace Abstract: We study endpoint distributions of balanced affine weighted Motzkin paths. In the balanced case, the generating-function equation has Pearson-type characteristic geometry. We show that this geometry controls the terminal-height law globally: the characteristic escape time determines the limiting cumulant generating function, the large-deviation rate function, and the ray-scale asymptotics. Thus the usual Gaussian window is only the local quadratic approximation to a global Pearson-driven profile. For finite sizes, we prove a uniform Daniels saddlepoint approximation in the one-dominant-singularity regimes and identify the exceptional antipodal case requiring a lattice/interference correction.

06.
arXiv (CS.CL) 2026-06-12

A Context-Aware Dataset for Stance Detection in Bioethical Controversies on Reddit

Bioethical debates increasingly unfold on social media, yet stance detection research lacks large-scale, domain-specific resources for modeling such context-dependent discourse. We present BioStance, a context-aware dataset of 39,600 annotated Post-Comment pairs from Reddit bioethical discussions. BioStance covers six controversial targets across three dimensions of bioethical controversy: fundamental value conflicts, individual liberty versus collective responsibility, and technological uncertainty. Each instance preserves hierarchical conversational context and is labeled by three independent annotators using a three-class stance scheme: Favor, Against, and None. The annotations achieve a mean Krippendorff's $\alpha$ of 0.82, indicating substantial reliability. By combining thematic diversity, conversational structure, and high-quality human annotation, BioStance supports research on context-aware stance detection, argument mining, and computational analysis of bioethical discourse.

07.
arXiv (CS.CV) 2026-06-12

OpenMedQ: Broad Open Pretraining for Medical Vision-Language Models

We present OpenMedQ, a medical vision-language model pretrained on the broadest fully-open medical mix to date: 14 datasets totaling ~3.35M pretraining samples spanning pathology, radiology, microscopy, and text-only clinical QA. OpenMedQ reaches state-of-the-art BLEU-1 on PathVQA (75.9), beating Med-PaLM M variants up to 562B parameters (~80x larger), and matches the best reported VQA-MED BLEU-1 (64.5). Its vision encoder, transferred to 8 unseen medical classification benchmarks under an identical downstream recipe, obtains the highest average macro-F1 (0.757) among BiomedCLIP (0.745), PMC-CLIP (0.745), PubMedCLIP (0.746), and a from-scratch baseline (0.616). We release our code and an interactive demo is publicly available as a reproducible baseline for the community.

08.
PLOS Computational Biology 2026-05-29

A prototype-augmented graph representation learning framework for identifying brain disorder-associated genes and facilitating drug repurposing

Authors:

by Jiafang Li, Yifei Li, Siying Lin, Jiahua Rao, Huiying Zhao Many genetic loci were identified as associated with neuropsychiatric disorders and neurodegenerative disorders by Genome-wide association studies (GWAS). How these loci impact these diseases is unclear. Advances in deep-learning approaches and multi-omics data have the potential to link GWAS findings with disease mechanisms. Here, we proposed the Multi-omics Graph Transformer Network (MOGT), a semi-supervised graph neural network that leverages graph representation learning to model biological networks derived from multi-omics data to predict disease-associated genes. MOGT outperforms the current approaches in disease gene prediction for two psychiatric disorders and three neurodegenerative/neurological diseases. High-risk genes (HRGs) for Parkinson’s disease (PD) predicted by MOGT were used to drug discovery by integrating with the CMAP database. Finally, 10 drugs were identified as potential candidates. Among them, the effect of drug UK-356618 was experimentally verified in a primary neuron model, showing that UK-356618 reversed the abnormal expression of PD-associated genes and improved the cell-level phenotypes of PD. Together, these results indicate that MOGT can be used to identify HRGs for brain disorders, and these predicted HRGs provide high-level insights into the mechanisms and treatments of brain disorders.

09.
arXiv (CS.CL) 2026-06-19

Leverage Is Not Reach: A Control-Window Law for Single-Neuron Steering in Language Models

Authors:

Aligned language models gate behaviors such as refusal and language routing through sparse feed forward neurons, yet no theory predicts when a single neuron intervention controls a behavior coherently rather than collapsing the output. We develop a budget normalized control window framework for single neuron steering. A dose along one write direction reduces to one control coordinate: the alignment between the residual stream and the write, driven along a universal saturation curve in units of a coherence budget set by the residual norm divided by the write norm. Coherent control exists when a behavior trigger lies below the collapse ceiling. The same coordinate governs benign mode switches and refusal; the ceiling follows from weights and one generic forward pass, while triggers are measured at rollout. On fifteen held out neurons, the predicted ceiling has mean absolute error 0.14, about 0.07 in bulk layers, and the committed open or closed verdict holds on eleven against a ten of fifteen majority baseline. Closed cases expose three failure modes rather than violations: collapse before trigger, too little depth to propagate, or a normalization that caps how far one neuron can push. The law explains why local gradient attribution anti predicts control: true controllers write off the readout axis and carry a near zero first order gradient. A forward only contrastive screen made precise by the window recovers controllers that attribution misses. On refusal, the hardest case, intervention success is typed, not scalar: coherent bypass and strict actionable reach separate, so a neuron can flip refusal in fluent, on task text with no actionable content, and genuine actionable reach appears only for three of six audited Llama pivots and only at later rollout horizons. Single neuron steering is therefore a budgeted, typed audit of controllability rather than a fixed dose anecdote.

10.
arXiv (CS.AI) 2026-06-19

What Do Safety-Aligned LLMs Learn From Mixed Compliance Demonstrations?

arXiv:2606.20508v1 Announce Type: new Abstract: Prior work has shown that in-context demonstrations can jailbreak language models, but it remains unclear how models interpret different types of compliance demonstrations. We study this by mixing benign compliance demonstrations (non-harmful request, helpful response) with harmful compliance demonstrations (harmful request, helpful response) and testing three hypotheses about how demonstration composition drives harmful compliance. Across four models, we find that benign and harmful demonstrations are not interchangeable: benign demonstrations can either reduce or increase harmful compliance depending on the model. We further show that preference optimization is the critical training stage that prevents benign demonstrations from increasing harmful compliance, that demonstration ordering exhibits strong recency bias, and that models differ in how refusal interacts with in-context learning: some adopt demonstrated formatting even when refusing, while others override all in-context signals upon refusal. Taken together, this work moves beyond showing that demonstration-based jailbreaking works to characterizing how it works: what models extract from compliance demonstrations depends on demonstration content, ordering, and training methodology.

11.
arXiv (CS.CV) 2026-06-12

What's Old is New Again: Classical Dimensionality Reduction for Efficient Saliency-Guided Biometric Attack Detection

Saliency-guided training is a paradigm in visual recognition that encourages models to focus on the most relevant image regions during learning. While its application in biometric presentation attack detection (PAD) has shown strong benefits in robustness and generalization, adoption is often limited by the high cost, domain specificity, and limited scalability of existing saliency acquisition methods, such as human annotations over a limited dataset. We present a novel, cost-efficient, and highly-scalable approach to saliency acquisition using maps inspired by classical dimensionality reduction techniques: PCA and LDA. Our proposed methods generate saliency maps directly from raw training data, requiring no human annotation nor domain knowledge. We contextualize the effectiveness of these saliency sources in three saliency-explored domains (iris PAD, synthetic face detection, fingerprint PAD) and demonstrate its scalability in two saliency-novel domains (fingerprint vein PAD and ID card PAD). Across all domains tested, models trained using dimensionality reduction-sourced saliency maps exceed baseline and sometimes SOTA saliency methods without any resource investment or domain-specific tooling. Our findings overcome an important yet unaddressed barrier to saliency-guided training for biometric attack detection and beyond.

12.
medRxiv (Medicine) 2026-06-18

Empirical Validation and Predictive Utility of the Perinatal Grief Scale in Men after Perinatal Loss

Background. The Perinatal Grief Scale (PGS) is a widely used instrument for assessing grief following pregnancy loss, yet no study has validated it specifically in men despite documented use in several studies. This gap is critical given fathers' persistent underrepresentation in perinatal bereavement research and the absence of empirically supported screening thresholds for this population. Methods. This cross-sectional validation study used data from the OPALE project (Observatory on PerinatAL hEalth) conducted by the CiaoLapo Foundation in Italy. Among 276 fathers who experienced stillbirth or miscarriage, we examined criterion validity by testing the association between PGS scores and trauma-related symptomatology assessed via three validated instruments: the Revised Impact of Event Scale (RIES, n=103), National Stressful Events Survey Short Scale (NSESSS, n=95), and SCL-90 (n=173). We systematically tested multiple threshold combinations to identify optimal discriminative performance. Results. The PGS demonstrated excellent criterion validity. The optimal threshold (PGS >=92) showed sensitivity 81.0%, specificity 81.8%, and Youden's J index 0.628. Fathers scoring >=92 had 19.12 times the odds of high trauma symptoms (95% CI: 9.35 to 39.14, p

13.
Nature (Science) 2026-06-17

Reimagining machine vision with optical computing

Authors: Unknown Author

A general-purpose artificial-intelligence vision system for use in image-sensing devices has been developed by embedding fundamentals of core computer-vision operations into a light-manipulating planar material called an optical metasurface. A prototype enables accurate, real-time perception and processing across diverse tasks, suggesting that this could be a solution for rapid, low-energy, on-device vision intelligence. A specialized ‘metasurface’ can preprocess incoming scene information on image-generating devices.

14.
arXiv (CS.CV) 2026-06-17

Reasoning Text-to-Video Retrieval for Operating Room Clips via Action-Driven Digital Twins

Text-to-video retrieval in operating rooms (OR) is an enabling technology for OR safety, as it allows stakeholders to retrieve and inspect recordings of specific events. However, because the most safety-critical events may not follow the common structure, to unlock its full potential text-to-video retrieval must be able to handle implicit queries that require reasoning to identify the right video (e.g., the step right before clipping). However, existing methods rely on global embeddings that cannot reason over such queries. We propose OR3, a text-to-video retrieval method that converts clips into action-driven digital twins (ActDTs), grouping concurrent subject-action-object triplets under non-overlapping temporal intervals. Moreover, rather than cross-modal matching through paired encoders, OR3 performs imagination-based retrieval where an LLM generates hypothetical ActDTs from queries. This enables intra-modal matching via a single encoder trained with ActDT-tailored hard negatives. Finally, evidence-grounded refinement revises imagined ActDTs based on discrepancies with top candidates to capture procedure-specific patterns. We construct a benchmark from MM-OR with 276 implicit queries across four reasoning categories over 386 clips from robotic knee procedures. OR3 achieves 57.6 R@1 and 77.3 R@5, outperforming the strongest baseline. These results demonstrate that OR3 enables fine-grained discrimination between visually similar OR video clips through temporal action reasoning.

15.
arXiv (math.PR) 2026-06-16

An Analytical Methodology for Quantifying Airspace Conflict Rate and Complexity

arXiv:2606.14897v1 Announce Type: cross Abstract: Air traffic growth, advanced air mobility, and increasingly autonomous operations are driving the need for scalable and adaptive airspace design methodologies. Central to this challenge is quantifying how traffic flow structure and demand, governed in part by airspace geometry, influence conflict generation and operational complexity. This paper presents an analytical framework for computing conflict rate and conflict probability in structured airspace using stochastic flow models. Traffic streams are modeled as renewal processes with prescribed inter-arrival time distributions, while interactions between flows are captured through geometry-dependent minimum spacing constraints at merges and crossings. Within this formulation, closed-form upper bounds on the expected conflict rate and conflict probability per aircraft are derived as functions of flow configuration and demand. These metrics are interpreted as complementary measures of airspace complexity, reflecting controller workload and per-aircraft operational risk. The methodology is applied to representative hexagonal cell geometries with varying routing structures and flow distributions. Results reveal non-monotonic tradeoffs between routing flexibility, capacity, and conflict generation, with intermediate flow configurations outperforming both highly constrained and highly distributed cases. The proposed framework provides a tractable tool for evaluating airspace design alternatives and complexity-informed traffic management strategies.

16.
arXiv (CS.LG) 2026-06-15

PCR-CA: Parallel Codebook Representations with Contrastive Alignment for Multiple-Category App Recommendation

arXiv:2508.18166v5 Announce Type: replace-cross Abstract: Modern app store recommender systems struggle with multiple-category apps, as traditional taxonomies fail to capture overlapping semantics, leading to suboptimal personalization. We propose PCR-CA (Parallel Codebook Representations with Contrastive Alignment), an end-to-end framework for improved CTR prediction. PCR-CA first extracts compact multimodal embeddings from app text, then introduces a Parallel Codebook VQ-AE module that learns discrete semantic representations across multiple codebooks in parallel – unlike hierarchical residual quantization (RQ-VAE). This design enables independent encoding of diverse aspects (e.g., gameplay, art style), better modeling multiple-category semantics. To bridge semantic and collaborative signals, we employ a contrastive alignment loss at both the user and item levels, enhancing representation learning for long-tail items. Additionally, a dual-attention fusion mechanism combines ID-based and semantic features to capture user interests, especially for long-tail apps. Experiments on a large-scale dataset show PCR-CA achieves a +0.76% AUC improvement over strong baselines, with +2.15% AUC gains for long-tail apps. Online A/B testing further validates our approach, showing a +10.52% lift in CTR and a +16.30% improvement in CVR, demonstrating PCR-CA's effectiveness in real-world deployment. The new framework has now been fully deployed on the Microsoft Store.

17.
arXiv (math.PR) 2026-06-24

Negative index, matchings, and nonnegative eigenvalues of tridiagonal stochastic matrices

arXiv:2606.21122v2 Announce Type: replace Abstract: We study negative eigenvalues of $n\times n$ stochastic matrices whose off-diagonal support is constrained by a sparse graph. The main tool is a matching-based inertia principle: if $G$ is bipartite with matching number $\mu(G)$, $S$ is a real symmetric matrix supported on $G$ with nonnegative diagonal entries and whose negative index (i.e. number of negative eigenvalues counted with their multiplicities) is denoted by $\nu_{-}(S) $, then \[ \nu_{-}(S)\leq \mu(G). \] In particular, every $n\times n$ nonnegative tridiagonal stochastic matrix $P$ satisfies $ \nu_{-}(P)\leq \left\lfloor \frac{n}{2}\right\rfloor. $ Consequently, after ordering the eigenvalues of $P$ in the decreasing order, we have $ \lambda_{\lceil n/2\rceil}(P)\geq0, \ and hence \ \lambda_2(P)\geq0, \mbox{ for } n\geq3. $ This gives an all-dimensional strengthening of the previously known $4\times4$ tridiagonal stochastic result. Next, we show that this tridiagonal bound is sharp in every dimension in both reducible and irreducible cases. Finally, we explore some possible extension and raise some open questions.

18.
arXiv (CS.CL) 2026-06-11

Toward Preference-aligned Large Language Models via Residual-based Model Steering

Preference alignment is a critical step in making Large Language Models (LLMs) useful and aligned with (human) preferences. Existing approaches such as Reinforcement Learning from Human Feedback or Direct Preference Optimization typically require curated data and expensive optimization over billions of parameters, and eventually lead to persistent task-specific models. In this work, we introduce Preference alignment of Large Language Models via Residual Steering (PaLRS), a training-free method that exploits preference signals encoded in the residual streams of LLMs. From as few as one hundred preference pairs, PaLRS extracts lightweight, plug-and-play steering vectors that can be applied at inference time to push models toward preferred behaviors. We evaluate PaLRS on various small-to-medium-scale open-source LLMs, showing that PaLRS-aligned models achieve consistent gains on mathematical reasoning and code generation benchmarks while preserving baseline general-purpose performance. Moreover, when compared to models aligned with DPO and SimPO, they perform better with great time-savings. Our findings highlight that PaLRS offers an effective, much more efficient and flexible alternative to standard preference optimization pipelines, offering a training-free, plug-and-play mechanism for alignment with minimal data.

19.
arXiv (CS.CV) 2026-06-18

DreamReg: Belief-Driven World Model for 2D-3D Ultrasound Registration

Ultrasound (US) is widely used for surgical navigation, yet real-time registration between intraoperative 2D slices and preoperative 3D volumes remains challenging due to partial observability, speckle noise, and the action-dependent US acquisition. Existing methods are one-shot or short-horizon, making it hard for them to gather evidence over time or capture how surgeons adjust probe motion based on on-screen feedback. We propose DreamReg, a belief-driven world-model framework that formulates 2D-3D registration as belief updating over rigid transformations. DreamReg maintains a latent belief state that summarizes past observations and poses information, and continuously refines the transformation through learned dynamics as new slices arrive. During training, DreamReg is exposed to probe-motion trajectories that mimic clinical scanning behavior and learns to update its belief by conditioning pose refinement on the current US observation. During inference, DreamReg refines registration via internal imagination: it rolls out the learned world model to simulate candidate probe motions and their predicted observations, and integrates these imagined outcomes to converge to an accurate rigid transformation. Experiments on CAMUS and u-RegPro datasets demonstrate improved robustness and competitive registration accuracy for real-time guidance compared with state-of-the-art methods.

20.
arXiv (CS.CV) 2026-06-11

VLGA: Vision-Language-Geometry-Action Models for Autonomous Driving

Vision-language-action (VLA) models can describe scenes and reason about them in language, yet still struggle to ground their actions in the dense 3D world around them. Existing approaches either inject features from a frozen 3D foundation model without an objective that ensures the policy uses them, or constrain geometry with sparse box and map losses that provide no dense spatial signal. We introduce VLGA, the first vision-language-action model supervised to reconstruct the dense 3D world it drives through. VLGA introduces geometry as a fourth modality alongside vision, language, and action through a dedicated expert supervised by a per-pixel pointmap regression loss against LiDAR. Extensive experiments conducted on challenging nuScenes and Bench2Drive datasets for open-loop and closed-loop evaluations, respectively, show the superiority of VLGA over counterpart VLA methods. In particular, on open-loop nuScenes, VLGA sets a new state of the art among VLA methods without ego status, with the lowest L2 (0.50\,m average) and 3-second collision rate (0.18\%). On closed-loop Bench2Drive, VLGA attains the state-of-the-art driving score of 79.08, +0.71 over the strongest prior VLA, at comparable efficiency and comfort.

21.
arXiv (quant-ph) 2026-06-16

Phase controlled spectral topology, dynamic stability and sensitivity in Non-Hermitian Cavity Magnonics

arXiv:2606.16522v1 Announce Type: new Abstract: We theoretically investigate a non-Hermitian cavity-magnon platform in which coherent photonmagnon interactions and reservoir-mediated dissipative coupling interfere through a single externally tunable phase. We show that this interference phase provides a universal control parameter that continuously rotates the effective coupling between Hermitian and anti-Hermitian regimes, enabling dynamic transitions between level repulsion and level attraction without modifying intrinsic system parameters. The resulting phase-controlled non-Hermitian topology gives rise to exceptional points, linewidth engineering, and zero-damping conditions. Owing to the propagation-direction dependence of the dissipative interaction, the system further exhibits strong nonreciprocal transport and phase-tunable isolation arising from asymmetric hybridization of the cavity and magnon modes. Beyond its spectral and transport properties, we establish a direct connection between nonHermitian spectral topology and nonequilibrium population dynamics. The interference phase governs the stability of the hybrid modes, driving transitions between stable relaxation, critical slowing down near exceptional points, oscillatory energy exchange, and exponentially amplified dynamics. We further demonstrate that the same phase-controlled exceptional topology can be exploited for enhanced sensing, where the eigenvalue response exhibits the characteristic square-root scaling associated with exceptional-point physics. Our results provide a unified framework linking spectral topology, directional transport, dynamical stability, and sensing functionality through reservoirengineered interference in cavity magnonic systems.

22.
arXiv (CS.CL) 2026-06-18

Rethinking Cross-lingual Gaps from a Statistical Viewpoint

Any piece of knowledge is usually expressed in one or a handful of natural languages on the web or in any large corpus. Large Language Models (LLMs) act as a bridge by acquiring knowledge from a source language and making it accessible when queried using target languages. A cross-lingual gap is a drop in accuracy incurred when querying knowledge in a target language rather than the source language. Existing research focused on modeling or training failures leading to cross-lingual gaps. In this work, we take an alternative view to characterize the nature of cross-lingual error, and hypothesize that the variance of responses in the target language is a key cause of this gap. For the first time, we formalize the cross-lingual gap in terms of biased and unbiased errors. We empirically validate our hypothesis through multiple inference-time interventions that control variance and reduce the cross-lingual gap. We demonstrate a few test-time ensemble methods that reduce response variance, and thereby improve source-target transfer scores by up to 12 absolute points yielding relative gains of 8% to over 50% across various LLMs.

23.
arXiv (CS.CL) 2026-06-24

SURGELLM: Rethinking Multi-Task Evaluation through Task-Aware Feature Gating with Class-Balanced Normalization

Fine-tuned encoders deployed across heterogeneous NLP tasks face three compounding problems: mismatched inductive biases, class-imbalance corruption of feature statistics, and no mechanism to condition attention on external lexical knowledge. We introduce \surgellm, a unified transformer framework that addresses each with a dedicated lightweight module: a surgical feature gate (learned per-dimension sigmoid over curated lexical indicators and \texttt{[CLS]}; provably degenerates to identity when features are uninformative), task-conditioned prefix tokens (quantized feature values and task identity prepended to every input), and Instance-Weighted Normalization (IWN; removes class-prior bias from gate statistics). We prove an excess-risk bound linking gate benefit to surgical feature alignment. Across four tasks, SST-2, multi-hop retrieval, LLM-prompt attribution, and authorship detection, covering 17,830 examples and eleven model variants over three seeds, the IWN variant achieves macro-F1 0.940 ($+0.036$ over the strongest non-IWN baseline; $+0.130$ on authorship detection). A random-vocabulary control ($-0.028$ avg.\ F1) confirms gains are lexical, not parametric. Code, vocabularies, and a $99.5\%$-recovery auto-extraction recipe are released.

24.
arXiv (CS.AI) 2026-06-24

Promise and challenges of heart chamber segmentation from non-contrast CT scans using contrastive unpaired image translation: a feasibility study

arXiv:2606.23879v1 Announce Type: cross Abstract: Purpose: To evaluate the feasibility and challenges of heart chamber segmentation from non-contrast CT scans using contrastive unpaired image translation and deep learning-based segmentation. Approach: We developed ChameleonNet, a framework utilizing the Contrastive Unpaired Translation (CUT) network with decoupled contrastive learning (DCL) loss to synthesize non-contrast CT from contrast CT scans. Using annotations of four heart chambers (left atrium (LA), left ventricle (LV), right atrium (RA), and right ventricle (RV)) from contrast scans, we trained a Hausdorff distance loss-enhanced nnU-Net on synthesized non-contrast images. The translation model was trained with 35,538 contrast-enhanced and 37,197 non-contrast CT slices. The segmentation model was trained with 292 synthesized non-contrast scans. Performance was evaluated using Dice similarity coefficient (DSC) and 95th Hausdorff distance (HD95) on 36 synthesized non-contrast scans, and volume agreement on 36 real non-contrast CT scans was assessed using Pearson correlation, mean absolute percentage error (MAPE), and mean percentage error (MPE). Results: The segmentation model achieved DSC of 0.94 (0.01), 0.91 (0.04), 0.92 (0.03), 0.93 (0.02), and HD95 of 3.63 (1.49), 5.74 (4.08), 5.18 (1.77), 5.51 (3.21) mm on synthesized non-contrast images for LA, LV, RA, and RV, respectively. On real non-contrast CT scans, Pearson correlations were 0.93, 0.82, 0.87, and 0.89 (all p

25.
medRxiv (Medicine) 2026-06-24

External Validation and Calibration Assessment of Explainable Machine Learning Models for GVHD Prediction After Allogeneic HSCT

Background Graft versus host disease (GVHD) remains a major determinant of morbidity and mortality following allogeneic hematopoietic stem cell transplantation (allo HSCT). Existing GVHD prediction models demonstrate modest discrimination and limited generalizability, and calibration drift across external populations is rarely characterized despite its essential role in the clinical interpretability of predicted probabilities. Objectives To develop and externally validate an explainable machine learning framework for predicting acute and chronic GVHD and associated overall survival in patients with acute myeloid leukemia (AML), acute lymphoblastic leukemia (ALL), and myelodysplastic syndromes (MDS) undergoing allo HSCT, and to systematically characterize calibration across heterogeneous external validation cohorts to inform deployment requirements. Study Design The model was developed on three publicly available registry-derived datasets (N = 2,509) and externally validated across six independent cohorts (N = 14,788) comprising adult and pediatric allo HSCT recipients, including a regional Middle Eastern cohort (UAE and Jordan). A standardized preprocessing pipeline harmonized heterogeneous datasets. Gradient boosting models (CatBoost) were used for binary GVHD prediction; exploratory overall survival analysis used a Cox proportional hazards model with predicted acute GVHD risk as a covariate. Discrimination (AUROC with bootstrap 95% CI), calibration (logistic recalibration intercept and slope with analytical 95% CI), and feature importance (SHapley Additive exPlanations, SHAP) were assessed in training out-of-fold and all external cohorts. Results In internal validation, AUROC was 0.63 (95% CI 0.61-0.65) for acute GVHD and 0.72 (95% CI 0.70-0.74) for chronic GVHD. External validation demonstrated AUROC ranges of 0.51-0.57 (acute) and 0.54-0.64 (chronic), with consistent performance across disease subgroups despite substantial heterogeneity in transplant practices and feature availability. In exploratory survival analysis, the acute-GVHD-informed Cox model achieved a training-cohort C-index of 0.679 (95% CI 0.658-0.697); external C-indices ranged from 0.47-0.53. Calibration analysis identified systematic external risk overestimation (negative calibration intercept in 10 of 11 evaluable external cohort-target combinations) with heterogeneous slope drift requiring cohort-specific recalibration. Key predictors included recipient age, graft source, conditioning intensity, GVHD prophylaxis, and HLA match ratio. Conclusions An explainable, externally validated GVHD prediction framework was developed using heterogeneous registry-derived datasets, with systematic characterization of calibration drift across multiple external cohorts, an analysis rarely reported in prior GVHD prediction literature. Predictive performance was modest for acute GVHD and moderate for chronic GVHD, constrained by missing immunobiological variables and incomplete HLA characterization. Per-cohort recalibration is required before clinical deployment, with prospective validation and benchmarking against established GVHD risk scores identified as priority next steps.