Academic Intelligence · Curated Daily

Explore the Frontiers of Global Academic Research

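AcademicHub brings together real-time literature from top journals and preprint platforms. Customize your own research radar and use large language models to automatically generate literature analysis briefs across interdisciplinary fields.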

01.
arXiv (CS.LG) 2026-06-16

LLM-Based Synthetic Ground Truth Generation for Audio-Based Emotion Classification via In-Context Learning

arXiv:2606.14784v1 Announce Type: cross Abstract: Understanding human states and interaction dynamics is a core goal of human-computer interaction (HCI). As interaction paradigms become more immersive, virtual reality (VR) has emerged as a powerful platform for studying collaborative work. In such settings, evaluating team collaboration states, including team performance and team resilience, requires continuous and reliable inference of latent team-level cognitive and affective states from multi-modal sensor data, such as speech signals. However, generating ground truth labels for these latent states remains challenging due to sensor-induced noise, contextual variability, and sparse expert annotations. Traditional self-reporting approaches provide only static and delayed measurements and are therefore insufficient for capturing dynamic team processes reflected in continuous speech data. In this work, we propose a large language model (LLM)-driven, agentic inference workflow for automated emotion-related synthetic ground truth generation from streaming speech data in multi-user VR environments. Leveraging the generalization capabilities of LLMs, we use In-Context Learning (ICL) with few-shot demonstrations of paired audio-based samples and their corresponding transcriptions. ICL tends to achieve task adaptation comparable to model fine-tuning while circumventing the computational overhead of parameter updates. To construct informative and robust in-context prompts, we adopt a retrieval-based selection strategy that dynamically identifies relevant audio demonstrations based on similarity in the acoustic feature space.
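The retrieval-based demonstration selection described in this abstract can be illustrated with a minimal sketch. The abstract does not name the similarity measure or the feature extractor, so cosine similarity over precomputed feature vectors is an assumption here, and the function names, the toy pool, and its labels are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def select_demonstrations(query_features, pool, k=2):
    """Return the k pool entries most similar to the query (assumed metric: cosine)."""
    ranked = sorted(
        pool,
        key=lambda demo: cosine_similarity(query_features, demo["features"]),
        reverse=True,
    )
    return ranked[:k]


# Toy demonstration pool; features, transcripts, and labels are made up.
pool = [
    {"id": "demo_a", "features": [1.0, 0.0, 0.0], "transcript": "sounds good", "label": "calm"},
    {"id": "demo_b", "features": [0.0, 1.0, 0.0], "transcript": "hurry up", "label": "tense"},
    {"id": "demo_c", "features": [0.7, 0.7, 0.0], "transcript": "are we done", "label": "neutral"},
]

selected = select_demonstrations([0.9, 0.1, 0.0], pool, k=2)
selected_ids = [demo["id"] for demo in selected]
print(selected_ids)
```

The selected entries would then be formatted as few-shot examples in the prompt; how the authors format them is not stated in the abstract.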

02.
arXiv (quant-ph) 2026-06-16

Quantum Measurement and Continuous Markov Processes

arXiv:2606.15958v1 Announce Type: new Abstract: These are the lecture notes for a course on diffusive quantum measuring instruments. They were prepared and delivered at the Perimeter Institute on Mondays and Thursdays, from 2:30 to 4:00 PM, beginning October 27th, 2025 and ending December 11th, 2025. These lectures were recorded and can be found at https://pirsa.org/c25038.

03.
arXiv (CS.CL) 2026-06-16

SAAS: Self-Aware Reinforcement Learning for Over-Search Mitigation in Agentic Search

Agentic search enables LLMs to solve complex multi-hop questions through iterative reasoning and external search. Despite the effectiveness, these systems often suffer from a critical limitation in practice: agents fail to recognize their own knowledge boundaries, blindly triggering searches when internal knowledge suffices and failing to terminate search even when adequate evidence has been collected. The lack of self-awareness leads to severe over-search, incurring substantial inference latency and prohibitive computational cost. To this end, we propose SAAS, a novel RL framework designed to cultivate dynamic self-awareness that precisely regulates search behavior without compromising accuracy. SAAS introduces three key components: (i) a search boundary modeling mechanism, which identifies the search boundary under the evolving policy by contrasting search-disabled and search-enabled rollouts; (ii) a boundary-aware reward module, which translates this boundary awareness into trajectory-level penalties, suppressing unnecessary and redundant searches; and (iii) a stage-wise optimization strategy, which leverages a sequential curriculum to prioritize reasoning over search regularization, thereby avoiding reward hacking. Extensive experiments demonstrate that SAAS substantially reduces over-search, while maintaining accuracy. Our code and implementation details are released at https://github.com/XMUDeepLIT/SAAS.

04.
arXiv (CS.LG) 2026-06-11

On the Stability of Growth in Structural Plasticity

arXiv:2605.15435v2 Announce Type: replace Abstract: Standard deep-learning pipelines usually choose the network architecture before training and keep it fixed throughout optimization. In contrast, a model can also be adapted by editing its structure during training, for example by pruning existing hidden-neuron units or growing new ones. Although growth is appealing for adaptive and continual systems, we show that it is not simply the inverse of pruning. Pruning selects among units that have participated in training from the start, whereas growth inserts new units into an already specialized optimization trajectory. We isolate this insertion problem and show that newborn units are often forward-active but backward-starved: they participate in the forward computation, yet receive much weaker gradient signal than incumbent units. This disadvantage is minor in small MLP benchmarks, but becomes clear in harder image-classification settings with a convolutional trunk. In these settings, Grow can achieve high final accuracy during the structural-editing procedure, while Prune is stronger when performance is averaged over the training trajectory or when the final sparse network is retrained from scratch. Interventions targeting optimizer state, insertion, selection, and trainability show that improving the integration of newborn units can improve adaptive performance, but does not automatically produce better final subnetworks. In continual-learning benchmarks stressing plasticity loss, Grow becomes competitive mainly when new units have enough time to integrate. Together, these results suggest that Grow should be evaluated not only as an architecture-search operator, but as a time-sensitive optimization process whose success depends on insertion stability.

05.
arXiv (quant-ph) 2026-06-19

Quantifying Imaginarity in Neutrino Systems

arXiv:2412.01871v2 Announce Type: replace-cross Abstract: It is a fundamental question why quantum mechanics employs complex numbers rather than solely real numbers. In this work, we conduct the first analysis of imaginarity quantification in neutrino flavor and spin-flavor oscillations. As quantum systems in coherent superposition, neutrinos are ideal candidates for quantifying imaginarity within the resource theoretic framework, using measures such as the $\ell_1$-norm and the relative entropy of imaginarity. We show that in the case of two-flavor mixing, these measures of imaginarity are nonzero. The measures of imaginarity reach their extreme values when the probabilistic features of quantum theory are fully maximized, i.e., both the transitional and survival probabilities are approximately equal. Our study reveals that the imaginarity, as a resource, can be harnessed not solely from the presence of a complex phase in the mixing matrix but also from the intrinsic quantum dynamics of time evolution itself. We further extend our analysis to explore the dynamics of three-flavor neutrino mixing, incorporating the effects of a nonzero $CP$ phase.

06.
arXiv (CS.CV) 2026-06-11

MedCTA: A Benchmark for Clinical Tool Agents

To make clinically grounded decisions, medical AI agents are expected to go beyond simple recognition and be capable of tool retrieval, evidence acquisition, and integration. Existing benchmarks largely evaluate isolated perception or single-turn question answering, and therefore provide limited visibility into failures of planning, tool recruitment, and rollout reliability. We introduce MedCTA, a benchmark for evaluating medical tool agents on clinician-validated, step-implicit tasks grounded in realistic multimodal clinical inputs, including radiology images, pathology slides, and reports. MedCTA comprises 107 real-world clinical tasks with clinician-verified executable trajectories over 5 deployed tools, and supports process-aware evaluation of tool selection, argument validity, execution stability, trajectory fidelity, and outcome quality. We benchmark 18 open- and closed-source multimodal models and find that even frontier systems remain brittle in multi-step clinical tool use: autonomous rollouts are dominated by protocol failures, premature stopping, and incorrect tool recruitment, while gold-standard tool routing yields large but still incomplete gains. These results show that strong backbone perception does not translate into reliable agentic behavior in clinical settings. MedCTA provides a rigorous testbed for auditing, diagnosing, and advancing trustworthy medical AI agents. The dataset and evaluation suite are available at https://ivul-kaust.github.io/MedCTA/

07.
arXiv (quant-ph) 2026-06-19

Faking entanglement with imperceptible measurement deviations

arXiv:2606.20396v1 Announce Type: new Abstract: Quantum entanglement is a central resource underpinning emerging quantum technologies, enabling capabilities beyond those of classical systems. Accurate verification of entanglement is therefore crucial. However, experimental schemes usually rely on the assumption that quantum measurements can be realized exactly. As the complexity of a quantum system grows, this assumption typically becomes increasingly unrealistic, therefore leading to a widening mismatch between theoretical models and experimental implementations. Here we demonstrate that arbitrarily small measurement errors, when adversarially encoded in the measurement apparatus, can lead to the false certification of high-dimensional entanglement in systems that are, in fact, separable. This is achieved by introducing explicit hacking attacks to measurement devices in well-established entanglement verification tests. We further experimentally demonstrate this effect using classical photonic states encoded in the spatial degree of freedom, spanning up to 61 dimensions with measurement fidelity errors as low as 0.23%. Our results uncover a fundamental vulnerability in current methods for high-dimensional entanglement detection, highlighting the susceptibility of complex quantum devices to small adversarial perturbations. The findings underscore the need for developing secure verification of quantum information that is robust to bounded discrepancies between theory and experiment.

08.
arXiv (CS.CV) 2026-06-16

Variational Network with Wavelet-based UNET in Accelerated MRI Reconstruction from Under Sampled K-space Data

Fully sampled MRI requires dense k-space acquisition, leading to long scan times, reduced clinical throughput, and increased sensitivity to patient motion. Accelerated MRI addresses this by acquiring undersampled k-space data and reconstructing the missing information computationally. However, reconstruction from undersampled measurements is highly ill-posed and can introduce aliasing artifacts, noise amplification, and loss of anatomical detail. Although conventional parallel imaging and compressed sensing methods mitigate these issues, and deep learning methods have further improved reconstruction quality, preserving high-frequency structures under aggressive undersampling remains challenging. In this work, we propose a Variational Network with a Wavelet-based U-Net (W-UNet) for accelerated MRI reconstruction. The framework combines physics-guided iterative reconstruction with learnable multi-scale frequency representations. Standard pooling operations are replaced with Discrete Wavelet Transform and Inverse Wavelet Transform modules, enabling lossless downsampling while preserving low-frequency structure and high-frequency edge details. Integrated into the refinement and sensitivity map estimation stages, the proposed design improves artifact suppression, feature preservation, and reconstruction fidelity in both single-coil and multi-coil settings. Experiments on fastMRI knee and M4Raw brain datasets show state-of-the-art performance. Ablation studies further confirm the effectiveness of wavelet-based feature decomposition for accelerated MRI reconstruction.

09.
arXiv (CS.CL) 2026-06-16

JE-IRT: A Geometric Lens on LLM Abilities through Joint Embedding Item Response Theory

Standard LLM evaluation practices compress diverse abilities into single scores, obscuring their inherently multidimensional nature. We present JE-IRT, a geometric item-response framework that embeds both LLMs and questions in a shared space. For question embeddings, the direction encodes semantics and the norm encodes difficulty, while correctness on each question is determined by the geometric interaction between the model and question embeddings. This geometry replaces a global ranking of LLMs with topical specialization and enables smooth variation across related questions. Building on this framework, our experimental results reveal that out-of-distribution behavior can be explained through directional alignment, and that larger norms consistently indicate harder questions. Moreover, JE-IRT naturally supports generalization: once the space is learned, new LLMs are added by fitting a single embedding. The learned space further reveals an LLM-internal taxonomy that only partially aligns with human-defined subject categories. We also show that simple linear probes of the embedding space recover cross-subject ability directions, such as an arithmetic axis that highlights quantitatively demanding questions in seemingly distant subjects like virology and global facts. JE-IRT thus establishes a unified and interpretable geometric lens that connects LLM abilities with the structure of questions, offering a distinctive perspective on model evaluation and generalization.

10.
arXiv (quant-ph) 2026-06-24

Toward fault-tolerant quantum computation exploiting quantum spatial distribution and gauge symmetry

arXiv:2604.25747v5 Announce Type: replace Abstract: We explore how the integrated use of quantum spatial distribution (QSD), or more specifically, a superposition of both spin and position states of particles, and gauge symmetry (GS) within Poulin's stabilizer formalism enhances quantum error correction. The study employs $3+2$ particles on nested squares proposed in the companion paper (arXiv:2504.07941), where three of them encode Shor's nine-qubit code and the remaining two detect errors in this code through their spin state measurements. The first result is that the GS offers resilience against three types of noise acting on a particle: arbitrary decoherence of its spin or position state, and dephasing of both states, which completely or partly destroys its QSD. To show that, we formulate a noise model unifying the above noise sources and prove the correctability of this unified model under our error-correcting scheme. The second result is that the QSD provides architectural flexibility, allowing us to stack the error-correcting systems both vertically and horizontally. Indeed, we present implementations of the error detection (stabilizer measurement), logical Hadamard and Toffoli gates, and a quantum adder with the required interactions only between nearest-neighbor and next-nearest-neighbor particles. Here, our treatment of the dynamics of particles, each having spin and position degrees of freedom, under nontrivial noise and gate operations indicates that the stabilizer formalism is a powerful tool for describing quantum many-body dynamics.

11.
arXiv (CS.CL) 2026-06-19

Learning to Prompt: Improving Student Engagement with Adaptive LLM-based High-School Tutoring

LLMs can personalize education, although current static-prompt tutoring systems struggle to adapt to diverse academic disciplines. We develop and test a system with subject-aware prompting, based on 14 pedagogical features (e.g., tutor scaffolding, student understanding) extracted from raw transcripts. We first train a prompt routing model in a simulation environment, and then deploy it for online adaptation with actual high-school students. The simulation benchmark shows the router outperforming two static baselines ($0.694$ vs. $0.647$ and $0.64$, $p

12.
arXiv (quant-ph) 2026-06-16

Generative modelling powered by room-temperature polariton condensates

arXiv:2606.15344v1 Announce Type: cross Abstract: Generative modelling requires efficient stochastic nonlinear transformations and physical platforms that can naturally realise them. We experimentally demonstrate that nonlinear optical systems operating in the strong light-matter coupling regime can serve as physical transformation layers for conditional generative modelling. Specifically, we develop a workflow in which room-temperature exciton-polariton condensates formed in organic dye microcavities act as a physical stochastic transform within a generative adversarial network and enable conditional digit-to-image translation. By using the nonlinear many-body dynamics and intrinsic stochasticity of polariton condensates, the workflow outperforms baseline approaches based on digitally injected perturbations. We find that polariton-enabled sampling via generative adversarial network (Polariton GAN) yields improved inception score, digit preservation accuracy and structural similarity compared with both digital sampling and laser-based systems. We further show that spatially correlated output variations can naturally regularise adversarial training and enhance output diversity. Our results establish polariton condensation as a new computational resource for generative modelling, opening a pathway towards physics-enhanced machine learning systems.

13.
arXiv (CS.LG) 2026-06-17

Evaluating Uplift Modeling under Structural Biases: Insights into Metric Stability and Model Robustness

arXiv:2603.20775v2 Announce Type: replace Abstract: In personalized marketing, uplift models estimate the incremental effect of an intervention by modeling how customer behavior would change under alternative treatments using counterfactual analysis. However, real-world marketing data often exhibit various biases, such as selection bias, spillover effects, measurement error, and unobserved confounding. These biases can adversely affect both the accuracy of uplift estimation and the validity of evaluation metrics. Despite the importance of bias-aware assessment, there remains a lack of systematic studies evaluating how different models and metrics perform under such biased conditions. To bridge this gap, we design a systematic benchmarking framework. Unlike standard predictive tasks, real-world uplift datasets inherently lack counterfactual ground truth. This limitation renders the direct validation of evaluation metrics infeasible and prevents the precise quantification of biases. Therefore, a semi-synthetic approach serves as a critical enabler for systematic benchmarking. This approach effectively bridges the gap by retaining real-world feature dependencies while providing the ground truth needed to isolate structural biases. Our investigations reveal that (i) uplift targeting and prediction can manifest as distinct objectives, where proficiency in one does not ensure efficacy in the other; (ii) while many models exhibit inconsistent performance under diverse biases, TARNet shows notable robustness, providing insights for subsequent model design; (iii) the stability of evaluation metrics is linked to their mathematical alignment with the ATE, suggesting that ATE-approximating metrics yield more consistent model rankings under structural data imperfections. These findings suggest the need for more robust uplift models and evaluation metrics under real-world data imperfections.

14.
arXiv (CS.AI) 2026-06-16

Learning aligned EEG representations with subject-specific encoders

arXiv:2606.16462v1 Announce Type: cross Abstract: Cross-subject EEG decoding promises more training data, but it also exposes neural networks to strong inter-subject distribution shifts. We study whether task supervision and architecture alone can learn subject-aligned representations. We replace a shared EEG encoder with subject-specific encoders followed by a common classifier, and compare this hybrid model with standard EEGNet, AttentionBaseNet, and CTNet baselines with Euclidean Alignment (EA) on four motor-imagery datasets. EA improves shared encoders by recentering subject covariances, but the hybrid encoder largely internalises this role: validation-loss curves and latent-distance analyses change little when EA is removed. Subject-specific heads increase class distinctiveness and place each subject close to its own latent manifold, improving most subjects while leaving a method-sensitive subset. These results support subject-specific encoders as a learned alignment mechanism for EEG decoding and identify head selection for unseen subjects as the remaining bottleneck.

15.
arXiv (CS.AI) 2026-06-24

It's Complicated: On the Design and Evaluation of AI-Powered AAC Interfaces

arXiv:2606.24854v1 Announce Type: cross Abstract: Artificial intelligence (AI) can enhance what people who use augmentative and alternative communication (AAC) are able to do with their systems. However, evaluating AI-powered AAC interfaces can be difficult. People are intersectional beings and current evaluation metrics can struggle to capture the multifaceted and nuanced desires people may have for their AAC. We explore the complicated nature of six AAC problem spaces, explore how AI might be used in these spaces, and suggest more robust methods of evaluation that take the intersectional nuances of people into account. We also discuss broader issues that arise across these problem spaces and how they could be addressed using our proposed evaluation methods.

16.
arXiv (CS.CL) 2026-06-24

Age of LLM: A Strategic 1v1 Benchmark for Reasoning, Diplomacy and Reliability of Large Language Models under Fog of War

We introduce Age of LLM, a turn-based 1v1 benchmark in which two LLMs face off on a 13x7 grid to destroy the enemy base. Three stressors are deliberate: fog of war, full diplomacy (messages, ceasefires, ultimatums; uranium kept secret), and a reliability dimension where every turn must follow a strict JSON schema and an illegal action is silently discarded. The engine is private and each match uses a fresh random map seed and opponent, mitigating the data contamination that affects public benchmarks. Models receive a (near) rule-only prompt with no build-order advice (two tactical seed phrases were present during data collection; see Section 2.7). We benchmark 15 reasoning models across 54 matches and 5,258 actions. Findings: (1) the nuclear rush dominates (78% on the rules-coherent v0.11+ sub-corpus; 85% corpus-wide) with a sole-launcher signature that is largely mechanical under secret-simultaneous launch rules, not a cognitive deterrence failure; (2) military conquest is rare but faster (12.3 vs 18.9 turns); (3) diplomacy is prolific yet almost never consummated; (4) ~58% of illegal actions are fog/state errors, making the illegal-action rate a measure of belief-tracking; (5) – the least established, and the only one we label exploratory – a weak link associates reliability with winning. The corpus is small, unbalanced and not side-swapped, so the ranking is a preliminary descriptive view, not a contribution. Beyond ranking, the turn-by-turn traces of actions and messages make the corpus a lens on how LLMs reason under adversarial uncertainty – their belief-tracking, spontaneous deception, and per-model cognitive "personas" – which we frame as a future research direction. We release the replay format, an isometric viewer and all replays; engine source on request.

18.
arXiv (quant-ph) 2026-06-17

Effects of Josephson Junction Non-idealities on Adiabatic Quantum Flux Parametron Circuits

arXiv:2606.17338v1 Announce Type: new Abstract: Adiabatic quantum flux parametron (AQFP) gate is a promising approach to scale up the cryogenic microwave electronics for superconducting qubit multiplexed control. However, the performance of these circuits depends on the quality of the Josephson junctions which are ideally superconductor-insulator-superconductor (SIS) type following the ideal sinusoidal relation between current and quantum phase. We demonstrate how the non-sinusoidal current-phase relation in Superconductor-Normal metal-Superconductor (SNS) and weak link (WL) junctions affects the speed, delay, and margin of the AQFP gates. The JJ models are defined in the Keysight ADS simulator using symbolically defined device (SDD) method.

19.
arXiv (CS.AI) 2026-06-24

Toward Self-Evolution-Ready Workflow Harnesses: A Reversible Migration Path and Convertibility Taxonomy for Expert LLM Pipelines

arXiv:2606.24598v1 Announce Type: cross Abstract: While expert-validated "LLM + script" workflows deliver significant value, they remain static: they encode hard-won domain knowledge yet fail to adapt execution based on feedback. Existing agent research predominantly targets greenfield agents and synthetic benchmarks, leaving the migration of active legacy workflows unresolved. To bridge this gap, we present a reversible, Strangler-Fig migration path that refactors legacy workflows into composable, typed, and auditable stages. Central to this framework is a three-tier convertibility taxonomy (A/B/C), implemented as a routing stage within the system harness, which diagnoses a workflow's readiness and routes it accordingly.

20.
medRxiv (Medicine) 2026-06-24

Breaking The Pain-Stiffness Cycle- Supraclavicular Catheter Facilitated Rehabilitation Of Post-Surgical Elbow stiffness- A Retrospective Observational Study

ABSTRACT Background: Post-traumatic elbow stiffness is a recognised complication following orthopaedic trauma surgery, occurring in 10-15% of trauma patients sustaining injuries. Pain remains the primary barrier to physiotherapy compliance, with surgical arthrolysis carrying recurrence rates of up to 34%. The supraclavicular brachial plexus block, referred to as the 'spinal of the arm', provides anaesthesia and analgesia to the entire upper limb below the shoulder. A structured non-surgical approach combining continuous catheter analgesia with timed rehabilitation was identified as an unmet need in this patient group. Methods: A single-centre retrospective observational study was conducted on data of patients treated for post-surgical upper limb stiffness between January 2022 and April 2026. Of 30 patients identified, 28 with elbow involvement formed the primary analysis group following exclusion of 2 patients with isolated wrist stiffness and complex regional pain syndrome. Ultrasound-guided supraclavicular brachial plexus catheters were inserted using the Contiplex system. Patients received 0.5% Bupivacaine (10-15 ml) for initial blockade, followed by daily top-up doses of 0.2% Ropivacaine (20 ml) given 30 minutes prior to structured physiotherapy and CPM sessions for up to 5 days. The primary outcome was change in arc of elbow motion in degrees, measured by the attending orthopaedic consultant using standard goniometry. Results: Complete pre- and post-intervention data were available for all 28 patients. Mean pre-intervention arc of elbow motion was 39.1° (SD ±23.2°), improving to 104.2° (SD ±30.0°) post-intervention. Mean improvement was 65.1° (SD ±30.6°); 95% CI 53.8° to 76.4°; range 10°-140°; paired t-test t=-11.27, p
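The summary statistics in the Results can be related to one another with a short arithmetic sketch. It assumes the interval was computed with a normal approximation (z = 1.96), which is not stated in the abstract but appears consistent with the reported bounds; a t-based critical value with 27 degrees of freedom would give a slightly wider interval. The variable names are illustrative.

```python
import math

# Values as reported in the abstract.
n = 28                   # patients in the primary analysis group
mean_improvement = 65.1  # degrees
sd_improvement = 30.6    # degrees

# Standard error of the mean paired difference.
standard_error = sd_improvement / math.sqrt(n)

# Paired t statistic (magnitude only; the sign depends on the subtraction order).
t_statistic = mean_improvement / standard_error

# Assumed normal-approximation 95% interval.
z = 1.96
ci_low = mean_improvement - z * standard_error
ci_high = mean_improvement + z * standard_error

print(f"SE = {standard_error:.2f}, |t| = {t_statistic:.2f}")
print(f"95% CI = {ci_low:.1f} to {ci_high:.1f} degrees")
```

The small gap between the recomputed |t| and the reported 11.27 is within what rounding of the published mean and SD would produce.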

21.
arXiv (CS.CV) 2026-06-16

Show the Signal, Hide the Noise: Spectral Forcing for Pixel-Space Diffusion

Pixel-space diffusion models are trained on full-bandwidth noisy images, yet the useful signal available to the denoiser is strongly frequency dependent. Under rectified-flow diffusion and natural-image power-law spectra, the per-band data-to-noise contour $k^{*}(t) = (1-t)^{-2/\alpha}$ separates a signal-bearing low-frequency region from a noise-dominated high-frequency region at each time $t$. We show that this implicit coarse-to-fine structure is not merely descriptive: it induces a capacity-allocation problem. A standard pixel-space denoiser must discover the moving bandwidth boundary internally and can spend computation on frequency-time regions where the optimal prediction collapses to deterministic baselines rather than data-distribution modeling. To make this boundary explicit, we introduce Spectral Forcing, a parameter-free, time-conditional 2D-DCT low-pass operator applied to the noisy input before the patch embedder. Its cutoff expands monotonically with the diffusion time and becomes the identity at the data endpoint. Through controlled synthetic experiments, we identify the regime in which the operator is beneficial: coarse patch tokenization and data whose high-frequency content is predominantly noise rather than essential signal. On ImageNet-256 with JiT-700M/32, Spectral Forcing consistently improves both FID and Inception Score across different training epochs, demonstrating robust gains throughout training; at finer tokenization, the spectral forcing is still competitive. We further insert the unchanged operator into SenseNova-U1, a unified text-to-image model, where it improves DPG-Bench and GenEval, showing that the input-side spectral prior transfers beyond class-conditional generation. These results suggest a route to capacity-efficient pixel-space diffusion by showing the signal and hiding the noise.
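A time-conditional DCT low-pass of the kind described in this abstract can be sketched in one dimension. This is not the authors' implementation: their operator is a 2D-DCT applied to noisy images before the patch embedder, while the sketch below is 1D, assumes alpha = 2, and assumes that the contour k*(t) maps directly onto integer DCT indices. All function names are hypothetical.

```python
import math


def cutoff_frequency(t, alpha):
    """Contour k*(t) = (1 - t)^(-2/alpha); unbounded at the data endpoint t = 1."""
    if t >= 1.0:
        return math.inf
    return (1.0 - t) ** (-2.0 / alpha)


def _scale(k, n):
    # Orthonormal DCT-II scaling factor for coefficient k.
    return math.sqrt((1.0 if k == 0 else 2.0) / n)


def dct(x):
    """Orthonormal DCT-II of a real sequence."""
    n = len(x)
    return [
        _scale(k, n) * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def idct(c):
    """Inverse of the orthonormal DCT-II above (an orthonormal DCT-III)."""
    n = len(c)
    return [
        sum(_scale(k, n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
        for i in range(n)
    ]


def spectral_low_pass(x, t, alpha=2.0):
    """Zero DCT coefficients whose index exceeds k*(t) (assumed index mapping)."""
    k_star = cutoff_frequency(t, alpha)
    kept = [c if k <= k_star else 0.0 for k, c in enumerate(dct(x))]
    return idct(kept)


signal = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0, 3.0, 6.0]
for t in (0.0, 0.5, 0.75, 1.0):
    filtered = spectral_low_pass(signal, t)
    print(t, cutoff_frequency(t, 2.0), [round(v, 3) for v in filtered])
```

The cutoff grows with t, so the filter passes more of the spectrum as t approaches the data endpoint and leaves the input unchanged at t = 1, matching the behaviour the abstract describes.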

22.
arXiv (CS.CV) 2026-06-17

Fluently Lying: Adversarial Robustness Can Be Substrate-Dependent

The primary tools used to monitor and defend object detectors under adversarial attack assume that when accuracy degrades, detection count drops in tandem. This coupling was assumed, not measured. We report a counterexample observed on a single model: under standard PGD, EMS-YOLO, a spiking neural network (SNN) object detector, retains more than 70% of its detections while mAP collapses from 0.528 to 0.042. We term this count-preserving accuracy collapse Quality Corruption (QC), to distinguish it from the suppression that dominates untargeted evaluation. Across four SNN architectures and two threat models (l-infinity and l-2), QC appears only in one of the four detectors tested (EMS-YOLO). On this model, all five standard defense components fail to detect or mitigate QC, suggesting the defense ecosystem may rely on a shared assumption calibrated on a single substrate. These results provide, to our knowledge, the first evidence that adversarial failure modes can be substrate-dependent.

23.
medRxiv (Medicine) 2026-06-23

Clinical Characteristics and Predictors of Delayed Cerebral Ischemia in High-Altitude Aneurysmal Subarachnoid Hemorrhage

Background and Purpose-Aneurysmal subarachnoid hemorrhage (aSAH) remains a devastating cerebrovascular event, with delayed cerebral ischemia (DCI) representing its most feared complication. High-altitude environments induce profound cerebrovascular adaptations, yet no study has systematically examined aSAH outcomes in chronically hypoxic populations. We characterized clinical features and identified DCI predictors among aSAH patients on the Tibetan Plateau. Methods-This single-center retrospective cohort included 256 consecutive aSAH patients admitted at a tertiary neurosurgical center in Tibet (altitude 2,330-4,920 m) between 2013 and 2015. The primary outcome was DCI per consensus criteria. Multivariable logistic regression identified independent predictors; receiver operating characteristic analysis evaluated model performance. Altitude and hemoglobin were specifically evaluated as altitude-related risk factors. Results-DCI occurred in 26 patients (10.2%). In-hospital mortality was 1.6%. Most patients presented with good-grade aSAH (Hunt-Hess I-II, 73.0%; Fisher I-II, 73.1%). On multivariable analysis, only Fisher grade independently predicted DCI (odds ratio, 3.63 [95% CI, 1.14-11.52]; P=0.029). Neither altitude (P=0.697) nor hemoglobin concentration (P=0.858) was associated with DCI risk. The predictive model achieved an area under the curve of 0.812. At 1-year follow-up, 77.8% achieved favorable functional outcomes (modified Rankin Scale 0-2). Conclusions-Fisher grade is the sole independent predictor of DCI in high-altitude aSAH patients, while chronic hypoxia and compensatory hemoglobin elevation do not significantly modify DCI risk. Established sea-level prognostic frameworks remain valid in high-altitude settings, supporting their continued use for clinical risk stratification. Keywords: aneurysmal subarachnoid hemorrhage; high altitude; delayed cerebral ischemia; Fisher grade; Tibetan Plateau; prognosis

24.
medRxiv (Medicine) 2026-06-18

Chest X-Ray as a critical screening tool for Household Contacts of TB: Lessons from Three Years of Programmatic Data in India

Introduction: Household contacts (HHCs) of pulmonary TB patients remain at high risk for TB infection and disease progression, yet many remain asymptomatic and are missed by symptom-screening pathways. While India expanded its TB preventive guidelines to include all HHCs in 2021, chest X-ray (CXR) screening continues to be used selectively, representing a missed opportunity for early case detection. Methods: The analysis uses programmatic data from Project JEET 2.0 (Joint Effort for Elimination of Tuberculosis), implemented by the William J. Clinton Foundation in India, between October 2021 and March 2024. Eligible HHCs (>=5 years) were offered CXR screening as part of TB preventive therapy (TPT) evaluation. Descriptive and multivariable analyses examined predictors of CXR uptake and TB yield. A two-stage logistic regression model estimated potential TB yield under universal CXR coverage. Model performance was evaluated using the area under the curve (AUC), and bootstrap simulations generated counterfactual estimates of missed TB cases. Results: Among 1,034,621 HHCs, 1.02% were found positive for TB: 7,786 HHCs were already on TB treatment, and an additional 2,812 were identified during pre-TPT evaluation. Among eligible HHCs (n = 1,026,835), 70% were screened with CXR, of whom 2.4% had findings suggestive of TB. Of these, 79% underwent further TB assessment. Symptomatic HHCs were more likely to be screened with CXR (84% vs 69%) and assessed for TB, yet two-thirds of all detected TB cases were asymptomatic. Universal CXR coverage, combined with TB testing for all suggestive cases, is estimated to increase TB detection by at least 87%. Conclusion: The study provides a scalable approach to expand CXR coverage through public-private partnerships, enabling early TB detection among HHCs, especially among asymptomatic contacts. Future implementations will benefit from integrating AI-enabled reading, along with systematic follow-up for those with suggestive findings.
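
The headline counts in the Results can be reconciled with a few lines of arithmetic. This is only a check using the figures printed in the abstract; the variable names are illustrative, and the rounded percentages (70%, 2.4%, 79%) are deliberately not turned into counts because the abstract does not give their exact numerators.

```python
# Counts as stated in the abstract.
total_hhcs = 1_034_621
already_on_treatment = 7_786
found_pre_tpt = 2_812

# All TB-positive contacts: those already treated plus those newly identified.
tb_positive = already_on_treatment + found_pre_tpt
# Contacts already on treatment are excluded from CXR eligibility.
eligible = total_hhcs - already_on_treatment

print(f"TB positive: {tb_positive:,} ({tb_positive / total_hhcs:.2%})")
print(f"Eligible for CXR screening: {eligible:,}")
```

The sum gives 10,598 TB-positive contacts, which is 1.02% of all HHCs, and the difference reproduces the stated eligible population of 1,026,835.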

25.
medRxiv (Medicine) 2026-06-24

External Validation and Calibration Assessment of Explainable Machine Learning Models for GVHD Prediction After Allogeneic HSCT

Background: Graft-versus-host disease (GVHD) remains a major determinant of morbidity and mortality following allogeneic hematopoietic stem cell transplantation (allo HSCT). Existing GVHD prediction models demonstrate modest discrimination and limited generalizability, and calibration drift across external populations is rarely characterized despite its essential role in the clinical interpretability of predicted probabilities. Objectives: To develop and externally validate an explainable machine learning framework for predicting acute and chronic GVHD and associated overall survival in patients with acute myeloid leukemia (AML), acute lymphoblastic leukemia (ALL), and myelodysplastic syndromes (MDS) undergoing allo HSCT, and to systematically characterize calibration across heterogeneous external validation cohorts to inform deployment requirements. Study Design: The model was developed on three publicly available registry-derived datasets (N = 2,509) and externally validated across six independent cohorts (N = 14,788) comprising adult and pediatric allo HSCT recipients, including a regional Middle Eastern cohort (UAE and Jordan). A standardized preprocessing pipeline harmonized heterogeneous datasets. Gradient boosting models (CatBoost) were used for binary GVHD prediction; exploratory overall survival analysis used a Cox proportional hazards model with predicted acute GVHD risk as a covariate. Discrimination (AUROC with bootstrap 95% CI), calibration (logistic recalibration intercept and slope with analytical 95% CI), and feature importance (SHapley Additive exPlanations, SHAP) were assessed on out-of-fold training predictions and in all external cohorts. Results: In internal validation, AUROC was 0.63 (95% CI 0.61-0.65) for acute GVHD and 0.72 (95% CI 0.70-0.74) for chronic GVHD. External validation demonstrated AUROC ranges of 0.51-0.57 (acute) and 0.54-0.64 (chronic), with consistent performance across disease subgroups despite substantial heterogeneity in transplant practices and feature availability. In exploratory survival analysis, the acute-GVHD-informed Cox model achieved a training-cohort C-index of 0.679 (95% CI 0.658-0.697); external C-indices ranged from 0.47 to 0.53. Calibration analysis identified systematic external risk overestimation (negative calibration intercept in 10 of 11 evaluable external cohort-target combinations) with heterogeneous slope drift requiring cohort-specific recalibration. Key predictors included recipient age, graft source, conditioning intensity, GVHD prophylaxis, and HLA match ratio. Conclusions: An explainable, externally validated GVHD prediction framework was developed using heterogeneous registry-derived datasets, with systematic characterization of calibration drift across multiple external cohorts, an analysis rarely reported in prior GVHD prediction literature. Predictive performance was modest for acute GVHD and moderate for chronic GVHD, constrained by missing immunobiological variables and incomplete HLA characterization. Per-cohort recalibration is required before clinical deployment, with prospective validation and benchmarking against established GVHD risk scores identified as priority next steps.
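
To make the calibration metrics concrete, the sketch below fits a logistic recalibration model, logit P(y=1) = a + b * logit(p_pred), where a is the calibration intercept and b the calibration slope. It is an illustration on synthetic data, not the authors' pipeline: the data-generating values (-0.5 and 0.8), the sample size, and the function names are all assumptions chosen for the example, and the fit uses a plain Newton-Raphson loop rather than any particular library.

```python
import math
import random

def logit(p):
    return math.log(p / (1.0 - p))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def recalibrate(y, p_pred, n_iter=25):
    """Fit logit(P(y=1)) = a + b * logit(p_pred) by Newton-Raphson.

    Returns (a, b): the calibration intercept and slope.
    A perfectly calibrated model gives a = 0 and b = 1.
    """
    x = [logit(p) for p in p_pred]
    a, b = 0.0, 1.0  # start from "perfect calibration"
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = sigmoid(a + b * xi)
            w = mu * (1.0 - mu)
            g0 += yi - mu            # gradient w.r.t. a
            g1 += (yi - mu) * xi     # gradient w.r.t. b
            h00 += w                 # entries of X' W X
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (X' W X) * delta = gradient.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Synthetic cohort in which the model overestimates risk (true intercept -0.5)
# and its predictions are too extreme (true slope 0.8).
random.seed(0)
p_pred = [random.uniform(0.05, 0.95) for _ in range(5000)]
y = [1 if random.random() < sigmoid(-0.5 + 0.8 * logit(p)) else 0 for p in p_pred]

a_hat, b_hat = recalibrate(y, p_pred)
print(f"calibration intercept = {a_hat:.2f}, slope = {b_hat:.2f}")
```

A negative intercept, as the abstract reports for most external cohorts, means observed event rates fall below the predicted ones; a slope below 1 means predictions are too spread out. Applying the fitted a and b to each cohort's predictions is one simple form of the per-cohort recalibration the authors call for.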