Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.CV) 2026-06-18

Semantic Robustness Certification for Vision-Language Models

Vision-language models (VLMs) are now widely used in downstream tasks. However, real-world applications often expose VLMs to distribution shifts induced by semantic variation (e.g., shape, size, and style). Robustness certification determines if a model's prediction changes when transformations are applied to its input. While most certification frameworks study geometric or pixel-level transformations over inputs, this work proposes a novel framework that enables certifying VLM robustness under semantic-level transformations. Leveraging the open-vocabulary capability of VLMs, we use text prompts as semantic proxies to construct transformations parameterized by an extent that controls the degree of semantic variation. By characterizing the VLM decision boundary in closed form, our framework quantitatively certifies extent intervals for which the predicted class remains unchanged under the semantic transformation. Our framework is the first to certify VLM robustness under semantic-level variations without requiring additional data for each variation, making it practical to apply. Experiments on both synthetic and real-world data show that our framework enables certifying robustness under diverse semantic variations across scenarios.

02.
arXiv (CS.AI) 2026-06-24

Governed Shared Memory for Multi-Agent LLM Systems

arXiv:2606.24535v1 Announce Type: new Abstract: Multi-agent LLM environments require robust mechanisms for shared knowledge management. This paper formalizes the fleet-memory problem and identifies four foundational failure modes: unauthorized leakage, stale propagation, contradiction persistence, and provenance collapse. To address these, we define explicit systems-level primitives: scoped retrieval, temporal supersession, provenance tracking, and policy-governed memory propagation. These primitives are implemented in MemClaw, a production multi-tenant memory service, and evaluated via ArgusFleet, a reproducible harness testing four governance dimensions. Rather than a baseline comparison, this study measures a live production service, emphasizing real-world architectural insights and negative results. Key Evaluation Results Provenance: Successfully reconstructed 100% of depth-four derivation chains with correct writer identity at sub-second per-hop latency. Propagation: Demonstrated high intra-fleet visibility with zero cross-fleet leakage. Under strong write mode, write-to-visible latency was optimized to a single search round-trip. Production Architectural Issues Discovered Asymmetric Scope Enforcement: Tenant isolation held, but sub-tenant scope was initially bypassed on direct GET-by-id requests for agent-scoped credentials (disclosed and remediated during the study). Pipeline Ordering Conflict: While contradiction supersession works for admitted writes, a synchronous near-duplicate gate can prematurely reject contradictory writes before the asynchronous contradiction detector can evaluate them. Conclusion: Long-context retrieval alone is insufficient for production multi-agent memory. Governed shared memory demands explicit systems-level abstractions, and live evaluation is vital to expose enforcement and pipeline-ordering failures missed by design-only treatments.

03.
arXiv (CS.CL) 2026-06-11

On the Optimal Reasoning Length for RL-Trained Language Models

Reinforcement learning substantially improves reasoning in large language models, but it also tends to lengthen chain-of-thought outputs and increase computational cost. Although length-control methods have been proposed, the length-accuracy relationship they induce remains unclear. We train policies with several length-control methods on multiple base models in a controlled setup and find that, across both mathematical reasoning and code generation, accuracy is non-monotonic in output length, peaking at an intermediate value. Mode accuracy, however, continues to improve with length even in settings where sample accuracy plateaus or declines, indicating that the non-monotonic length-accuracy relationship is driven by dispersion around an increasingly correct center.

04.
medRxiv (Medicine) 2026-06-24

An Automated, Pathologist-free Gleason Grade Stratifies Disease-free Interval Comparably to Expert Grading from a Single Out-of-distribution Slide

Automated Gleason grading now matches expert pathologists on the cohorts where systems are developed and tuned, but deployment-relevant gaps remain: whether an automated grade, applied without site-specific tuning or pathologist oversight, stratifies outcome comparably to expert grading on slides from unseen institutions and in cross-specimen applications. We tested this for disease-free interval (DFI), a curated recurrence endpoint. A production gland-level prostate diagnostic (PathTools Prostate v11.0) was applied frozen and uncalibrated to 298 diagnostic whole-slide images from 274 TCGA-PRAD radical-prostatectomy patients, a cohort outside its development distribution and needle-core-biopsy training data, contributed by 25 source sites under heterogeneous digitization; tissue was detected automatically with no expert region annotation. From the output we derived an ISUP grade group and continuous high-grade content, and evaluated each grade as a standalone predictor of DFI (24 events) by Harrell's c-index with 95% bootstrap confidence intervals, a paired between-method bootstrap, and Kaplan-Meier curves with the log-rank test. The automated grade reproduced the clinical grade group at quadratic-weighted kappa = 0.62 (95% CI 0.53-0.70; 48% exact, 86% within one group), within the expert inter-observer range. As the sole predictor it stratified recurrence (log-rank p = 0.022; c-index 0.69, 95% CI 0.58-0.79), and the continuous high-grade fraction was robustly prognostic (hazard ratio 1.37 per SD, p = 0.029; c-index 0.71, 0.61-0.81). Standalone discrimination was not statistically separable from the clinical grade (c-index 0.78, 0.69-0.86; paired {triangleup} c-index spanning zero), and in a joint model the automated grade added nothing beyond it, consistent with both measuring a shared morphological axis. From a single out-of-distribution slide with no pathologist oversight, the automated grade provides standalone recurrence stratification not statistically separable from whole-gland expert grading, demonstrating robust generalizability beyond training data; reported as a continuous high-grade fraction, it offers reproducible, expert-free, grade-equivalent risk stratification for harmonizing large archival or genomically-profiled cohorts.

05.
arXiv (quant-ph) 2026-06-25

Variants of the Quantum Phase Operator for the Harmonic Oscillator

arXiv:2606.24913v1 Announce Type: new Abstract: We introduce and study quantum phase operators associated with the Quantum Harmonic Oscillator (QHO). We show that these operators are trace-class perturbations of the Susskind-Glogower operators and examine their mathematical and physical properties. The construction is motivated by the physically relevant two-phase case.

06.
arXiv (CS.LG) 2026-06-18

A Human-in-the-Loop Bayesian Optimization Framework for Constraint-Aware Bioprocess Development

arXiv:2606.19230v1 Announce Type: new Abstract: This work presents an extension to Pareto Front Guided Sampling (PFGS), a Human-in-the-Loop (HitL) Bayesian Optimization (BO) framework in which Gaussian process (GP) surrogate-derived quantities are reformulated as objectives of a multi-objective optimization problem, and the resulting Pareto front is exposed to a domain expert for interactive candidate selection rather than returning a single automated recommendation. The framework is extended in two directions: constrained optimization is addressed by incorporating the posterior probability of satisfying output specification limits as an explicit Pareto objective, computed analytically from the GP posterior distribution; robust optimization is addressed by a Monte Carlo sampling strategy that estimates expected lower-confidence performance over a user-defined variability of input perturbations, capturing performance degradation under likely implementation deviations. The resulting multi-dimensional Pareto representation renders trade-offs between predicted performance, model uncertainty, probabilistic constraint satisfaction, and input robustness simultaneously visible through pairwise two-dimensional projections on an interactive dashboard, enabling selection criteria to be iteratively refined as the surrogate model improves and development objectives evolve. The framework is showcased on an eight-dimensional fed-batch Chinese Hamster Ovary (CHO) cell culture simulator demonstrating systematic identification of high-performing, feasibility-compliant, and perturbation-resilient operating conditions, and illustrating how expert-defined requirements provide a principled stopping criterion and support informed allocation of experimental resources.

07.
arXiv (CS.LG) 2026-06-16

TCHG: Tri-Trust Conditioned Heterogeneous Graph Learning for Reliable Dynamic Trust Prediction

arXiv:2606.16611v1 Announce Type: new Abstract: Trust prediction infers latent user-user trust relations and provides important support for social recommendation, fake-review and manipulation detection, and risk identification. Graph neural networks have become a prominent approach to trust prediction because of their ability to learn network structures and complex trust dependencies. However, existing methods often rely on a unified representation of trust signals and do not disentangle heterogeneous trust evidence into separate evidence channels, failing to exploit the distinct roles that different evidence channels should play during trust modeling. To address this gap, this paper argues that trust evidence should not be treated as an undifferentiated input, but should be decomposed and used as functional control factors over graph propagation. We propose TCHG, a tri-trust conditioned heterogeneous graph learning framework that decomposes trust evidence into three channels and assigns them distinct functional roles in propagation: entity reliability governs message admission, interaction-behavior reliability modulates propagation strength, and contextual trust adjusts the propagation mode through context-conditioned operator selection. Since the three evidence channels evolve at different temporal scales, TCHG maintains independent temporal states with non-uniform decay rates to prevent rapidly changing contextual signals from overwriting slowly accumulated entity reliability. It further predicts trust probability and calibrates the output probability, improving predictive confidence under sparse or conflicting evidence. Extensive experiments on multiple public trust datasets show that TCHG achieves effective and reliable trust prediction compared with representative trust prediction and heterogeneous graph baselines.

08.
arXiv (CS.CL) 2026-06-19

MENTOR: Reinforcement Learning via Flexible Teacher-Optimized Rewards for Tool-Use Distillation

Distilling the tool-use capabilities of large language models (LLMs) into small language models (SLMs) is essential for their practical application. The predominant approach, supervised fine-tuning (SFT), suffers from poor out-of-domain (OOD) generalization due to its rigid alignment with static teacher trajectories. While reinforcement learning (RL) offers an alternative, the capacity limitations of SLMs pose a severe dilemma: sparse outcome rewards provide insufficient guidance, whereas strict trajectory matching imposes overly restrictive constraints. To bridge this capacity-driven gap, we propose MENTOR, which introduces a flexible yet process-aware reward structure. Instead of enforcing rigid replication, MENTOR uses the teacher's reference to guide tool-use behavior, balancing behavioral alignment with downstream performance. Extensive experiments on controlled executable-tool benchmarks demonstrate that MENTOR improves OOD tool-use performance compared to SFT and strict RL baselines. Our findings suggest that within verifiable tool-use environments, flexible tool-use alignment offers a more effective approach than strict trajectory replication for developing adaptable small models.

09.
bioRxiv (Bioinfo) 2026-06-17

Posterior-calibrated multimodal motor states reveal longitudinal and imaging-associated heterogeneity in Parkinson's disease

Parkinson's disease (PD) motor heterogeneity is commonly summarized by hard subtype labels, although clinical states vary longitudinally, severity can dominate unsupervised structure, and model uncertainty is rarely calibrated. We developed a posterior and refit-stability calibrated multimodal motor state framework that assigns probabilistic MDS-UPDRS-III motor states, aggregates them at the patient level, separates global burden from residual tremor-axial profile, and tests whether imaging can recover the resulting posterior distribution. In 29,366 aligned PPMI motor-posterior visits spanning 4,773 participant identifiers, patient-level state families were stable on average (modal-family fraction 0.925; 95% CI 0.921 - 0.930), but 25.5% of patients transitioned state over follow-up (95% CI 24.1 - 26.7%). PD-only cohort definitions produced smaller denominators and are reported as sensitivity cohorts with rerun calibration and imaging-posterior checks. Severity and covariates explained substantial motor-domain variance, especially bradykinesia (rsecond=0.850), but residual profile modeling retained five active components across total-severity, principal-component, leave-one-domain, non-target-burden, and clinical-only severity axes. Refit-stability calibration with 250 patient-blocked bootstrap refits showed high nominal posterior confidence (0.989) but lower empirical label consistency (0.849), quantifying overconfidence rather than hiding it. Patient-held-out temporal modeling predicted future axial burden (best XGBoost rsecond=0.605) and future state transition (XGBoost AUC=0.830; 95% CI 0.822 - 0.837). DaTSCAN plus FreeSurfer ROI features predicted patient-level soft motor posterior vectors (RF jsd=0.209; 95% CI 0.199 - 0.220; macro-AUROC=0.692), while severity/demographic-adjusted imaging features further improved soft posterior recovery (jsd=0.188). BioFIND transfer reproduced clinically meaningful endpoint gradients after state assignment in 225 external patients, supporting external face validity rather than definitive transportability. These results support PD motor phenotypic states as calibrated, dynamic, clinically interpretable profiles with convergent imaging associations, not as definitive biological subtypes.

10.
medRxiv (Medicine) 2026-06-18

Can Vision-Language Models See the Vital Signs? Benchmarking and Fine-Tuning for Intraoperative Monitor Reading

Background Vital-sign deterioration is a leading contributor to preventable perioperative death, yet manual monitor reading is intermittent, error-prone, and subject to alarm fatigue. Automating this perceptual step could enable continuous surveillance, but existing solutions depend on device-specific hardware integration or cloud-hosted vision-language models (VLMs), which raise privacy, cost, and connectivity barriers in resource-limited healthcare facilities. Methods We constructed a benchmark of 200 in-the-wild intraoperative monitor photographs (spanning multiple vendors, angles, and illumination conditions) annotated for eight vital-sign parameters: heart rate, SpO2, ETCO2, respiratory rate, systolic/diastolic/mean blood pressure, and temperature. We evaluated an optical character recognition (OCR)-based pipeline, nine instruction-tuned VLMs (four commercial, five open-weight ranging from [≤]4B to 31B parameters) under two prompting regimes, and a compact open model (Qwen3.5-9B) adapted via low-rank fine-tuning (LoRA, 0.46% of parameters updated). Results Under a domain-aware prompt, frontier VLMs reached 0.98-0.997 exact-match accuracy zero-shot, whereas the OCR pipeline and [≤]4B model scored approximately 0.20 lower, defining a 9B-class usable floor. LoRA fine-tuning Qwen3.5-9B on 80-120 images raised accuracy from 0.953 to 0.994 (statistically indistinguishable from the best commercial model) and reduced the critical-error rate fivefold (0.0313 [->] 0.0063). Ablations showed that performance saturated at 80 training images and rank-8 adapters. Conclusion Monitor reading is a solved perception problem for VLMs above the 9B scale. A lightweight fine-tuned open model achieves frontier accuracy while running entirely on local hardware, preserving data privacy, offline capability, and near-zero marginal cost. Residual errors stem from blood-pressure source ambiguity and are addressable with explicit disambiguation logic.

11.
arXiv (CS.CV) 2026-06-18

Reasoning as Intersection: Consensus-Frame Alignment for Visual Focus in Video-MLLMs

Reinforcement learning has improved the reasoning ability of large language models, but applying outcome-only rewards to video multimodal large language models (Video-MLLMs) provides limited guidance on which visual evidence should support the answer. Inspired by multisensory integration, where consistent cues can enhance the salience and reliability of perceptual estimates, we introduce Consensus Frame GRPO (CF-GRPO), a temporal-annotation-free process-level reward framework for evidence-aware video reasoning. CF-GRPO constructs a consensus frame prior from intrinsic video cues, including temporal coverage, scene-transition cues, and query-conditioned visual relevance. It then computes a model-side frame-use score from visual and response representations and optimizes their agreement through the Consensus Frame Reward (CFR). With salience-aware sparse aggregation and distribution sharpening, CFR provides a high-contrast reward signal without requiring human temporal annotations. Experiments show that VideoCFR achieves competitive performance across complex video reasoning benchmarks and improves several metrics over representative Video-MLLM and RL baselines, while the consensus prior provides an interpretable view of the evidence frames emphasized during training. The implementation is available at https://github.com/1Pansy/VideoCFR.

12.
arXiv (CS.AI) 2026-06-16

VGPT-RSI for RH-Adjacent Formal Progress: Boundary Certificates, Verified Finite Lagarias Inequalities, and Explicit Failure Localization

arXiv:2606.15096v1 Announce Type: new Abstract: The Riemann Hypothesis remains one of the central unsolved problems in mathematics. Rather than claiming proof, we investigate whether a verifiable AI-assisted reasoning system can produce reliable, formally checked partial progress while explicitly identifying the remaining mathematical obstructions. We apply the Verifiable Growing Physical Transformer with Recursive Self-Improvement (VGPT-RSI) to two RH-adjacent certification tasks. First, we construct and verify a finite RH-boundary certificate for inequality on a parameterized safe lower curve over a region. The numerical boundary curve is converted into a certificate-backed lower curve, audited using outward-rounded interval arithmetic and Arb/FLINT ball arithmetic, and then checked in Rocq/CoqInterval for the parameterized theorem. Second, we initiate a formal Lagarias-route certificate. Lagarias criterion states that RH is equivalent to the global inequality. We formalize the finite quantity and produce a Coq-checked finite certificate. The final system identifies the exact unresolved mathematical bottlenecks: formalizing the Lagarias equivalence, proving the global tail theorem beyond any finite cutoff, and potentially reducing counterexamples to colossally abundant or related extremal integers. These results demonstrate that VGPT-RSI can produce certified RH-adjacent formal progress, organize proof dependencies, and avoid overclaiming when the remaining obstruction is genuinely mathematical.

13.
arXiv (CS.LG) 2026-06-19

Multi-Granular Attention-Driven Reinforcement Learning Framework for Web Intelligent Enhancement Systems

arXiv:2606.19690v1 Announce Type: new Abstract: From the past few years, web intelligent enhancement systems increasingly rely on heterogeneous and dynamic web data to deliver personalized, context-aware services. However, traditional machine learning, deep learning, and reinforcement learning models often struggle with semantic understanding, adaptability, and scalability in continuously evolving web environments. In this research, a Multi-Granular Attention-based Reinforcement Web Intelligent Enhancement System (MGAR-WIES) is proposed to address the challenges by integrating semantic graph modeling, attention mechanisms, and adaptive reinforcement learning. Initially, heterogeneous web data comprising structured, semi-structured and unstructured sources are collected and preprocessed for generating unified feature representations. These representations are transformed into a dynamic semantic graph, where entities and their relationships are modeled by using graph embeddings enhanced by attention mechanisms for capturing both local relevance and global contextual dependencies. Subsequently, an adaptive multi-agent reinforcement learning strategy leverages the attention-aware semantic states to optimize personalized web actions like content recommendation, navigation optimization, and service adaptation. Finally, the continuous online feedback is further integrated to update graph representations and learning policies in real time by ensuring sustained adaptability and performance. The proposed MGAR-WIES acheived better results in terms of accuracy (80%) when compared with existing approaches.

14.
medRxiv (Medicine) 2026-06-17

High burden of subclinical TB in Africa revealed from a postmortem cohort.

Tuberculosis (TB) is increasingly recognised as a spectrum of infection and disease, yet the prevalence of viable, asymptomatic Mycobacterium tuberculosis (M.tb) infection remains uncertain. Subclinical Tuberculosis (scTB), defined as microbiologically confirmed M.tb infection in the absence of recognised symptoms, is under detected by symptom, sputum and imaging-based approaches. We conducted postmortem examinations of 94 adults who died from non-infectious causes, none of whom were clinically suspected of TB or reported TB related symptoms prior to death. Lung and extrapulmonary tissues were cultured for M.tb. Viable M.tb was confirmed in six individuals, corresponding to a prevalence of 6.4% (95% CI: 2.4 to 13.4%). These findings provide direct tissue-based evidence that viable, asymptomatic M.tb infection can persist beyond the reach of conventional clinical detection. Our data suggest that a biologically active reservoir of infection may exist undetected within high-burden settings, with implications for surveillance strategies aimed at TB elimination.

15.
arXiv (CS.CV) 2026-06-18

Beyond the Current Observation: Evaluating Multimodal Large Language Models in Controllable Non-Markov Games

Deploying multimodal foundation models as closed-loop policies increasingly requires conditioning actions on observations that are no longer visible. However, existing benchmarks either expose the full state, conflate hidden-state reconstruction with other agent skills, or test recall only after an episode has ended. We introduce RNG-Bench (Reconstructive Non-Markov Games), a benchmark suite designed to isolate a base model's ability to reconstruct past observations and act on them during multi-step interaction. RNG-Bench includes two complementary games: Matching Pairs, where card identities briefly revealed at specific locations must later be recalled, and 3D Maze, where egocentric views must be integrated into a spatial map. Both games are evaluated under a unified harness with three controlled difficulty axes: grid size, visual pattern, and observation modality. The benchmark further introduces a head-to-head duel protocol to control for instance-level variance and a Memory Gap metric that disentangles forgetting from poor action selection. The hardest configurations require contexts of roughly 128K tokens and 350 image inputs per episode, and remain far from saturated by frontier MLLMs. Memory Gap analysis shows that most residual errors stem from forgetting earlier observations rather than from suboptimal decision making. Finally, fine-tuning Qwen3.5-9B on optimal-policy rollouts and filtered model demonstrations improves performance on RNG-Bench and transfers to existing benchmarks without degrading general multimodal capability.

16.
medRxiv (Medicine) 2026-06-24

Genetically Proxied Interleukin-6 Inhibition and Cancer Risk: A Multi-Ancestry Drug-Target Mendelian Randomization Study of Hepatocellular Carcinoma and Colorectal Cancer

Background: Interleukin-6 (IL-6) signalling drives chronic inflammation and is therapeutically targeted by tocilizumab, an approved IL-6 receptor inhibitor. Whether genetically proxied lifelong IL-6 inhibition causally influences the risk of hepatocellular carcinoma (HCC) or colorectal cancer (CRC) remains unanswered. Prior single-variant estimates from pooled observational data are methodologically limited and may reflect confounding. Methods: A two-sample drug-target Mendelian randomization (MR) study was conducted. Four independent cis-acting protein quantitative trait loci (pQTL) variants within the IL6 and IL6R gene loci (rs2228145, rs4129267, rs7529229, rs1800795) were selected as genetic instruments , with F-statistics ranging from 32.3 to 120.5, confirming instrument strength. Outcome data were obtained from four independent genome-wide association studies: HCC from BioBank Japan (BBJ; 1,866 cases, 195,745 controls), HCC from FinnGen Release 10 (674 cases, 218,118 controls), CRC from a European meta-analysis (19,948 cases, 12,124 controls), and CRC from BBJ (7,062 cases, 195,745 controls). Causal estimates were derived using inverse variance weighted (IVW) regression as the primary method, with MR-Egger and weighted median analyses as sensitivity methods. Cochran Q statistics assessed heterogeneity and MR-Egger intercept testing assessed directional pleiotropy. Results: Genetically proxied IL-6 inhibition showed no significant causal effect on HCC risk in East Asian populations (IVW odds ratio [OR] 0.997, 95% confidence interval [CI] 0.903 to 1.101, p=0.953) or European populations (IVW OR 0.984, 95% CI 0.802 to 1.208, p=0.880). Similarly, no causal effect was observed on CRC risk in European populations (IVW OR 1.015, 95% CI 0.957 to 1.075, p=0.623) or East Asian populations (IVW OR 0.999, 95% CI 0.948 to 1.052, p=0.971). Sensitivity analyses confirmed the absence of directional pleiotropy and heterogeneity across all four analyses. Leave-one-out analyses demonstrated that no single instrument drove the null findings. Conclusions: Genetically proxied IL-6 receptor inhibition, modelling the therapeutic effect of tocilizumab, showed no causal effect on HCC or CRC risk across four independent cohorts and two ancestries. These findings do not support a role for IL-6 pathway inhibition in the prevention of these cancers and provide reassuring genetic safety evidence regarding cancer risk in patients receiving tocilizumab. Larger HCC-specific GWAS are needed to definitively evaluate modest effects in this cancer type.

17.
arXiv (CS.CV) 2026-06-11

DynaTok: Token-Based 4D Reconstruction from Partial Point Clouds

We address 4D reconstruction from partial point cloud sequences, where depth-sensor observations are incomplete, unordered, and lack explicit temporal correspondences. This geometry-only setting is challenging due to missing observations and ambiguous dynamics. While recent progress has largely relied on image-based methods, existing point-based approaches typically focus on single objects, assume relatively complete inputs, or require explicit correspondences. To address these limitations, we propose DynaTok, a point-based framework for correspondence-free 4D reconstruction from partial point cloud sequences without images. DynaTok encodes frames into compact latent tokens, aggregates incomplete observations over time with a Transformer-based spatiotemporal encoder, and decouples geometry and motion through residual tokens in a unified model. A flow-matching decoder then reconstructs complete, temporally consistent 4D point-cloud sequences conditioned on the latent tokens. Experiments on object- and scene-level benchmarks demonstrate improved reconstruction quality and temporal coherence from partial point cloud observations. Project page: https://wrchen530.github.io/dynatok/.

18.
arXiv (math.PR) 2026-06-16

Eyring-Kramers asymptotics for infinite-dimensional stochastic gradient systems

arXiv:2606.16083v1 Announce Type: new Abstract: We study small-noise asymptotics for a class of reversible stochastic evolution equations in infinite dimensions. The dynamics are of the form \[ dX_t=-A\nabla F(X_t)\,dt+\sqrt{2\beta^{-1}A}\,dW_t, \] where $F$ is a regular multi-well potential, $A$ is a selfadjoint mobility operator, $W$ is a cylindrical Brownian motion and $\beta\gg 1$ is the inverse noise strength. The invariant measure is a Gibbs perturbation of a Gaussian reference measure, and the resulting framework covers, in particular, the stochastic Allen-Cahn and stochastic Cahn-Hilliard equations on bounded intervals. In the double-well case, we derive a sharp asymptotic formula for the first nonzero eigenvalue of the generator. This gives an infinite-dimensional Eyring-Kramers law for the spectral gap, with exponential rate determined by the communication height and leading prefactor determined by the local quadratic behavior at the relevant minima and saddle points. Our approach provides a general strategy for lifting finite-dimensional Eyring-Kramers analysis to infinite-dimensional stochastic gradient systems.

19.
arXiv (CS.CV) 2026-06-15

How do Self-Supervised Remote Sensing Vision Models Transfer to Downstream Tasks?

Self-supervised geospatial foundation models (GeoFMs) learn transferable representations from remote sensing data, but their downstream behavior is difficult to characterize. We study six representative GeoFMs spanning joint-embedding, reconstruction, and multimodal pretraining families, and evaluate transfer across classification, regression, and segmentation benchmarks under different label availability and downstream pipelines. We find that model rankings change across tasks and adaptation settings. Layerwise probing shows that, in most cases, task-relevant information is more accessible in intermediate transformer blocks compared to final-layer embeddings, and that GeoFMs exhibit distinct depthwise profiles. In segmentation case studies on PASTIS and Sen1Floods11, downstream adaptation settings such as decoder design and fine-tuning can be as impactful as the choice of GeoFM, and standard dense-prediction heads may be poorly aligned with how GeoFMs organize information over depth. Finally, CKA analysis on case studies shows that fine-tuning does not rewrite GeoFMs uniformly across depth, and the strongest changes are localized to the first linear layer of the MLP in ViT blocks. These results help explain why GeoFM rankings shift across benchmarks and motivate more representation-aware evaluation and adaptation strategies.

20.
arXiv (CS.AI) 2026-06-16

MedCollab: IBIS-Guided Multi-Agent Collaboration with Hierarchical Disease Relation Chains for Clinical Diagnosis

arXiv:2603.01131v3 Announce Type: replace-cross Abstract: Clinical diagnosis is a gradual process of evidence integration, in which physicians move from symptoms and medical history to examinations, competing hypotheses, disease relations, and treatment decisions. Large language models have advanced medical text understanding and generation. Yet their clinical use remains limited by weak evidence grounding, opaque reasoning, and inconsistent links among differential diagnosis, final diagnosis, diagnostic basis, and treatment planning. We introduce MedCollab, a multi-agent framework for full-cycle clinical diagnosis and report generation. MedCollab coordinates specialist and examination agents according to patient records. It structures agent deliberation with an Issue-Based Information System (IBIS) protocol, so that each diagnostic position is supported by patient-specific evidence and medical knowledge. It also builds Hierarchical Disease Relation Chains (HDRC) to connect accepted hypotheses through progression, complication, and comorbidity relations. During multi-round deliberation, a verifier-guided consensus module evaluates evidence support, medical plausibility, and logical conflicts. It then adjusts agent contributions and filters unsupported reasoning. Experiments on ClinicalBench and MIMIC-IV show that MedCollab outperforms leading LLMs and medical multi-agent baselines in diagnostic accuracy, evidence consistency, and clinical reasoning quality. These results indicate that structured and auditable collaboration can produce more faithful and clinically coherent diagnostic reports.

21.
arXiv (quant-ph) 2026-06-24

Resource theory of interactive quantum instruments

arXiv:2603.27676v2 Announce Type: replace Abstract: Quantum instruments describe both the classical outcome and the updated quantum state in a measurement process. To do this in a non-trivial way, instruments must have the capability to interact coherently with the state that they measure. Here, we develop a resource theory for instruments. We consider a relevant quantifier of the separation between interactive and non-interactive instruments and show that it admits three distinct operational interpretations in terms of quantum information tasks. These concern (i) the preservation of maximally entangled states after a local measurement, (ii) the average ability to preserve random states after measurement, and (iii) the ability to recover the classical information generated from measuring half of a maximally entangled state. We also introduce a natural set of allowed operations and show that the third task fully characterises the resource content of instruments. Our general framework reproduces as special cases established resource theories for channels and measurements.

22.
arXiv (quant-ph) 2026-06-19

Ultrafast nonadiabatic dynamics of tetraphenylsubstituted nitrogen-based heterocycles

arXiv:2604.16897v2 Announce Type: replace-cross Abstract: Tetraphenylpyrazine (TPP) and 2,3,4,5-tetraphenyl-1H-pyrrole (TePP) are closely related heterocycles bearing four phenyl substituents, whose structural similarity makes them a useful pair for comparing how intramolecular flexibility influences excited-state relaxation and emission in the gas phase and in the solid state. TPP is a prototypical solid-state luminescence enhancement (SLE) emitter, exhibiting a markedly increased quantum yield upon molecular aggregation. In contrast, TePP displays similar quantum yields in solution and solid state, characteristic of dual-state emission (DSE). This behaviour indicates that intramolecular rotations are already significantly hindered in the isolated-molecule regime, consistent with our previous observations for TPP and other solid-state emitters (Hernández-Rodríguez et al., ChemPhysChem, 2024, 25, e202400563). To unravel the excited-state dynamics underlying this contrasting behaviour, we performed mixed quantum-classical trajectory simulations on a single molecule of TPP and TePP employing the surface-hopping method. Twelve singlet states were included at the TD-B3LYP-D3/def2-SVP level, which were previously benchmarked against coupled cluster methods. Simulated observables such as gas phase ultrafast electron diffraction (GUED) and time-resolved fluorescence (TR-FL) signals allow us to dissect the distinct deactivation pathways operating in both systems in the gas phase, while also providing mechanistic insight into how these pathways are expected to evolve in solution and solid-state environments.

23.
medRxiv (Medicine) 2026-06-24

ADVISE: A Machine Learning Framework for Early Recognition of a Surrogate Marker for Ventilator-Associated Pneumonia Using Routinely Collected Critical Care Data

Background Ventilator-associated pneumonia (VAP) is the most frequent nosocomial infection in critical care, affecting 20-36% of mechanically ventilated patients. Early prediction is hampered by the absence of a reliable, objective diagnostic standard. We developed ADVISE (Automated Dudley Ventilation Infection Series Evaluation), a machine learning model to predict physiological deterioration consistent with developing VAP using routinely collected electronic health record data from a UK NHS intensive care unit. Methods Retrospective observational study of admissions at Russell's Hall Hospital ICU (2008-2026). Following National Data Opt-Out exclusion (158 admissions, 4.2%), 3,566 admissions generated 33,208 candidate 48-hour observation blocks. Six temporal variables - FiO2, ventilator mode, P:F ratio, procalcitonin (PCT), secretion amount, and secretion description - were extracted across the baseline window (hours 1-24). A composite VAP-surrogate outcome required concurrent P:F ratio decline (>=5%) and PCT rise (>=0.5 ng/mL) across the outcome window (hours 25-48). After sequential quality filters, 2,134 blocks (18 positive, 0.84% prevalence) were retained. An XGBoost classifier was trained using nested 5-fold cross-validation with scale_pos_weight=114.0 and ROC-based hyperparameter optimisation on 1,495 training blocks, evaluated on 639 held-out test blocks. Performance was assessed via AUROC, AUPRC, and calibration (Brier score). Bootstrap resampling (1,000 iterations) generated 95% confidence intervals. Results On the held-out test set (n=639, 5 positive outcomes), ADVISE achieved AUROC 0.874 [95% CI: 0.771-0.939] and AUPRC 0.031 [0.008-0.069], representing a 4.0-fold improvement over the no-skill baseline. Nested cross-validation mean AUROC was 0.844 +/- 0.078 (range 0.716-0.915). At the Youden-optimal threshold, sensitivity was 0% with specificity 97.8%, reflecting extreme class imbalance (0.78% test prevalence). A threshold targeting 80% sensitivity achieved sensitivity 80.0% [33.3-100.0%], specificity 87.4% [84.8-89.9%], positive predictive value 4.8% [1.1-9.9%], and negative predictive value 99.8% [99.4-100.0%], detecting 4 of 5 VAP cases with approximately 80 false alarms (12.6% false positive rate). Brier score was 0.0078. Feature importance identified baseline P:F ratio as the dominant predictor (41.3% total gain), followed by ventilator mode (26.1%), secretion amount (13.2%), secretion description (9.1%), procalcitonin (5.9%), and FiO2; (4.5%). Conclusions ADVISE demonstrates that baseline oxygenation trajectory and ventilatory support patterns - derived exclusively from routinely charted ICCA variables - can identify admissions at risk of VAP-related physiological deterioration with meaningful discrimination (AUROC 0.874) despite severe class imbalance. The 80% sensitivity operating point offers a clinically actionable alert rate (12.6% FPR), supporting integration into existing ICU workflows. This proof-of-concept study establishes feasibility; multi-site prospective validation is required before clinical deployment.

25.
arXiv (CS.LG) 2026-06-24

Stochastic Expectation Maximization for Robust State-Space Radio Interferometric Imaging

arXiv:2606.23944v1 Announce Type: cross Abstract: State–space models provide a flexible framework for analyzing dynamical systems, yet they often rely on Gaussian assumptions that fail to capture heavy-tailed or outlier-prone measurement noise. We propose a robust estimation scheme for linear state–space models subject to compound-Gaussian noise, as encountered for instance in radio interferometry affected by radio-frequency interference (RFI). The method relies on a Stochastic Approximation Expectation–Maximization (SAEM) algorithm in which the standard E-step is replaced by Monte Carlo sampling of the latent states and noise texture through closed-form Gibbs updates, enabling tractable inference despite the heavy-tailed likelihood. Numerical experiments show that the proposed method significantly improves reconstruction fidelity and robustness to RFI, outperforming a Gaussian EM algorithm and even an oracle RTS smoother. These results highlight the benefits of heavy-tailed state–space modeling and SAEM-based inference in interference-dominated imaging scenarios.