Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
PLOS Medicine 2026-06-04

Comparative impacts and cost-effectiveness of tuberculosis systematic screening strategies in prisons in Brazil, Colombia, and Peru: A mathematical modeling study

作者:

by Yiran E. Liu, José Victor Bortolotto Bampi, Ronan F. Arthur, Argita D. Salindri, Caroline Busatto, Pedro Avedillo Jiménez, Daniele Maria Pelissari, Fernanda Dockhorn Costa Johansen, Robert Arana-Narvaez, Alvaro Fernando Moreno Roca, Wilfredo Santos Solís Tupes, Esther Mori Jiu, Christian Alfredo Moreno Roca, Erika Albertina Abregú Contreras, Valentina Antonieta Alarcón Guizado, Julián Trujillo Trujillo, Belkys Marcelino, Mónica Alonso Gonzalez, Mayra Cecilia Córdova Ayllon, Ted Cohen, Moises A. Huaman, Jeremy D. Goldhaber-Fiebert, Julio Croda, Jason R. Andrews Background Incarceration is a leading driver of tuberculosis in Latin America. Systematic screening in prisons may reduce tuberculosis burden, but optimal strategies and cost-effectiveness remain uncertain. We examined the population-wide health impacts and cost-effectiveness of systematic screening in prisons in Brazil, Colombia, and Peru, comparing different timepoints, frequencies, and screening algorithms. Methods and findings Using dynamic transmission models calibrated to Brazil, Colombia, and Peru, we simulated annual or biannual (twice-yearly) prison-wide screening, alone or combined with entry and exit screening from 2026 to 2035. We evaluated four algorithms: (1) symptom screening, (2) chest X-ray with computer-aided detection (CXR-CAD), (3) symptoms and CXR-CAD (follow-up testing if either is positive), and (4) GeneXpert Ultra (Xpert) with pooled sputum. Individuals screening positive then received individual Xpert. We projected impacts on within-prison and population-level tuberculosis incidence in 2035, along with discounted costs (2023 US dollars) and disability-adjusted life years (DALYs). Model projections showed that combined entry, exit, and biannual screening with CXR-CAD was highly impactful and cost-effective across countries, reducing tuberculosis incidence by 61%–87% in prisons and 18%–28% population-wide. Compared to only biannual CXR-CAD (the next best strategy), the incremental cost per DALY averted of adding entry and exit screening was $2,984 (Brazil), $2,925 (Colombia), and $645 (Peru). Adding symptom screening to CXR-CAD marginally increased benefit and was only cost-effective in Peru’s higher-incidence prisons. Biannual screening alone remained cost-effective at prison incidence levels well below national averages, as well as at far lower willingness-to-pay thresholds. In settings without CXR-CAD, pooled Xpert was an impactful, cost-effective alternative. Key limitations include the model’s simplified representation of tuberculosis disease states and lack of stratification by age, gender/sex, HIV, or drug resistance. Conclusions These modeling results support immediate national-level adoption of prison-wide tuberculosis screening twice-yearly and at entry and exit, using CXR-CAD or pooled Xpert.

02.
arXiv (CS.CL) 2026-06-16

Few-Shot Biomedical Relation Extraction with Large Language Models: A Viable Alternative to Supervised Learning?

Biomedical relation extraction (BioRE) is a key step in transforming biomedical literature into structured knowledge. However, most existing approaches rely on supervised models trained on costly annotated datasets, limiting their scalability and adaptability across relation types and domains. We investigate few-shot BioRE using prompt-based learning with large language models (LLMs) and compare two task formulations: pairwise classification, which predicts relations for individual entity pairs, and joint generation, which extracts multiple relations in a single model call. Experiments on the BioREDirect dataset reveal a clear precision-recall trade-off. Pairwise classification achieves higher recall, whereas joint generation is more precise and computationally efficient. The best-performing model achieves a micro-F1 score of 0.44, substantially outperforming previous few-shot results (0.34) while remaining below the supervised baseline (0.56). Much of this gap is attributable to a single ambiguously defined relation type. When evaluated using macro-F1, which better captures performance across relation types in an imbalanced setting, prompt-based approaches outperform the supervised baseline (0.45 vs. 0.38), particularly on rare relation types. These findings highlight the potential of LLMs for BioRE in low-resource settings and underscore the importance of well-defined relation schemas.

03.
arXiv (CS.AI) 2026-06-18

From Specification to Execution: AI Assisted Scientific Workflow Management

arXiv:2606.18425v1 Announce Type: cross Abstract: Scientific workflow management systems (WMS) support scalable and reproducible execution of complex pipelines, but workflow design, implementation, and debugging remain largely manual and require significant expertise. Recent approaches using large language models (LLMs) show promise for workflow generation from natural language, but often rely on direct code synthesis, which limits transparency, reproducibility, and integration with workflow systems. We present an AI-assisted approach to scientific workflow management that combines specification-driven workflow generation, automated debugging, and distributed execution. The method introduces a structured specification phase that separates workflow intent, design, and implementation, allowing validation prior to code generation. We also develop an LLM-based debugging agent that diagnoses and resolves failures across multiple system layers. To support distributed execution and user interaction, we integrate Pegasus, a widely used WMS, with a Model Context Protocol (MCP) layer, providing a unified interface for workflow submission, monitoring, and control. We evaluate the approach using a federated learning workflow for medical imaging, chosen for its parallel, iterative, and dependency-intensive structure. The system generated and executed large-scale workflows with thousands of jobs, reduced debugging effort, and allowed non-expert users to construct workflows with expert-level design patterns. These results indicate that end-to-end AI-assisted workflow generation and execution is feasible, and point toward AI-driven platforms for managing the scientific workflow lifecycle.

04.
arXiv (quant-ph) 2026-06-25

Single-Period Floquet Control of Bosonic Codes with Quantum Lattice Gates

arXiv:2601.08782v2 Announce Type: replace Abstract: Bosonic codes constitute a promising route to fault-tolerant quantum computing. Existing Floquet protocols enable analytical construction of bosonic codes but typically rely on slow adiabatic ramps with thousands of driving periods. In this work, we circumvent this bottleneck by introducing an analytical and deterministic Floquet method that directly synthesizes arbitrary unitaries within a single period. The phase-space unitary ensembles generated by our approach reproduce the Haar-random statistics, enabling practical pseudorandom states in continuous-variable systems. We prepare various prototypical bosonic codes from vacuum and implement single-qubit logical gates with high fidelities using quantum lattice gates. By harnessing the full intrinsic nonlinearity of Josephson junctions, quantum lattice gates decompose quantum circuits into primitive operations for efficient continuous-variable quantum computing.

05.
Nature (Science) 2026-06-24

Genomic insights into the population dynamics and demise of Neanderthals

A surge of genetic data from the skeletal remains of Neanderthals disproves some assumptions and generates fresh questions about these ancient hominins. A surge of genetic data from the skeletal remains of Neanderthals disproves some assumptions and generates fresh questions about these ancient hominins.

06.
arXiv (CS.LG) 2026-06-12

Quantum Reservoir Computing for Short-Term Power Load Forecasting in Resource-Constrained Energy Systems

arXiv:2606.12806v1 Announce Type: cross Abstract: Short-term load forecasting is essential for reliable energy management, but practical deployment on edge devices requires models that remain accurate under limited memory, finite measurement budgets, and hardware noise. This work proposes a hardware-efficient Quantum Reservoir Computing (QRC) framework for energy load forecasting, where a fixed quantum reservoir transforms temporal input windows into high-dimensional features and only a classical Elastic Net readout is trained. To reduce deployment cost, the trained readout is compressed using post-training fixed-point quantization at bit widths from 8 to 2 bits. The framework is evaluated on the Tetouan and Spain energy load datasets under exact statevector simulation, 512-shot finite sampling, and realistic hardware-noise models from IBM FakeTorino and IBM FakeMarrakesh. Results show that 6-bit readout precision preserves full-precision forecasting performance while reducing readout memory by 81.2%. Below this point, degradation becomes dataset dependent, with Tetouan showing stronger sensitivity and Spain degrading more gradually. Hardware-noise validation further shows that the trained readout transfers to noisy reservoir states without retraining. These findings support quantized QRC as a resource-aware forecasting approach for near-term quantum time-series applications.

07.
medRxiv (Medicine) 2026-06-23

What Is the Optimal Timing and Frequency of Workload-Matched Postprandial Physical Activity Breaks? A Randomized Controlled Crossover Study of Cardiometabolic and Cognitive Responses During Sedentary Behavior

Purpose Postprandial sedentary behavior is associated with negative health effects and constitutes a large part of daily life in modern society. This study investigated how the timing of physical activity after eating influences glucose levels, cerebral and muscle oxygenation, cognitive performance, and well-being during subsequent sitting. Methods In a four-armed randomized crossover trial, healthy adults consumed four standardized meals separated by 48-hour washout periods. Each meal was followed by 2 hours of sitting combined, in random order, with one of four interventions: (1) sitting only, (2) 15 minutes of moderate intensity cycling immediately after eating, (3) 15 minutes of cycling 20 minutes after eating, or (4) three workload-matched five-minute cycling bouts during sitting. Interstitial glucose (continuous glucose monitoring), cerebral and muscle oxygenation (Functional near infrared spectroscopy), cognitive performance (Stroop test), heart rate, blood pressure, and subjective ratings were assessed every 30 minutes. Data were analyzed using repeated-measures ANOVA. Results Twenty participants (mean age 27.1{+/-}10.3 years, 12 females) completed the study. Cycling immediately after eating reduced mean glucose levels during postprandial sitting, while both 15-minute cycling bouts increased cerebral oxygenation. All active conditions enhanced muscle oxygenation. Heart rate and arousal increased with delayed cycling and active breaks. No effects were observed for blood pressure, cognitive performance, focus, or well-being. Conclusion A short bout of physical activity immediately after eating reduces postprandial hyperglycemia and improves brain oxygenation during sitting, whereas delayed activity and brief breaks increase physiological activation without cognitive or perceptual benefits.

08.
arXiv (CS.AI) 2026-06-16

Adaptive inference and function vectors in deep transformers

arXiv:2606.16694v1 Announce Type: cross Abstract: Transformers are widely used as a general-purpose substrate for learning complex correlations between a large collection of coupled variables, but their internal mechanisms have remained mysterious. We introduce a theory of a deep transformer as a mean-field interacting system that implements distributed inference, subject to constraints on communication, locality and depth. We show that such a system can exploit internal state representations ('function vectors') to infer a latent context variable at increasingly finer scales over its layers. In an in-context regression task, the theory predicts a non-trivial relationship between non-Gaussian, hierarchical structure in the latent context variable, and transformer depth. Predictions are tested using constrained linear attention transformers and demonstrate adaptive inference in deep architectures. Feedforward blocks and depth enable transformers to implement a much richer class of in-context learning algorithms than previously described.

09.
medRxiv (Medicine) 2026-06-16

MRMU: A New Paradigm for Mendelian Randomization by Accounting for Measured Covariates and Unmeasured Confounders

Mendelian randomization (MR) is a powerful approach for causal inference, however, its reliability is frequently compromised by unadjusted covariates and unmeasured confounders, such as unmeasured pleiotropy and sample structure. To address these challenges, we introduce MRMU, a novel paradigm for the MR framework. Unlike traditional single-variable or multivariable MR methods, MRMU selects instrumental variables only from the exposure of interest and estimates one exposure effect at a time, while jointly accounting for measured covariates and unmeasured confounders. This design improves the reliability of MR analyses. In simulations and real data, MRMU achieved better type I error control, higher statistical power, and more accurate effect estimation than existing MR methods. Applying to coronary artery disease (CAD), MRMU identified robust cardiometabolic risk factors, including LDL-C, APOB, systolic blood pressure, body mass index, and smoking initiation, with consistent evidence across multiple CAD datasets. In contrast, traits such as HDL-C, height, and educational attainment, which were found to be significant by existing MR methods, were no longer supported by MRMU. MRMU further supported blood pressure-related traits, rather than lipid traits, as the more relevant pathway linking urate to CAD. Finally, by integrating large-scale plasma proteomics data, MRMU identified candidate CAD drug targets beyond established HMGCR- and PCSK9-related pathways, highlighting its utility for therapeutic target prioritization.

10.
arXiv (CS.CV) 2026-06-25

Same Evidence, Different Answer: Auditing Order Sensitivity in Multimodal Large Language Models

Standard benchmarks for multimodal large language models (MLLMs) score each item on one canonical ordering and miss whether order-irrelevant shuffling changes the answer, a baseline reliability property called for by emerging AI evaluation guidelines. We introduce Facet-Probe, a five-facet audit (option, evidence-chunk, document-rank, image-set, and mixed-modality ordering) of 18 frontier and open-weight MLLMs. A Bayesian item-response model separates ordering noise from per-facet bias, and a same-ordering control estimates the decoder-stochastic floor for observed flips. We find that none of the 18 MLLMs we audit are order-invariant: screened per-facet panel-mean flip rates span 24-50%. A Gemini same-ordering control at temperature 0 estimates a substantial ordering excess over a same-input decoder-noise floor in verified cells. Capability predicts but does not eliminate flips; the best model still flips on 13.4% of trials. In our Gemini mitigation tests, training-free prompt changes are modality-conditional and do not transfer from text to visual reasoning. These results suggest that prompt-level mitigation alone is unlikely to provide general order robustness, motivating future work on training-time and architectural approaches. We propose cross-ordering flip rate as a standard reporting axis for MLLMs.

11.
arXiv (CS.CL) 2026-06-25

Scale or Reason? A Compute-Equivalent Analysis of Reasoning Distillation

Distilling reasoning traces from strong teacher models has become the standard recipe for building capable small language models. Yet reasoning traces are 5-20$\times$ longer than standard instruction fine-tuning (IFT) outputs, meaning every practitioner who chooses reasoning distillation implicitly forgoes training a larger IFT model on the same compute budget. Whether this trade-off is worthwhile remains unaddressed. We study it with a controlled experiment: a single teacher generates paired IFT and reasoning outputs for identical prompts by toggling only its reasoning mode, isolating supervision format as the sole variable. Training students at five scales (0.5B to 14B) and evaluating on 18 benchmarks, we find that at matched FLOPs, IFT lies on or near the Pareto frontier across the majority of configurations. Reasoning reaches the Pareto frontier only on open-ended tasks at 7B and above. Even there, a sequential curriculum mixing just 25-50\% reasoning data with IFT captures most of the accuracy benefit at far lower compute cost.

12.
arXiv (CS.LG) 2026-06-17

Tight $L_\infty$ Sample Complexity for Low-Degree and Sparse Boolean Polynomials

arXiv:2606.17319v1 Announce Type: cross Abstract: Motivated by the optimization of bounded binary black-box functions, we study the problem of learning polynomial surrogates over the Boolean hypercube. To ensure that optimizing the surrogate yields good solutions for the underlying objective, we require uniform $L_\infty$-error guarantees rather than the usual $L_2$-type guarantees. We characterize the minimax sample complexity of uniform estimation under subgaussian noise for two classes of bounded polynomials. First, for polynomials of degree at most $d$ on $n$ variables, the sample complexity scales as $n^{d+1}$. Second, for $s$-sparse Fourier-Walsh polynomials with $s \leq n$, it scales as $ns^2$. These rates differ structurally from the noiseless setting, where uniform exact recovery scales as $n^d$ and $ns$, respectively. Our lower bounds hold even for arbitrary adaptive learners, showing that the additional factors are intrinsic to the noisy cases. Standard Fourier-analysis tools for the $L_2$-norm do not naturally extend to the $L_\infty$-setting in a way that yields uniform guarantees. Our proofs overcome this difficulty by relying on suitably chosen auxiliary norms that serve as proxies for controlling the $L_\infty$-error. Together, our results provide a tight characterization of the sample complexity of learning optimization-safe polynomial surrogates.

13.
arXiv (CS.CV) 2026-06-16

Human Cognition in Machines: A Unified Perspective of World Models

This report of world models distinguishes prior works by the cognitive functions they innovate. Many works claim an almost human-like cognitive capability in their world models. To evaluate these claims requires a proper grounding in first principles from human and machine cognition theory. In moving towards human-like world models we present a conceptual unified framework for world models that fully incorporates all the cognitive functions (i.e., memory, perception, language, reasoning, imagining, motivation, and metacognition) and identify gaps in existing research as a guide for future states of the art. In particular, we find that motivation (especially intrinsic motivation) and metacognition remain drastically under-researched, and we propose concrete directions to address these gaps informed by active inference and global workspace theory. We also introduce epistemic world models, a new category encompassing agent frameworks for scientific discovery that operate over structured knowledge. Our taxonomy, applied to video, embodied, and epistemic world models, suggests research directions where prior taxonomies have not.

14.
medRxiv (Medicine) 2026-06-18

Avidity of anti-pertussis toxin antibodies is associated with symptomatic Bordetella pertussis infection in a novel controlled human infection model

Background The association between functional antibody responses following Bordetella pertussis infection and symptomatic disease remains unclear. We characterized the maturation of anti-pertussis toxin (PT) IgG avidity after human challenge with B. pertussis and determined its association with symptomatic infection. Methods Healthy adults were intranasally inoculated with live B. pertussis organisms in a controlled human infection model and monitored for development of pertussis symptoms (NCT05136599). Serum samples were collected one day before inoculation and at 14, 28, 56, 180, and 365 days post challenge. Anti PT IgG avidity was tested using a titration of ammonium isothiocyanate (the bond breaking agent) to quantify a wide range of antibody avidities from low to very-high. Associations between covariates and avidity were examined using linear regression models, and high dimensional analyses were used to integrate all data. Findings Anti PT IgG avidity increased in both symptomatic (n=20) and asymptomatic (n=10) participants after the challenge, reached maximum levels at day 56, and then declined through day 365. Symptomatic participants developed significantly higher levels of high- and very high-avidity anti-PT antibodies at 28, 56, 180, and 365 days post-challenge compared with those who remained asymptomatic. In multivariate analyses, symptomatic infection was associated with higher levels of high and very high avidity anti-PT IgG at day180 and365 after challenge. Distinct avidity profiles in symptomatic vs asymptomatic participants emerged at day28 onwards, with the former group having higher levels of antibodies with higher avidities. However, levels of medium-high, high and very high avidity antibodies in symptomatic participants were lower at day 365 after challenge compared to their peak levels. Interpretation Anti-PT IgG avidity was associated with symptomatic B. pertussis infection and thus may serve as a surrogate of clinical disease outcome. These results highlight that antibody avidity provides an additional functional assay besides antibody quantitation to dissect immune responses to pertussis. Further investigation of anti PT IgG avidity should be pursued in natural pertussis outbreaks to determine whether it might be used to differentiate symptomatic from asymptomatic infections for epidemiologic purposes.

15.
arXiv (CS.CL) 2026-06-17

Bridging Functional Correctness and Runtime Efficiency Gaps in LLM-Based Code Translation

While large language models (LLMs) have greatly advanced the functional correctness of automated code translation systems, the runtime efficiency of translated programs has received comparatively little attention. With the waning of Moore's law, runtime efficiency has become increasingly important for program quality, alongside functional correctness. Our preliminary study reveals that LLM-translated programs often run slower than human-written ones, and this issue cannot be remedied through prompt engineering alone. Therefore, our work proposes SwiftTrans, a code translation framework comprising two key stages: (1) Multi-Perspective Exploration, where MpTranslator leverages parallel in-context learning (ICL) to generate diverse translation candidates; and (2) Difference-Aware Selection, where DiffSelector identifies the optimal candidate by explicitly comparing differences between translations. We further introduce Hierarchical Guidance for MpTranslator and Ordinal Guidance for DiffSelector, enabling LLMs to better adapt to these two core components. To support the evaluation of runtime efficiency in translated programs, we extend existing benchmarks, CodeNet and F2SBench, and introduce a new benchmark, SwiftBench. Experimental results across all three benchmarks show that SwiftTrans achieves consistent improvements in both correctness and runtime efficiency.

16.
arXiv (CS.LG) 2026-06-16

MolE-RAG: Molecular Structure-Enhanced Retrieval-Augmented Generation for Chemistry

arXiv:2606.05693v2 Announce Type: replace Abstract: Large language models (LLMs) have shown promise for molecular property prediction, but their ability to reason over chemical structures remains limited, as molecular representations such as SMILES differ substantially from the natural language on which LLMs are primarily trained. To bridge this semantic and chemical knowledge gap, we propose MolE-RAG, a training-free, molecule-centric retrieval-augmented generation framework for LLM-based molecular property prediction. MolE-RAG augments each prediction with three complementary sources of inference-time context: retrieved chemistry literature, molecule-specific information including compound synonyms, identifiers, functional group annotations, and physicochemical descriptors, and structurally similar molecules retrieved from the training set. We evaluate MolE-RAG across nine molecular property prediction tasks using proprietary, chemistry-specialized, and open-source LLMs. Across general-purpose LLMs, MolE-RAG improves ROC-AUC by up to 28 percentage points on classification tasks and reduces regression RMSE by up to 67% relative to a SMILES-only baseline. We further find that the utility of each context source varies across models and tasks, with different models benefiting most from textual retrieval, molecular context, or structural retrieval. These results suggest that molecule-centric retrieval can improve LLM-based molecular property prediction without model fine-tuning while providing a flexible framework for integrating heterogeneous chemical knowledge at inference time.

17.
arXiv (CS.LG) 2026-06-11

Reinforcement Learning with Action-Triggered Observations

arXiv:2510.02149v2 Announce Type: replace Abstract: We introduce Action-Triggered Sporadically Traceable Markov Decision Processes (ATST-MDPs), a reinforcement learning framework for partial observability in which full state observations occur stochastically at each step, with probability determined by the chosen action. We derive Bellman equations tailored to this setting and establish the existence of an optimal policy. Exploiting the fact that sporadic observations reveal the full state, we provide an equivalent formulation in which agents commit to action-sequences between consecutive observations. Under the linear MDP assumption, we show that the value function over such action-sequences admits a linear representation in a finite-dimensional feature map, enabling standard regression-based methods. As an application, we derive ATST-LSVI-UCB, an optimistic algorithm achieving regret $\widetilde{O}(\sqrt{Kd^3(1-\gamma)^{-3}})$ for episodic learning with geometrically distributed horizons, where $K$ is the number of episodes, $d$ the feature dimension, and $\gamma$ the discount factor (episode continuation probability), matching the known rate for linear MDPs with full observability.

18.
arXiv (CS.LG) 2026-06-12

Contrastive Geometric Learning Unlocks Unified Structure- and Ligand-Based Drug Design

arXiv:2601.09693v3 Announce Type: replace Abstract: Structure-based and ligand-based computational drug design have traditionally relied on disjoint data sources and modeling assumptions, limiting their joint use at scale. In this work, we introduce Contrastive Geometric Learning for Unified Computational Drug Design (ConGLUDe), a single contrastive geometric model that unifies structure- and ligand-based training. ConGLUDe couples a geometric protein encoder that produces whole-protein representations and implicit embeddings of predicted binding sites with a fast ligand encoder, removing the need for predefined pockets. By aligning ligands with both global protein representations and multiple candidate binding sites through contrastive learning, ConGLUDe supports ligand-conditioned pocket prediction in addition to virtual screening and target fishing, while being trained jointly on protein-ligand complexes and large-scale bioactivity data. Across diverse benchmarks, ConGLUDe achieves competitive zero-shot virtual screening performance, substantially outperforms existing methods on a challenging target fishing task, and demonstrates state-of-the-art ligand-conditioned pocket selection. These results highlight the advantages of unified structure-ligand training and position ConGLUDe as a step toward general-purpose foundation models for drug discovery.

19.
arXiv (CS.CL) 2026-06-12

SkillCAT: Contrastive Assessment and Topology-Aware Skill Self-Evolution for LLM Agents

Skill self-evolution methods for LLM agents aim to turn execution trajectories into reusable skill documents, but current pipelines typically learn from one trajectory per task, merge candidate skill patches before checking them, and load the full skill corpus before inference. We propose SkillCAT, a training-free framework that separates this process into three stages. Contrastive Causal Extraction (CCE) samples multiple trajectories for each task and compares same-task success/failure pairs to identify evidence that explains outcome differences. Assessment-Augmented Evolution (AAE) replays each candidate patch on source-task clones and keeps only patches that improve or preserve task outcomes before hierarchical skill patch merging. Topology-Aware Task Execution (TTE) compiles the evolved skills into a routable sub-skill topology, so inference loads only the capability nodes relevant to the task. We evaluate SkillCAT on common agent benchmarks, including SpreadsheetBench, WikiTableQuestions, and DocVQA, and further test cross-model and out-of-distribution generalization. Across these settings, SkillCAT raises the average score over baselines by up to 40.40%, demonstrating reliable skill evolution without model training.

20.
PLOS Medicine 2026-06-02

Proteomic signatures of early retinal neurodegeneration in type 2 diabetes mellitus

作者:

by Huangdong Li, Ziyu Zhu, Shaopeng Yang, Weijing Cheng, Shaoying Tan, Zhuoyao Xin, Lei Zhang, Zhuoting Zhu, Shida Chen, Wenyong Huang, Wei Wang Background Retinal neurodegeneration is an early and independent feature of diabetic retinal disease and has been proposed as a window into the systemic neural consequences of diabetes, yet accessible molecular biomarkers and individualized prediction tools remain scarce. We aimed to identify circulating plasma protein signatures of diabetic retinal neurodegeneration (DRN) and to translate them into a clinically usable risk prediction system. Methods and findings In this multi-cohort prospective observational study, we integrated high-throughput plasma proteomics with longitudinal optical coherence tomography (OCT) in two independent populations. The discovery cohort comprised 1,492 participants had baseline plasma proteomics and OCT, and 1,218 were followed with repeated OCT over 6 years in Guangzhou Diabetic Eye Study (GDES). DRN was quantified by the annualized OCT-derived retinal nerve fiber layer thinning rate. In multivariable analyses adjusted for age, sex, smoking, systolic blood pressure, HbA1c, and diabetes duration, we identified 71 plasma proteins associated with development and progression of DRN. These proteins mapped onto pathways governing inflammatory immune recruitment, extracellular matrix remodeling, and microvascular homeostasis, providing a plausible biological basis for DRN. We developed a proteomics-based DRN model (Pro-DRN) using eight machine learning (ML) algorithms, including XGBoost and LightGBM. In the independent test set, Pro-DRN achieved a C-index of 0.860, rising to 0.908 when integrated with clinical variables. Compared with six conventional models, Pro-DRN improved discrimination (ΔC-index 0.137 to 0.159; all P 

21.
arXiv (CS.AI) 2026-06-16

Hybrid NARX-LLM for Greenland Iceberg Discharge: Prompt-Driven Residual Correction

arXiv:2606.15288v1 Announce Type: cross Abstract: Greenland iceberg discharge exhibits complex nonlinear dynamics with limited observability, challenging traditional predictive models. We present a Hybrid NARX-LLM framework that combines a nonlinear autoregressive model with exogenous inputs (NARX) and a large language model (LLM) for residual correction. We further propose a Physics-Informed Prompt (PIP) method that transforms unstructured physical knowledge into structured prompts for zero-shot in-context reasoning. The primary objective is to explore the corrective potential of this framework for modeling Greenland iceberg discharge, rather than merely optimizing predictive accuracy. The NARX component captures intrinsic temporal dependencies, while the LLM, guided by PIP, encodes glacier dynamics and environmental drivers and perceives key trend patterns to correct systematic prediction errors. This integration allows the model to reason about unmodeled factors and produce interpretable residuals, enhancing overall predictive accuracy. Applied to Greenland iceberg discharge time series, our approach addresses extreme events that are difficult to predict due to rare variations and nonstationary trends, a limitation often overlooked by traditional methods. By fusing structured time-series modeling with knowledge-driven foundation AI, the framework offers a scalable and interpretable pathway to bridge data-limited climate forecasting with physics-informed LLM reasoning. The code is available.

22.
arXiv (CS.AI) 2026-06-25

Learning Non-Vacuous Generalization Bounds from Optimization

arXiv:2206.04359v3 Announce Type: replace-cross Abstract: One of the fundamental challenges in the deep learning community is to theoretically understand how well a deep neural network generalizes to unseen data. However, current approaches often yield generalization bounds that are either too loose to be informative of the true generalization error or only valid to the compressed nets. In this study, we present a simple yet non-vacuous generalization bound from the optimization perspective. We achieve this goal by leveraging that the hypothesis set accessed by stochastic gradient algorithms is essentially fractal-like and thus can derive a tighter bound over the algorithm-dependent Rademacher complexity. The main argument rests on modeling the discrete-time recursion process via a continuous-time stochastic differential equation driven by fractional Brownian motion. Numerical studies demonstrate that our approach is able to yield plausible generalization guarantees for modern neural networks such as ResNet and Vision Transformer, even when they are trained on a large-scale dataset (e.g. ImageNet-1K).

23.
arXiv (CS.CL) 2026-06-18

Retell, Reward, Repeat: Reinforcement Learning for Narrative Theory-Informed Story Retelling

Counterfactual story retelling exposes LLM shortcomings in constrained narrative solution spaces where they can no longer rely on recalling memorised training data. Ground-truth-based post-training, such as SFT, fails to teach LLMs how to generate logical and rational narrative events. In this paper, we introduce Retell, Reward, Repeat (RRR), an RL-based pipeline synthesising Structuralist Narratology with scalar narrativity to teach storytelling structure. We extend the TimeTravel dataset with human-annotated stages of narrative equilibrium to evaluate reward models. By using d-RLAIF, RRR derives training signals from the narrativity of textual features without the need for reference outputs. Evaluations demonstrate that RRR-trained LLMs outperform few-shot and SFT baselines in logic, rationality, and completeness, with output quality additionally validated by blind human preference. Relying on a small, query-only dataset, RRR provides a linguistically grounded, cost-effective post-training mechanism for storytelling–a domain currently lacking effective post-training methods. RRR highlights the continued relevance of integrating established linguistic theories into contemporary NLP.

24.
Nature (Science) 2026-06-10

Two-component exciton condensates in an electron–hole bilayer

作者:

Macroscopic quantum coherence emerges when bosons condense into a Bose–Einstein condensate (BEC)1–5. Excitons are a long-sought solid-state route to high-temperature BECs with strong interactions, electrical tunability and potentially multicomponent spinor order, but conclusive evidence for equilibrium condensation has remained elusive. Here we report evidence for two-component exciton BECs in MoSe2/hBN/WSe2 electron–hole bilayers6–9 by probing the spin–valley susceptibility of constituent electrons and holes. This heterostructure hosts equilibrium exciton fluids with four spin–valley flavours. Magneto-optical spectroscopy in a dilution refrigerator reveals three exciton condensate phases with distinct flavour polarizations. At zero magnetic field, the many-body ground state is a coherent superposition of two condensed intravalley exciton flavours. Under a magnetic field, the intravalley exciton condensate first switches to a two-component intervalley condensate through a first-order quantum phase transition at a weak critical field and then turns into a fully polarized single-component condensate at high fields. The condensate signatures form a dome in density–temperature space, persisting up to approximately 1.8 K. Our results establish van der Waals electron–hole bilayers as a versatile platform for strongly interacting, multicomponent exciton BECs. Macroscopic quantum coherence arises in two-component exciton Bose–Einstein condensates within MoSe2/hBN/WSe2 electron–hole bilayers, exhibiting distinct spin–valley polarized phases, quantum phase transitions under magnetic fields and stable condensate behaviour up to approximately 1.8 K.

25.
bioRxiv (Bioinfo) 2026-06-23

Automated Segmentation of Prostatic Gold Fiducial Markers for MR-Only Radiotherapy Planning Using Multi-Modal Consensus Deep Learning

Purpose: To develop and evaluate a multi-model consensus deep learning approach for automated gold fiducial marker (FM) segmentation in T1-weighted prostate MRI. Materials and Methods: In this retrospective study, T1-weighted MRI and CT-derived reference standard segmentations were collected from 127 prostate cancer patients (all male; mean age, 70 years +/- 7 [standard deviation]; age range, 50-88 years; collected between October 2020 and January 2026) who each had three implanted gold FMs. A 3D U-Net was trained on 93 subjects using four random seeds to produce an ensemble. At inference, marker-class probability maps were averaged across models and the top three connected components selected. Performance was evaluated on 34 temporally held-out subjects (9 tuning, 25 test) using marker-level sensitivity and precision with exact (Clopper-Pearson) 95% confidence intervals (CIs). A model count ablation study was performed. The pipeline was deployed for on-scanner processing on Siemens MRI systems via the OpenRecon framework and as a browser-based application using WebAssembly, executing entirely client-side. Results: The four-model consensus achieved 96% (70 of 73) sensitivity and 95% (70 of 74) precision on 25 test subjects, with 29 of 34 (85%) subjects achieving perfect marker detection. Single models had a mean sensitivity of 84% (SD, 9%), improving to 96% with four-model consensus (SD,