Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.LG) 2026-06-11

Self-Supervised Multisensory Pretraining for Contact-Rich Robot Reinforcement Learning

arXiv:2511.14427v4 Announce Type: replace-cross Abstract: Effective contact-rich manipulation requires robots to synergistically leverage vision, force, and proprioception. However, Reinforcement Learning agents struggle to learn in such multisensory settings, especially amidst sensory noise and dynamic changes. We propose MultiSensory Dynamic Pretraining (MSDP), a novel framework for learning expressive multisensory representations tailored for task-oriented policy learning. MSDP is based on masked autoencoding and trains a transformer-based encoder by reconstructing multisensory observations from only a subset of sensor embeddings, leading to cross-modal prediction and sensor fusion. For downstream policy learning, we introduce a novel asymmetric architecture, where a cross-attention mechanism allows the critic to extract dynamic, task-specific features from the frozen embeddings, while the actor receives a stable pooled representation to guide its actions. Our method demonstrates accelerated learning and robust performance under diverse perturbations, including sensor noise, and changes in object dynamics. Evaluations in multiple challenging, contact-rich robot manipulation tasks in simulation and the real world showcase the effectiveness of MSDP. Our approach exhibits strong robustness to perturbations and achieves high success rates on the real robot with as few as 6,000 online interactions, offering a simple yet powerful solution for complex multisensory robotic control. Website: https://msdp-pearl.github.io/

02.
medRxiv (Medicine) 2026-06-12

Deconvolution-based cell-type specific DNA methylation-wide and transcriptome-wide association studies identify risk CpG sites and genes associated with colorectal cancer risk

Bulk tissue-based DNA methylation-wide (MWAS) and transcriptome-wide association studies (TWAS) have identified CpG sites and genes associated with colorectal cancer (CRC) risk, but do not account for cellular heterogeneity. To address this, we developed a deconvolution-informed framework to infer cell-type specific DNA methylation and gene expression profiles from bulk normal colon tissues using reference single-cell epigenomic and transcriptomic datasets. We performed cell-type specific MWAS (ctMWAS) using deconvoluted DNA methylation data from 293 normal colon samples and conducted cell-type specific TWAS (ctTWAS) using deconvoluted gene expression data from 707 normal colon samples. Genetically predicted methylation and expression models were integrated with CRC GWAS summary statistics (78,473 cases and 107,143 controls) to identify risk-associated CpG sites and genes. Through ctMWAS, ctTWAS, and colocalization analyses, we identified 178 significant cell-type-specific CpG sites in 106 loci and 68 risk genes in 40 loci, including 26 previously unreported loci. Through additional integrative methylation-gene analysis, we prioritized 132 candidate risk genes, the majority of which were supported by multi-omics evidence and stage-specific dysregulation across the adenoma-carcinoma and serrated-carcinoma progression pathways. Pathway enrichment analyses implicated pathways involved in DNA double-strand break repair, TP53 regulation, TGF-{beta} signaling, and innate immune responses. Among prioritized genes, 14 were identified as putative druggable targets linked to 90 FDA-approved or clinical-stage drugs. Experimental validation supports an oncogenic role for SF3A3. These findings demonstrate that deconvolution-informed integrative analyses enable cell-type-resolved identification of epigenetic and transcriptional mechanisms underlying CRC susceptibility and provide insights into disease biology, prevention, and therapeutic target discovery.

03.
arXiv (CS.LG) 2026-06-12

Generative Modeling of Bach-Style Symbolic Music: A Comparative Study of Autoregressive, Latent-Variable, and Adversarial Approaches

arXiv:2606.13626v1 Announce Type: cross Abstract: We study generative modeling of Bach-style symbolic piano music using a shared MIDI corpus and three model families: autoregressive LSTMs with attention, latent-variable models including recurrent VAEs and vector-quantized VAEs, and generative adversarial networks. We compare their ability to model polyphonic note sequences, learn useful latent representations, and generate stylistically coherent compositions. Our experiments show that the autoregressive LSTM with attention produces the most musically coherent samples, while vector quantization helps mitigate posterior collapse and yields more structured outputs than conventional recurrent VAEs. The adversarial approach captures local pitch patterns but remains difficult to train and generalizes less reliably to Bach's style. These results highlight the relative strengths and failure modes of autoregressive, latent-variable, and adversarial approaches for symbolic music generation.

04.
arXiv (CS.CL) 2026-06-19

Before the Labels: How Dataset Construction Shapes Suicidality Detection in Clinical Text

Clinical NLP increasingly relies on electronic health record (EHR) data to detect suicidal behaviors, treating clinical documentation as more reliable ground truth than social media. We argue that this framing obscures how EHR-based suicidality datasets encode a particular operationalization of suicidality, shaped by who authors the data, how episodes are bounded, and how ambiguity is resolved. We ground this argument in a case study of the ScAN dataset, built over MIMIC-III clinical notes. We show how governance constraints, ICD-based cohort selection, single-annotator labeling, and hospital-stay-level aggregation produce labels that reflect clinician-documented judgments, treat suicidality as a bounded episode, and assume that intent can be reliably inferred from documentation. A linguistic analysis demonstrates that identical labels subsume heterogeneous clinical framings differing in temporality, negation, and uncertainty. We argue that clinical NLP should examine the assumptions embedded in suicidality datasets before interpreting their labels as ground truth.

05.
medRxiv (Medicine) 2026-06-24

Matrix matters: head-to-head concordance of serum and plasma for NULISAseq CNS Disease Panel

Blood-based proteomic profiling is now widely applied in neurodegenerative and neuroinflammatory disease, yet the choice between serum and plasma remains poorly characterised for high-multiplex platforms. Many legacy biobanks hold mainly serum, whereas most current NUcleic-acid-Linked Immuno-Sandwich Assay (NULISA) studies use plasma. We compared the 130-protein NULISAseq central nervous system (CNS) Disease Panel head-to-head in matched serum and plasma collected at the same draw from 62 participants (30 neurodegenerative, 19 demyelinating, 13 healthy controls). Agreement was measured with Spearman correlation (rho), Lin's concordance correlation coefficient (CCC), the intraclass correlation coefficient (ICC) and the mean paired serum-to-plasma difference (dNPQ). Concordance was moderate to high: 123 of 130 proteins reached significance and 18 reached rho >= 0.90, with a median rho of 0.72 (range 0.10-0.988). Proteins fell into three tiers. Cytoskeletal markers (NEFH rho=0.988; NEFL rho=0.947) and glial GFAP (rho=0.949, |dNPQ|

06.
arXiv (CS.AI) 2026-06-16

Upper Bounds on the Generalization Error of Deep Learning Models via Local Robustness and Stability

arXiv:2606.16883v1 Announce Type: cross Abstract: Generalization is a critical property of data-driven models, particularly deep learning models deployed in safety-critical applications. Robustness-based generalization bounds have gained attention as a principled way to link robustness properties to generalization performance, often in a data-dependent manner. However, most existing bounds suffer from vacuousness in practical settings, yielding loose upper bounds that greatly exceed the actual error rates and limiting their usefulness for real-world evaluation. While this issue is often attributed to the uncertainty term, a substantial part of the problem originates from the robustness term itself, particularly for the 0-1 loss. Existing approaches typically treat the robustness term as a global measure, ignoring its variation across different sub-regions of the input space. In this work, we propose a generalization bound that addresses this limitation by scaling the robustness term according to the number of stable and unstable samples within each sub-region. Our bounds incorporate both data- and model-dependent factors while maintaining practical relevance (yielding tighter upper bounds on true error). Experiments on models trained on the ImageNet dataset show that our bounds remain consistently non-vacuous and achieve the tightest estimates among existing methods, closely aligning with empirical performance across a range of robust deep neural networks.

07.
arXiv (CS.CL) 2026-06-24

Do LLM Attribution Metrics Transfer? Auditing Retrieval-Augmented Generation Evaluation Across Datasets and Constructs

Practice often treats automatic metrics for attribution in LLM retrieval-augmented generation as interchangeable. We audit eight automatic scorers – lexical, embedding, and BERTScore baselines alongside entailment/grounding-trained models (clean and FEVER NLI, the checker MiniCheck) – across three evaluation constructs (provenance/topicality, generated-answer attribution, and fact-check entailment), asking whether any scorer transfers: stays within the 95% confidence interval of the best audited scorer on every dataset of a multi-dataset construct. In the construct with the most multi-dataset human-labeled coverage – generated-answer attribution (AttributionBench's four source datasets, n = 1,610, with independent HAGRID, n = 2,150) – none does: the per-dataset metric rankings invert (Kendall tau = -0.64, p = 0.031 on AttributedQA vs. LFQA), and an off-the-shelf NLI scorer that is best on short-claim AttributedQA (AUROC 0.90) collapses to AUROC 0.53 (chance) on long-form LFQA, where BERTScore wins (0.91); the flip is not a length or truncation artifact. This instability has a concrete decision cost: a naive "best-on-average" rule for choosing an evaluator fails leave-one-dataset-out (mean held-out regret 0.172 AUROC, worse than fixing one scorer), so metric choice must be validated on the target dataset rather than learned from others. A prompt-based LLM judge avoids the chance-level collapses the automatic scorers suffer (no LFQA collapse) but is not uniformly best, ~100x costlier, and non-deterministic – relocating, not removing, the validation burden.

08.
Nature (Science) 2026-06-16

Mathematicians are developing rules for AI use — other fields should follow

Authors: Unknown Author

The mathematics community is right to call for transparency, integrity and fairness to be protected when AI tools are used. Researchers in other disciplines could learn from this approach. The mathematics community is right to call for transparency, integrity and fairness to be protected when AI tools are used. Researchers in other disciplines could learn from this approach.

09.
arXiv (CS.CV) 2026-06-17

SkillMoV: Mixture-of-View Routing with Prototype-Conditioned Gating for Unified Multi-View Proficiency Estimation

Estimating human proficiency from video is a key challenge for automated skill assessment, with applications in sports coaching, music pedagogy, surgical training, and workplace learning. Existing approaches often focus on individual scenarios or rely on shared multi-view aggregation, limiting their ability to adapt to heterogeneous camera viewpoints and activity domains. We introduce SkillMoV, a unified, parameter-efficient framework for multi-scenario proficiency estimation from synchronized multi-view video. At its core, SkillMoV introduces a Mixture-of-View Projector (MoVP), which adapts the mixture-of-experts paradigm to camera-specific view features. MoVP is composed of four stages: (i) a Mixture-of-View soft router with twelve expert MLPs that learns view-dependent expert preferences without camera-identity supervision; (ii) cross-view attention to align synchronized cameras; (iii) learnable prototype anchoring to condition the representation on class-level reference vectors; and (iv) a prototype-conditioned gated projection that produces the final skill embedding. We evaluate SkillMoV on EgoExo4D across six skill domains and three separately trained view configurations: Ego, Exos, and Ego+Exos. SkillMoV reaches 50.17% overall accuracy in the Exos setting with a single model trained jointly across all scenarios, surpassing the strongest reported Exos result among the compared methods by 3.57 percentage points. In Ego+Exos, SkillMoV remains close to the best reported result in that setting (47.63% versus 48.20%). Ablations on the selected Exos configuration validate each component: MoV routing contributes +6.61 pp over attentive aggregation, cross-view attention +4.92 pp, prototype anchoring +4.07 pp, and stochastic view dropout +3.90 pp. Through LoRA adaptation, SkillMoV trains only 23.32% of its parameters and adds limited measured overhead relative to a LoRA-only baseline.

10.
arXiv (CS.CV) 2026-06-25

NeuroShield: A Device-Agnostic Foundation Model for EEG Authentication

A central challenge in EEG authentication is that models are typically tied to the acquisition settings in which they are trained. In particular, variations in headset hardware, channel layout, and signal duration create heterogeneous recordings that existing models are not designed to handle, causing each new headset or dataset to be treated as a separate model-development problem. This fragmentation limits multi-dataset learning, hinders knowledge transfer, and reduces model reusability. To address this limitation, we present NeuroShield, a reusable foundation model for EEG authentication that learns identity-discriminative embeddings from variable-channel and variable-length EEG recordings through a dual-stage transformer architecture. We pretrain NeuroShield on three public EEG datasets comprising 15{,}762 subjects and 28{,}116 sessions, and evaluate transfer on two unseen downstream datasets. Our evaluations show that, after fine-tuning, NeuroShield reduces equal error rate by 0.44–8.06 percentage points relative to the state of the art. NeuroShield further generalizes to segments longer than those seen during training and operates across channel layouts not encountered during pretraining. These results establish NeuroShield as a reusable and adaptable EEG identity encoder across heterogeneous recording settings. We release NeuroShield as open source to support reproducibility and community adoption.

11.
arXiv (CS.CL) 2026-06-25

Speech Codec Probing from Semantic and Phonetic Perspectives

Speech tokenizers are essential for connecting speech to large language models (LLMs) in multimodal systems. Speech tokenizers are expected to preserve both semantic and acoustic information for downstream understanding and generation tasks. However, emerging evidence suggests that the term "semantic" in speech processing does not align with linguistic lexical-semantic, leading to a mismatch between speech and text modality. In this paper, we systematically analyze the information encoded by several widely used speech tokenizers, evaluating their lexical-semantic and phonetic content through three tasks. Our results show that current tokenizers primarily capture phonetic rather than lexical-semantic structure, deriving practical implications for the design of next-generation speech tokenization methods. Code is released to public at https://github.com/Alexuan/codec_probing_release.

12.
arXiv (CS.CV) 2026-06-16

VisualClaw: A Real-Time, Personalized Agent for the Physical World

Vision language models are serving as general-purpose interfaces for complex multimodal tasks. However, deployment still faces three gaps: VLMs typically incur high latency and cost when processing dense video frames and long prompts, the agent scaffold remains static after deployment, and standard video-QA benchmarks do not test whether agents can use visual evidence inside tool-using workspaces. We present VisualClaw, a self-evolving multimodal agent built around two principles. First, hybrid encoding reduces deployment cost by filtering less informative streaming frames with a cascaded gate and compressing the text skill bank through hot/cold top-k injection. Second, skill evolution lets the agent learn from failures: retrieved memories condition an evolver as direct concatenated context or as guided evidence, producing skill-bank updates that help future questions. Across 4 video-QA benchmarks with 2 VLMs, VisualClaw cuts per-question API cost by an average -98% versus full-frame upload and by -25.9% over the offline uniform 8 frame baseline, while boosting accuracy in most settings, e.g., an average +3.85% and a peak +15.80% on EgoSchema with Gemini 3 Flash. To address the gap, we curate VisualClawArena, a 200-scenario multimodal agentic benchmark built through a strict five-stage pipeline; models must use video evidence, documents, dynamic updates, and executable checks inside a workspace. On VisualClawArena, the same framework with computer-use agent backends improves macro accuracy by +2.9% for Codex (GPT-5.5) and +3.2% for Claude Code (Sonnet 4.6) over no-evolution baselines, with a -9.5% cost reduction compared to the uniform-sampled baseline. These properties make VisualClaw a natural fit for edge applications, where the cascade reduces a 1-hour streaming session from ~3,600 API uploads down to only 5-20 calls and the self-evolution makes it a perfect personalized assistant.

13.
arXiv (CS.LG) 2026-06-17

Questioning the Coverage-Length Metric in Conformal Prediction: When Shorter Intervals Are Not Better

arXiv:2601.21455v2 Announce Type: replace-cross Abstract: Conformal prediction(CP) has become a cornerstone of distribution-free uncertainty quantification, conventionally evaluated by its coverage and interval length. This work critically examines the sufficiency of these standard metrics. We demonstrate that the interval length might be deceptively improved through a counter-intuitive approach termed Prejudicial Trick(PT), while the coverage remains valid. Specifically, for any given test sample, PT probabilistically returns an interval, which is either null or constructed using an adjusted confidence level, thereby preserving marginal coverage. While PT potentially yields a deceptively lower interval length, it introduces practical vulnerabilities: the same input can yield completely different prediction intervals across repeated runs of the algorithm. We formally derive the conditions under which PT achieves these misleading improvements and provide extensive empirical evidence across various regression and classification tasks. Furthermore, we introduce a new metric interval stability which helps detect whether a new CP method implicitly improves the length based on such PT-like techniques. Code is available at https://github.com/benben-cd/PT-Conformal-Prediction.

14.
arXiv (CS.AI) 2026-06-24

MultiMem: Measuring and Mitigating Memorization in Multi-Modal Contrastive Learning

arXiv:2606.22220v2 Announce Type: replace-cross Abstract: Memorization in machine learning models enables high performance on rare in-distribution samples by capturing their atypical patterns. However, it also causes harmful retention of noise and outliers, degrading generalization. While memorization has been extensively studied in both supervised and self-supervised learning in the vision domain, it remains unexplored in multi-modal contrastive learning. We address this gap by introducing MultiMem, the first metric designed to quantify memorization in multi-modal contrastive learning. Through our systematic analysis, we demonstrate that cross-modal semantic misalignment has the strongest influence on memorization, with text being the dominant modality driving memorization, followed by video, image, and audio. We show that targeted augmentations applied across all modalities effectively reduce memorization as measured by our MultiMem metric and improve model performance. Overall, this work establishes the first framework for measuring and mitigating memorization in multi-modal contrastive learning, preventing harmful data retention and contributing to higher-performing models.

15.
bioRxiv (Bioinfo) 2026-06-12

From Proteome Mining to Structural Validation: Phosphopyruvate Hydratase as a Structurally Tractable Drug Target in Kinetoplastid Parasites

Chagas disease, caused by Trypanosoma cruzi, demands novel therapeutic strategies that overcome the toxicity and limited efficacy of current treatments. To address this need, herein we report an integrative, target-centric strategy that combines parasite proteome mining, structural modeling, and experimental validation. Functional enrichment and druggability analyses identified phosphopyruvate hydratase (PPH) as a promising candidate due to its essential metabolic role and limited similarity to human homologs. Notably, proteome mining revealed the presence and conservation of PPH across kinetoplastid parasites, including Leishmania donovani, supporting its evaluation beyond T. cruzi. For the selected PPH sequences, AlphaFold-derived three-dimensional models underwent extensive molecular dynamics refinement, yielding stable conformational ensembles suitable for structure-based studies. Using this validated model, virtual screening of the Latin American Natural Products Database - LANaPDB - identified aptosimon as a top-ranked compound candidate. Molecular dynamics simulations further showed ligand-dependent binding behavior, suggesting alternative binding modes distinct from the canonical substrate configuration. In vitro assays demonstrated consistent antiparasitic activity against intracellular T. cruzi amastigotes (IC50 = 3.52 ug/mL) and Leishmania donovani promastigotes (IC50 = 13.06 ug/mL), supporting the biological relevance of the aptosimon-related lignan chemotype, hinokinin, across two kinetoplastid parasite models. Together, these results support PPH as a structurally tractable and biologically relevant candidate target, while identifying an aptosimon-related lignan chemotype, represented experimentally by hinokinin, as a cross-species antiparasitic scaffold that warrants further biochemical target-validation studies.

16.
arXiv (math.PR) 2026-06-11

Arrangements of Consecutive Numbers in Mallows Permutations

arXiv:2606.12410v1 Announce Type: cross Abstract: We study the random variable that counts the number of specific arrangements of clustered consecutive numbers in permutations under the Mallows distribution. We provide an asymptotic expression for the expected value of this random variable. This result extends and tightens the previously known result by Pinsky (2022) concerning clustered consecutive numbers in Mallows permutations. Moreover, we identify a range of parameters for which the distribution of the number of arrangements of clustered consecutive numbers in Mallows permutations is close to a Poisson distribution.

17.
medRxiv (Medicine) 2026-06-18

Artificial Intelligence-informed mobile behavioural interventions to support adolescents mental health in schools: protocol for a randomised controlled trial using the MindCraft app

Background: Children and young people (CYP) are particularly affected by mental health problems. Mobile apps provide a scalable and accessible approach to adolescent mental health support, and schools are well-positioned to address multiple risk factors and deliver large-scale interventions. By combining active (self-reported) and passive (sensor-derived) data, mobile apps can model mental states and deliver context-aware support. Artificial Intelligence (AI) enables adaptive, context-aware recommendations tailored to each user. However, there is limited research on AI-based mental health interventions in community CYP. MindCraft is a mobile app designed to monitor adolescents mental health using active and passive data and provide AI-informed recommendations ("nudges"). This study aims to investigate the effectiveness of personalised AI nudges delivered through MindCraft on improving mental health outcomes among adolescents in schools in the United Kingdom. Methods: The study is a three-arm RCT using a prospective cohort of secondary school students aged 14-19. Following informed consent, participants complete a baseline online assessment at school and download MindCraft. The primary outcome is the Strengths and Difficulties Questionnaire global and subscale scores. Secondary outcomes include the Eating Disorders Diagnostic Scale, the Sleep Condition Indicator Questionnaire, the Self-Injurious Thoughts and Behaviours Interview, the Self-Efficacy Questionnaire for Children and the World Health Organisation-Five Well-Being Index. Participants are randomised to: (1) an AI-informed intervention group receiving personalised nudges, (2) an active control receiving non-personalised nudges, or (3) a control group with self-monitoring only. Participants use the app for four weeks, with follow-up at one month. Repeated-measures analyses will assess changes across time points. Discussion: We hypothesise that AI nudges will have a greater positive effect on mental health outcomes at one month than general nudges and self-monitoring. Our findings will provide key evidence on the effectiveness of personalised mobile AI recommendations for adolescents mental health and inform school-based mental health prevention and early intervention. This study will contribute evidence on the ethical, acceptable, and scalable integration of AI-enabled digital mental health tools within public health and educational systems, with implications for the design of future digital public health interventions and policies supporting their safe integration in schools.

18.
arXiv (CS.LG) 2026-06-16

Diffusion Models for Adaptive Sequential Data Generation

arXiv:2606.06007v2 Announce Type: replace Abstract: Generating realistic synthetic sequential data is critical in real-world applications across operations research, finance, healthcare, energy systems, and scientific computing, where time-indexed observations are used for prediction, simulation, risk assessment, and data-driven decision-making. While diffusion models have achieved remarkable success in generating static data, their direct extensions to sequential settings often fail to capture temporal dependence and information structure. Designing diffusion models that can simulate sequential data in an adapted manner, and hence without anticipation of future information, therefore remains an open challenge. In this work, we propose a sequential forward-backward diffusion framework for adapted time series generation. Our approach progressively injects and removes noise along the sequence, conditioning on the previously generated history to ensure adaptiveness. A novel score-matching objective is introduced for efficient parallel training. We derive rigorous statistical guarantees under a generic framework, then establish score approximation, score estimation, and distribution estimation results with ReLU networks serving as a concrete instance. Empirically, we validate our method on synthetic data, including ARMA models and Gaussian processes, and demonstrate its effectiveness in constructing mean-variance optimal portfolios.

19.
arXiv (math.PR) 2026-06-12

Non-commutative Law of iterated logarithm

arXiv:2509.22037v2 Announce Type: replace-cross Abstract: We prove optimal non-commutative analogues of the classical Law of Iterated Logarithm (LIL) for both martingales and sequences of independent (non-commutative) random variables. The classical martingale version was established by Stout [Sto70b] and the independent case by Hartman-Wintner [HW41]. Our approach relies on a key exponential inequality essentially due to Randrianantoanina [Ran24] that improves that from Junge and Zeng [JZ15]. It allows to derive an optimal non-commutative Stout-type LIL just as in [Zen15], from that martingale result we then deduce a non-commutative Hartman-Wintner type LIL for independent sequences of random variables.

20.
arXiv (CS.CV) 2026-06-25

Re-mixing Embeddings for Patient Augmentation in Data Scarce Multiple Instance Learning

Data scarcity is a major bottleneck in medical Multiple Instance Learning (MIL), especially for rare diseases or expensive modalities. We introduce a statistically grounded patient augmentation approach that generates realistic patients directly in embedding space. Using Gaussian Mixture Models as a probabilistic clustering approach on pooled instance embeddings from all patients, our method learns disease-specific "recipes"-statistical distributions of instances across unsupervised clusters. New patients are then generated by sampling embeddings from clusters based on learned recipes. Unlike existing methods that require examples from all categories, our method can generate patients offline by re-mixing pooled embeddings. Generated patients are further selected based on uncertainty quantification to improve MIL performance. We evaluate our method across three clinically relevant scarcity scenarios: (i) cross-dataset transfer, where an entirely missing "healthy" class is generated using statistics from an external cohort; (ii) low-data regimes, where class sizes are extremely limited; and (iii) small-cohort non-image tasks, including single-cell RNA-seq and flow cytometry. Across all experiments, our method improves performance over baseline, often outperforming other bag-mixing strategies. Notably, in the missing-class scenario, a performance comparable to full-dataset training is achieved, demonstrating its potential for rare disease diagnostic and privacy-preserving patient augmentation. The code is available at https://github.com/marrlab/RECIPE

21.
arXiv (CS.LG) 2026-06-19

Reinforcement Twinning for Hybrid Control of Flapping-Wing Drones

arXiv:2505.18201v2 Announce Type: replace-cross Abstract: Controlling flapping-wing drones requires controllers that handle time-varying, nonlinear, underactuated dynamics from incomplete, noisy sensor data. Recent advances in artificial intelligence (AI), particularly reinforcement learning (RL), have opened new perspectives for addressing such complex control problems through data-driven policy optimization from interaction with the environment. Yet purely data-driven methods are sample-inefficient, demanding extensive, sometimes unsafe exploration, especially without guiding physical models. This motivates hybrid AI-physics frameworks. This article proposes a hybrid model-free/model-based flight-control approach using the reinforcement twinning algorithm. The model-based (MB) component uses an adjoint formulation and an adaptive digital twin continuously identified from live trajectories; the model-free (MF) component uses RL. The two agents share knowledge via transfer learning, imitation learning, and shared experience between the real environment and the digital twin, coordinated by a policy referee that selects which agent acts in reality based on digital-twin performance and a real-to-virtual consistency ratio. The framework is evaluated for the longitudinal control of a flapping-wing drone, modelled as a nonlinear time-varying system driven by quasi-steady aerodynamic forces. The hybrid strategy is tested under three adaptive-model initializations: (1) offline identification from existing data, (2) random initialization with fully online identification, and (3) offline pre-training with biased parameters followed by online adaptation. In all cases, the hybrid framework improves performance, robustness, and sample efficiency over purely model-free and purely model-based approaches.

22.
medRxiv (Medicine) 2026-06-17

Characterisation of disease progression in hantavirus haemorrhagic fever with renal syndrome

Hantaviruses can cause haemorrhagic fever with renal syndrome (HFRS). This is a clinically variable disease in which severe outcomes are hypothesized to arise from dysregulated host responses. To characterise this, longitudinal, label-free plasma proteomics was used to compare disease progression in a unique well-defined cohort of patients infected with either Dobrava virus (DOBV) or Puumala virus (PUUV) hantaviruses. Patients were stratified by clinical severity. The average viral load in the first available sample from hospitalized patients was higher in those who went on to have severe infection, and higher in patients infected with DOBV. There was marked separation of infected patients from controls across early, mid and late disease, including after viral RNA clearance, suggesting a sustained systemic host-response signature. Proteomic signatures were consistent with a strong acute-phase response in both mild and severe disease. There was evidence of activation of the adaptive humoral response at later stages. Hierarchical clustering identified severity-associated pathways linked to endothelial dysfunction, thrombocytopenia, vascular leakage and renal injury. These findings define a durable plasma proteomic signature of hantavirus disease and support a model in which severe HFRS is driven by persistent inflammatory, complement and platelet/coagulation pathway activation rather than viral burden alone.

23.
arXiv (CS.CL) 2026-06-17

Structural Role Injection in Handlebars-Templated LLM Prompts: Triple-Brace Interpolation, Delimiter Family, and the Limits of HTML Auto-Escaping

Large language model applications build prompts from templates, and Handlebars is a widely used templating engine and the default prompt-template format in Microsoft Semantic Kernel. Its double-brace {{x}} expression HTML-escapes the interpolated value and is documented as the safe default; its triple-brace {{{x}}} expression inserts the value raw. We show that this choice silently governs an application's exposure to structural role injection, where attacker-controlled data carries chat role delimiters that forge a higher-privilege turn. A model-free analysis establishes the mechanism: Handlebars escaping rewrites angle brackets but not square brackets, colons, or Markdown hashes, so it neutralises ChatML, Llama-3, and XML role delimiters (survival rate 0.00) while leaving Llama-2 [INST], legacy Human:/Assistant:, and Markdown ### delimiters intact (survival rate 1.00 for the last two). We then run 5760 trials across seven delimiter families, two attack objectives, and four models (GPT-3.5 Turbo, GPT-4o mini, GPT-4.1 mini, Claude Haiku 4.5) at a combined API cost of 1.63 USD. GPT-3.5 Turbo follows the task-hijack instruction in 97% of raw and 91% of escaped trials, with the escaping protection concentrated in the angle-bracket families and absent for the colon- and Markdown-based families; the harder secret-exfiltration objective, which does not saturate, exposes the same family interaction more cleanly. Claude Haiku 4.5 resists both objectives almost entirely. The escaped default protects only the delimiter schemes whose characters HTML escaping happens to cover, gives no protection for the rest, and cannot substitute for a structural separation of instruction and data.

24.
medRxiv (Medicine) 2026-06-23

Agentic Autodiscovery of Diastolic Dysfunction Phenotypes from Surface Electrocardiogram

Background: Left ventricular diastolic dysfunction (LVDD) is a major determinant of heart failure (HF), yet its assessment relies on multiparametric echocardiography, limiting scalability. We previously demonstrated that generative artificial intelligence (AI) can synthesize tissue Doppler imaging (TDI) waveforms from the 12-lead ECG. The growing complexity of candidate architecture creates a need for automated model-discovery frameworks. Objectives: To evaluate agentic AI-based auto-discovery for ECG-based LVDD assessment using either raw ECG or synthetic TDI waveforms. Methods: Two attention-based agentic AI architectures were developed using an automated large language model-driven refinement framework that optimized transfer-learning and multimodal architectures through autonomous proposal, validation, and selection of candidate model configurations. Development was performed in 1,011 paired ECG-echocardiography studies and externally validated in 983 patients using two reference frameworks: (i) data-driven phenogroups and (ii) the 2025 ASE Diastolic Function Guidelines. External validation was performed in CODE-15% (n=219,567) for HF-related mortality and EchoNext (n=35,718) for structural heart disease associations. Results: Despite the modest cohort size, the ECG-based agentic search achieved area under the receiver operating characteristic curve (AUCs) of 0.87 (95% CI: 0.85-0.89) and 0.83 (95% CI: 0.80-0.86) for phenogroup and guideline-based LVDD severity classification. Corresponding AUCs for the synthetic TDI-based model were 0.82 (95% CI: 0.80-0.85) and 0.80 (95% CI: 0.77-0.84), respectively. In large-scale external validation, both models stratified incident HF mortality with subdistribution hazard ratios ranging 5.5 to 9.5 (Gray's p

25.
arXiv (CS.CL) 2026-06-18

Enhancing Decision-Making with Large Language Models through Multi-Agent Fictitious Play

Large language model (LLM)-based multi-agent systems (MAS) have demonstrated great potential in solving tasks with execution complexity, by distributing subtasks across cooperative agents. However, this divide-and-conquer paradigm falls short on decision-making tasks that are also prevalent in the real world. These tasks require simultaneous reasoning from the stances of all involved stakeholders whose decisions are mutually dependent and thus cannot be solved in isolation. We characterize this challenge as stance entanglement, a form of decision complexity distinct from execution complexity. To address it, we propose Multi-Agent Fictitious Play (MAFP), a novel MAS paradigm that represents stakeholder stances as agents and formulates decision-making as an equilibrium-seeking process. Built on the game-theoretic principle of fictitious play, MAFP iteratively updates each agent's decision by best responding to the empirical mixture of other agents' past decisions. This enables agents to expose and address one another's weaknesses, progressively improving decision quality and robustness. We evaluate MAFP on challenging decision-making tasks that test the capability of deciding strategies for competitive scenarios prior to acting. MAFP outperforms both single-round and multi-round baselines on two complementary metrics, tournament strength and robustness, demonstrating its effectiveness in addressing stance entanglement.