Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CV) 2026-06-19

CUPID: Reconstructing UV Texture Maps for Interpretable Person-of-Interest Deepfake Detection

Deepfakes targeting a high-profile individual, known as Person-of-Interest (POI), are a threat to modern democracies and societies. Current POI deepfake detection methods still struggle to combine robustness to post-processing, efficiency and interpretability, focal aspects of modern deepfake detectors. In this paper we propose CUPID, a POI video deepfake detector that combines UV texture maps, a facial appearance representation derived from 3D face reconstructions, with the representation learning capabilities of the Masked Autoencoder (MAE). Our method does not require any deepfake videos in its training phase. Moreover, it does not even require to include a specific POI in the training set: the combination of UV texture maps extracted from real video frames and the MAE context-guided reconstruction yields a latent space that captures rich and discriminative facial features also for identities unseen during training. In the testing phase, the embeddings extracted from a query video depicting the POI can be matched against pristine reference videos to assess the video authenticity. Furthermore, operating in the UV space naturally provides an additional layer of interpretability. Specifically, we can extract decoded residual maps that highlight which facial regions of a test video deviate most from the identity representation of the corresponding POI. Experiments on four deepfake datasets show that CUPID outperforms current state of the art on most datasets and achieves the best overall robustness against strong downscaling and compression, providing also substantially faster inference. Our experimental code will be released at https://github.com/polimi-ispl/CUPID.

02.
arXiv (CS.CL) 2026-06-11

Context-Aware Multimodal Claim Verification in Spoken Dialogues

Every day, millions absorb claims from podcasts and streams that no fact-checker ever sees. Spoken misinformation is built through conversation, where credibility comes not from facts alone but from how claims are framed, reinforced, or left unchallenged across turns. Yet fact-checking has focused on isolated text, leaving dialogue audio under-studied. We introduce MAD2, a new Multi-turn Audio Dialogues benchmark for spoken claim verification, containing 1,000 two-speaker dialogues with 3,368 check-worthy claims and approximately 10 hours of audio, and propose calibrated multimodal fusion of a context-aware audio encoder and a dialogue-aware text model. Across settings, adding dialogue context improves verification, but the gains depend on scenario type. Using only preceding context often matches offline performance, supporting live-moderation settings, and audio contributes most when transcript-based models are destabilized by additional context. Overall, conversational structure matters more for verification than misinformation framing.

03.
arXiv (CS.CL) 2026-06-11

Unifying Learning Dynamics and Generalization in Transformers Scaling Law

作者:

The scaling law, a cornerstone of Large Language Model (LLM) development, predicts improvements in model performance with increasing computational resources. Yet, while empirically validated, its theoretical underpinnings remain poorly understood. This work formalizes the learning dynamics of transformer-based language models as an ordinary differential equation (ODE) system, then approximates this process to kernel behaviors. Departing from prior toy-model analyses, we rigorously analyze stochastic gradient descent (SGD) training for multi-layer transformers on sequence-to-sequence data with arbitrary data distribution, closely mirroring real-world conditions. Our analysis characterizes the convergence of generalization error to the irreducible risk as computational resources scale with data, especially during the optimization process. We establish matching upper and lower bounds on the excess risk, characterized by a distinct phase transition. In the initial optimization phase, the excess risk decays exponentially relative to the computational cost ${\sf C}$. However, once a specific resource allocation threshold is crossed, the system enters a statistical phase, where the generalization error follows a power-law decay of $\Theta(\mathsf{C}^{-1/7})$. These rates are certified by complementary lower bounds – statistical, via an information-theoretic two-point reduction, and optimization-side, via a first-order oracle argument – rendering the two-stage law tight up to constants, logarithmic factors, and a condition-number gap. Beyond this unified framework, our theory derives isolated scaling laws for model size, training time, and dataset size, elucidating how each variable independently governs the bounds of generalization.

04.
arXiv (CS.AI) 2026-06-12

Generativism: Toward a Learning Theory for the Age of Generative Artificial Intelligence

arXiv:2606.12441v1 Announce Type: cross Abstract: The four dominant learning theories of behaviorism, cognitivism, constructivism, and connectivism show significant conceptual limitations as generative artificial intelligence (AI) proliferates in educational settings. These frameworks were formulated before the emergence of AI systems capable of generating, synthesizing, and reasoning about knowledge. This article critically examines each learning theory and identifies assumptions challenged by generative AI's affordances. Drawing on research in distributed cognition, extended mind, human-AI collaboration, AI literacy, cognitive offloading, and metacognition, the article proposes Generativism as a learning theory for the generative AI age. Generativism posits that learning increasingly occurs through the iterative co-construction of knowledge between human learners and AI systems. The proposed framework is organized around four principles: epistemic partnership, distributed agency, generative literacy, and adaptive metacognition. The framework offers a foundation for rethinking instructional design, learning, assessment, and expertise development in contexts where generative AI plays an integral role in cognition.

05.
arXiv (CS.AI) 2026-06-19

RIVET: Robust Idempotent Voice Attribute Editing

arXiv:2606.19629v1 Announce Type: cross Abstract: Voice attribute editing models modify characteristics such as age and gender while preserving speaker identity. In large-scale speech datasets, however, attribute annotations are often noisy or inconsistent, which can cause conditional generative models to produce unstable edits. In this work, we show that idempotency provides an effective mechanism for improving robustness to noisy labels. An idempotent operator is one for which repeated application does not change the result, i.e., f(f(x)) = f(x). Enforcing this property acts as an implicit regularizer that reduces sensitivity to mislabeled examples. We introduce RIVET, a training framework that incorporates an idempotency objective to improve robustness to label noise. We evaluate RIVET under controlled label noise and on the GLOBE dataset with naturally noisy annotations. RIVET improves editing success and better preserves speaker identity than standard training, showing that idempotency improves robustness in voice editing models.

06.
medRxiv (Medicine) 2026-06-22

How knowledge shapes community stigma and social support for women seeking abortion in the Democratic Republic of Congo: A cross-sectional study.

Background The Democratic Republic of Congo (DRC) bears one of the highest maternal mortality ratios globally (746 per 100,000 live births), with nearly 11% of deaths attributable to complications of unsafe abortion. Despite ratification of the Maputo Protocol and related national policies, access to safe abortion remains limited, largely due to entrenched stigma. Social support, encompassing emotional, informational, and instrumental assistance, is critical in shaping womens abortion-seeking behaviors and health outcomes. This study examines the influence of community-level knowledge on stigma and social support for women seeking abortion care. Methods A cross-sectional survey was conducted from May 2024 to June 2024 among 1,715 adults in Kinshasa and North Kivu provinces. Analyses focused on a sub-sample of 574 respondents reporting familiarity with women who had undergone abortion. Structural Equation Modeling (SEM) was applied to estimate direct and indirect pathways linking community knowledge, stigma, and social support. Results Two core knowledge indicators, recognition of abortion as a safe medical procedure and awareness of legal conditions for access, were significantly associated with outcomes. A one-unit increase in knowledge corresponded to a 0.39-point increase in social support and a 0.19-point reduction in stigma. Enhanced knowledge promoted empathetic attitudes, reinforced practical support, and mitigated moralizing judgments toward women seeking abortion. Conclusions Strengthening community knowledge emerges as a strategic lever to reduce abortion-related stigma and enhance social support in the DRC. These findings underscore the importance of integrating stigma-reduction and knowledge-enhancement interventions into reproductive health programs to improve womens access to safe and dignified abortion care.

07.
arXiv (CS.AI) 2026-06-17

Visored: A Controlled-Natural-Language Prover for LLM-Generated Mathematics

arXiv:2606.17581v1 Announce Type: cross Abstract: We present a dependent-type-based prover designed around the way LLMs (and humans) tend to write mathematics, complementing existing systems such as Lean and Rocq. Its core design choices are a surface that imitates mathematical natural language and a rule-driven automation layer that closes the routine steps a textbook would omit, so that an accepted proof can be re-emitted as a checked Lean file. Early experiments suggest that, even without any prover-specific training data, LLMs can learn to use it effectively on the miniF2F benchmark. Lean output excerpts: https://github.com/xiyuzhai-husky-lang/visored/

08.
bioRxiv (Bioinfo) 2026-06-19

Perturbation Curve models continuous transcriptional response trajectories and improves prediction of genetic modulations

Single-cell CRISPR screens, Perturb-seq, have revolutionized functional genomics by revealing biological causality. However, although perturbation assignments are typically represented as discrete labels, the cell-level effective strength of perturbations is often continuous and diverse. Current analytical frameworks struggle to decouple the variability in perturbation strength from the diversity of downstream responses. Here, we present Perturbation Curve (PertCurve), a nonlinear, curve-based computational framework that models the trajectories of transcriptomic responses by explicitly incorporating diverse perturbation magnitudes and strengths. By ordering cells by perturbation strength, we demonstrate that PertCurve accurately recapitulates the response magnitudes and reveals the distinct modularity and asynchrony patterns of downstream gene behaviors. These patterns are categorized into archetypes, including proportional, sensitive, and threshold responses. By applying this framework across CRISPRi/a modalities, we identify universal response patterns in viral infection, apoptosis, and proliferation genes, and reveal previously overlooked context-specific regulatory features in cell differentiation. Finally, incorporating PertCurve into perturbation prediction models and evaluation metrics enhances predictive performance, delivering actionable insights for refining established models.

10.
bioRxiv (Bioinfo) 2026-06-12

Generalisable tissue-wide molecular reconstruction from histology

Spatial transcriptomics technologies measure gene expression within intact tissues but remain difficult to scale across large tissue sections and patient cohorts. Consequently, many studies rely on tissue microarrays (TMAs) or sparse spatial profiling designs, where molecular measurements are available for only limited tissue regions and are often generated using heterogeneous gene panels. Existing H&E to spatial gene expression prediction methods remain challenged by sparse molecular measurements, partially overlapping gene panels and tissue-wide reconstruction across heterogeneous spatial datasets. Here, we present GHIST+, a framework for tissue-wide reconstruction of single-cell molecular states from H&E histology. GHIST+ integrates cellular morphology, local tissue context and shared tissue representations to extend sparse molecular measurements into tissue-wide molecular maps across heterogeneous spatial datasets. Across multiple cancer types and GTEx breast tissues, GHIST+ reconstructs biologically meaningful tissue-wide molecular organisation from sparse TMA-derived measurements while preserving spatial tissue structure, cell-type organisation and age-associated tissue states across cancer and non-cancer settings. GHIST+ establishes a scalable framework for transforming sparse spatial profiling experiments into tissue-wide molecular maps, enabling cohort-scale molecular reconstruction from routine histology under heterogeneous spatial transcriptomic settings.

11.
arXiv (math.PR) 2026-06-18

Functions of Bounded Variation and Point Processes

arXiv:2606.08304v2 Announce Type: replace-cross Abstract: We investigate the relationship between the analytical properties of functions of bounded variation and the statistical behavior of hyperuniform point processes. We establish several characterization formulas for the jump part of the gradient of a bounded variation function, extending and unifying previous results by Beretti–Gennaioli and Dávila. In particular, we provide new expressions for the $L^2$-jump of the gradient using both difference quotients and Fourier transform methods. Furthermore, we connect these analytic structures to the theory of hyperuniform point processes. By analyzing the variance of linear statistics associated with bounded variation functions, we provide asymptotic estimates that depend on the specific classification of the hyperuniformity of the point process. The results show how the regularity and jump discontinuities of a function dictate the growth rate of fluctuations in point processes. Finally, we introduce an averaged quadratic BMO-type oscillation functional over translated and rotated cube partitions, similar to the one recently studied by Ambrosio et al., and prove, using results from point process, that it converges to an explicit dimensional constant times the $L^2-$jump, giving in particular a further new characterization of the perimeter of a set.

12.
arXiv (CS.CV) 2026-06-16

MambaH-Fit: Rethinking Hyper-surface Fitting-based Point Cloud Normal Estimation via State Space Modelling

We present MambaH-Fit, a state space modelling framework tailored for hyper-surface fitting-based point cloud normal estimation. Existing normal estimation methods often fall short in modelling fine-grained geometric structures, thereby limiting the accuracy of the predicted normals. Recently, state space models (SSMs), particularly Mamba, have demonstrated strong modelling capability by capturing long-range dependencies with linear complexity and inspired adaptations to point cloud processing. However, existing Mamba-based approaches primarily focus on understanding global shape structures, leaving the modelling of local, fine-grained geometric details largely under-explored. To address the issues above, we first introduce an Attention-driven Hierarchical Feature Fusion (AHFF) scheme to adaptively fuse multi-scale point cloud patch features, significantly enhancing geometric context learning in local point cloud neighbourhoods. Building upon this, we further propose Patch-wise State Space Model (PSSM) that models point cloud patches as implicit hyper-surfaces via state dynamics, enabling effective fine-grained geometric understanding for normal prediction. Extensive experiments on benchmark datasets show that our method outperforms existing ones in terms of accuracy, robustness, and flexibility. Ablation studies further validate the contribution of the proposed components.

13.
PLOS Medicine 2026-05-08

Optimal minimal residual disease threshold in pediatric acute myeloid leukemia: A retrospective cohort study based on the TARGET database

by Xiong-yu Liao, Hong Zheng, Jian-pei Fang, Dun-hua Zhou, Kun-yin Qiu Background Minimal residual disease (MRD) monitoring is a cornerstone of risk stratification in pediatric acute myeloid leukemia (AML), with a threshold of 0.1% conventionally defining positivity by flow cytometry. Advances in flow cytometric technologies, enabling detection of leukemic cells with higher sensitivity and specificity, warrant a reevaluation of whether a lower threshold improves prognostic accuracy. Methods and findings We conducted a retrospective cohort study using data from the Therapeutically Applicable Research to Generate Effective Treatments (TARGET)-AML initiative. The study population comprised 1,205 pediatric patients with de novo AML treated across Children’s Oncology Group (COG) clinical trial centers. Patients were enrolled between September 1996 and December 2016, with a median follow-up of 6.2 years (range: 0.5–20.1 years). The primary objective was to compare the prognostic performance of the traditional MRD threshold (≥0.1%) with a lower threshold (≥0.05%) after induction courses 1 and 2. The main outcome measure was 5-year event-free survival (EFS). Analyses included Kaplan−Meier survival estimates, Cox proportional hazards models to calculate hazard ratios (HR) with 95% confidence intervals (CI), receiver operating characteristic (ROC) curves, and net reclassification improvement (NRI). The optimal threshold for predicting 5-year EFS, determined by ROC analysis, was 0.05% after both induction course 1 (AUC: 0.840, 95%CI[0.76,0.88]) and course 2 (AUC: 0.854, 95%CI[0.78,0.89]). The 0.05% threshold demonstrated higher HR for the first event than the 0.1% threshold (after course 1: HR = 2.8, 95%CI[2.3,3.3]; P 

14.
arXiv (CS.LG) 2026-06-16

On the Benefits of Weight Normalization for Overparameterized Matrix Sensing

arXiv:2510.01175v2 Announce Type: replace Abstract: While normalization techniques are widely used in deep learning, their theoretical understanding remains relatively limited. In this work, we establish the benefits of (generalized) weight normalization (WN) applied to the overparameterized matrix sensing problem. We prove that WN with Riemannian optimization achieves linear convergence, yielding an exponential speedup over standard methods that do not use WN. Our analysis further demonstrates that both iteration and sample complexity improve polynomially as the level of overparameterization increases. To the best of our knowledge, this work provides the first characterization of how WN leverages overparameterization for faster convergence in matrix sensing.

15.
arXiv (CS.CV) 2026-06-17

A Benchmark for Omni-Modal Reasoning in Long Videos

Long-form omni-modal video understanding requires integrating vision, speech, and ambient audio with coherent long-context reasoning. Existing video benchmarks often trade off temporal scale, modality coverage, open-ended interaction, and interpretable scoring. To address this gap, we introduce LongShOTBench, a long video understanding benchmark designed around three coupled goals: holistic omni-modal integration, intent-driven open-ended interaction, and rubric-level diagnosis. It builds single- and multi-turn questions from real viewing scenarios, with systematic tasks probing visual, speech, ambient-audio, temporal, and cross-modal reasoning. Each item includes a reference answer and a weighted criterion-level rubric, letting evaluation identify which perceptual facts, temporal links, modality-grounding requirements, and reasoning steps are satisfied or missed. All samples are manually verified to improve grounding, clarity, and rubric reliability. We also introduce LongShOTAgent, a training-free omni-modal evidence-seeking agent coupling full-video preprocessing with targeted retrieval, query-adaptive segment refinement, and explicit claim verification over visual, speech, and non-speech audio evidence. Its iterative search-refine-verify loop exposes intermediate evidence and lets modality-specific specialists re-analyze relevant moments before answering. We evaluate 105 video-capable models spanning open-source omni-modal models, vision-language systems, audio LLMs, agentic pipelines and closed-source APIs. Current MLLMs remain far from saturating LongShOTBench, while our LongShOTAgent is the strongest training-free system, reaching 66.64% overall. By releasing the benchmark, leaderboard, and method, we provide a shared, interpretable testbed for advancing long-form omni-modal video reasoning. Code, data, and the leaderboard are available at https://longshot.cvmbzuai.com/.

16.
arXiv (CS.CL) 2026-06-18

LLMZero: Discovering Adaptive Training Strategies for RL Post-Training via LLM Agents

RL post-training strategies are dataset-dependent and reveal a recurring empirical pattern: capacity parameters accumulate monotonically across stages, while regularization parameters predominantly oscillate in response to shifting training dynamics. This distinction matters because fixed schedules commit all parameters to fixed trajectories and therefore cannot express the non-stationary exploration-exploitation tradeoffs that regularization must track; the principle provides actionable design rules for multi-stage training. We discover this through LLMZero, a system where LLM agents search over training trajectories via tree search, diagnosing pathologies at each checkpoint and proposing coordinated multi-parameter transitions. Across 4 diverse GRPO tasks, LLMZero discovers strategies that improve over the base model by 9% to 140% relative and over grid search by 6% to 15% relative, consistently outperforming random search and the skill-based agent. The structural principle transfers across tasks, providing an explanation for why discovered strategies take qualitatively different forms yet share similar parameter dynamics.

17.
arXiv (CS.CV) 2026-06-15

$\mu_0$: A Scalable 3D Interaction-Trace World Model

World models that capture how actions induce physical change enable scalable robot learning without reliance on embodiment-specific action labels. Pixel-space video models provide broad visual priors but expend model capacity on dense appearance reconstruction, while direct action models require embodiment-specific labels that hinder scalability. We present $\mu_0$, a scalable world model based on 3D traces. Rather than predicting dense pixels or directly modeling actions, $\mu_0$ forecasts smooth 3D trajectories for salient interaction points such as objects, tools, hands, and contact regions, yielding a compact, embodiment-agnostic motion interface. To enable training from diverse video sources, our TraceExtract system automatically extracts 3D supervision by selecting keypoints, constructing globally aligned traces, and associating motion segments with hierarchical language captions. This TraceExtract supervision pretrains $\mu_0$ by combining a pretrained vision-language backbone with a modular trace expert, which represents each query via B-spline control points and predicts future traces. Experiments show that $\mu_0$ outperforms baselines in both 2D and 3D trace prediction, including trace prediction models and tokenized VLM methods. Because $\mu_0$ is frozen and reusable, it can be paired with action experts for downstream robot embodiments. Despite action-free pretraining, the resulting trace-conditioned policies achieve performance competitive with VLA models pretrained with action supervision, such as $\pi_0$. These results establish 3D traces as a scalable and transferable representation for cross-embodiment manipulation.

18.
arXiv (quant-ph) 2026-06-17

Einstein-Podolsky-Rosen correlations between mechanical oscillators revealed through SU(1,1) interferometry

arXiv:2606.18202v1 Announce Type: new Abstract: Quantum correlations are essential for achieving quantum advantage in computing, communication and sensing. Moreover, their observation challenges and constrains our fundamental understanding of nature. Mechanical oscillators in the quantum regime provide an appealing platform for preparing and investigating quantum correlations at macroscopic scales. Despite substantial progress, however, continuous-variable quantum correlations stronger than entanglement have not yet been observed in this macroscopic regime. Here, we report the experimental observation of continuous-variable Einstein-Podolsky-Rosen correlations between two spatially-separated mechanical oscillators with an effective mass of $\sim 16 \,\mu g$ each. This is achieved by coupling them to a superconducting qubit which allows for engineering a two-mode squeezing interaction when parametrically driven. Crucially, we show that this interaction can be used to witness quantum correlations through the realization of a mechanical SU(1,1) interferometer. Our results expand the toolbox of operations in circuit quantum acoustodynamics and demonstrate that quantum correlations stronger than entanglement can also be observed in macroscopic systems, thereby shedding light on the boundary between quantum and classical regimes.

19.
medRxiv (Medicine) 2026-06-16

Optimal Clinical Trials Platform for Progressive Multiple Sclerosis (OCTOPUS): protocol for an international, multi-arm, multi-stage, platform, randomized controlled, double-blind, phase 3 clinical trial.

Introduction Current treatments for multiple sclerosis (MS) do not address the pathological processes of neurodegeneration and chronic demyelination. This, coupled with the significant challenges of translating promising phase 2 results to phase 3 trial success, highlights the need for more efficient trial designs, such as platform multi-arm multi-stage (MAMS) trial approaches. MAMS trials have demonstrated success in areas such as oncology and infectious diseases. They are typified by a statistically robust core trial design that allows the addition of further treatment arms and utilisation of interim outcome analyses at pre-defined timepoints, to determine whether to terminate a treatment arm early or proceed to the final outcome analysis. To address the challenges in progressive multiple sclerosis (PMS) treatment discovery, the Optimal Clinical Trials Platform for PMS (OCTOPUS) trial was developed. It currently utilises MRI whole-brain atrophy as its interim outcome measure and the clinically relevant composite Expanded Disability Status Scale Plus (EDSS-Plus) as its final outcome measure. A rigorous and systematic drug selection process that assessed preclinical in vitro and animal model evidence, along with additional human data, led to the prioritisation of R/S-alpha lipoic acid (R/S-ALA) and metformin for testing against placebo, targeting pathobiological mechanisms relevant to PMS. All participants will be eligible to receive the current standard of care, including disease-modifying treatments (DMTs). Method and analysis OCTOPUS will be a multi-centre, randomised, placebo-controlled, double-blind, phase 3, MAMS trial of participants aged 25 to 70 years (inclusive) with PMS and an EDSS score of 4.0 to 8.0 (inclusive). Steady progression must be the major cause of increasing disability rather than relapse in the preceding 2 years. In the trial s first candidate drug cycle, participants will be allocated to R/S-ALA, metformin, or placebo in a 1:1:1 ratio. Cycle 1 active treatments will start as R/S-ALA 600 mg once daily, increased after 4 weeks to 600 mg twice daily, or metformin 1 g once daily, increased after 4 weeks to 1 g twice daily. The trial will be multinational, with participation from 28 hospitals across the UK and 10 hospitals in Australia. Clinician-reported measures will include: the EDSS-Plus and the individual components: EDSS, Timed 25 Foot Walk (T25FW); 9 Hole Peg Test (9HPT); Symbol Digit Modalities Test (SDMT); Sloan Low Contrast Visual Acuity (SLCVA); and Relapse assessment. Patient-reported outcomes include MS specific walking, fatigue, pain, and impact scales. We will include a health economic analysis. Analysis stage 1 will require randomisation of 125 participants per arm and utilise MRI percentage brain volume change (PBVC) with the Structural Image Evaluation using Normalisation of Atrophy (SIENA) technique from baseline to 78 weeks. A positive outcome in analysis stage 1 will detect a 0.15% per year whole brain atrophy difference with a one-sided alpha of 0.35 and power of 95%, ensuring a low probability of erroneously rejecting a treatment arm at this stage. Any arms that show a positive effect will proceed to final analysis stage 2. Analysis stage 2 will require 600 participants per arm. Participants included in stage 1 will also be included in the stage 2. Analysis stage 2 will evaluate time to 6-month confirmed disability progression in the EDSS-Plus, in order to detect a 25% hazard ratio reduction with 90% power and an alpha of 0.05. Assuming one treatment arm proceeds to analysis stage 2, the trial will recruit approximately 1,200 participants and last about 6 years. This is approximately two-thirds the size and half the duration of separately conducted two-arm phase 2 and 3 trials. Ethics and dissemination The protocol was approved by the London Hampstead REC (22/LO/0622). This manuscript is based on protocol version 8.0, 28th August 2025. The findings of this trial will be disseminated through peer-reviewed publications and conference presentations. There will be a close communication strategy developed with the UK MS Society (MSS) and full patient and public involvement and engagement (PPIE). Trial registration ISRCTN: 14048364 EudraCT number: 2021-003034-37 CTA 20363/0445 IRAS number: 1003943 Secondary identifying numbers: ND001, CPMS 54274 Strengths and limitations - The OCTOPUS trial will be the first platform multi-arm multi-stage phase 3 trial in PMS, offering the potential to significantly expedite clinical trial processes with advantages in cost- and time-efficiency, focusing specifically on the poorly treated pathobiological processes of chronic neurodegeneration and demyelination - It will begin by assessing two promising drug candidates, immediate-release metformin and R/S-ALA, and will expand over the duration of the trial to include more drug arms under the same trial master protocol - The flexible and statistically robust trial design means that several components of the design (such as the early analysis stage 1 interim outcome) can be updated in line with evolving scientific knowledge - It will ultimately be the largest ever investigator-initiated phase 3 trial in PMS - It will include a range of national and international trial sites, including neuroscience centres and district general hospitals - It will have a high inclusion limit for age (up to 70 years) and disability (up to EDSS 8.0) - Several components (the telephone EDSS and virtual patient-reported outcome measures) will be amenable to remote collection increasing inclusivity and thus addressing public and participant suggestions, while minimising the risk of missing data - The main challenges in this trial design are the statistical and methodological complexity involved in design and implementation, and interpretation of interim trial results. Conclusion The trial launched cycle 1 in January 2023. Analysis stage 1 recruitment of 375 participants was achieved in November 2024, enabling planned interim analysis stage 1 to be conducted by late 2026 (Figure 1). On the 1st of June 2026, in the UK, 24 sites are active with a further 4 in set-up as part of stage 2, and in the Australian extension, Platform Adaptive Trial for Remyelination and Neuroprotection in Multiple Sclerosis (PLATYPUS), 1 site is active, with 9 additional sites in set-up.

20.
arXiv (CS.AI) 2026-06-11

DataEvolver: Automatic Data Preparation for Large Language Models through Multi-Level Self-Evolving

arXiv:2606.07001v2 Announce Type: replace-cross Abstract: High-quality training data is essential to large language models (LLMs) and typically requires extensive and costly manual curation. Existing automatic data preparation methods rely on predefined pipelines or customized human instructions, which limits their adaptability to diverse data distributions and lacks principled guidance from high-quality examples. In this paper, we introduce DataEvolver, the first self-evolving data preparation system that automatically constructs pipelines to transform raw data into high-quality data. DataEvolver employs a multi-level mechanism to ensure both pipeline executability and effectiveness. At the operator level, it incrementally expands the operator set to construct a logical plan while resolving dependency conflicts. At the pipeline level, it instantiates logical plans into executable code and iteratively refines pipeline orchestration through a feedback loop that reduces the distribution gap between prepared data and high-quality examples. Experiments on seven benchmarks show that DataEvolver substantially improves data quality and achieves an average 10\% gain in downstream LLM performance compared with training on original data, highlighting new opportunities for the iterative co-evolution of LLMs and data.

21.
arXiv (CS.AI) 2026-06-12

Deployment-Centered Evaluation: Predicting Query-Level Rejection Risk in a Clinical LLM System

arXiv:2606.12702v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly integrated into clinical systems, making it essential to evaluate the real-world utility of these systems. However, static benchmarks tend to measure correctness rather than user acceptance, aggregate performance across queries, and require densely annotated datasets – leading to major blind spots for evaluating clinical systems. In this work, we perform a deployment-centered evaluation of an LLM system embedded within electronic health records at an academic medical center, where user feedback is sparse but closely reflects the deployment conditions. Specifically, we train a pre-response classifier that estimates the risk that a future interaction will result in the user rejecting the LLM response, based on query content and deployment-specific context available before generation. We conduct a prospective analysis of our model over 4.5 months of user feedback, finding that our prediction model achieves an AUROC of 0.719. Further, we estimate the benefit of such predictions in two downstream use cases (guardrail triggering and abstention). Our key conceptual insight is that making use of deployment-specific context (i.e., the provider type, department name, language model used for response), as opposed to only query content, improves the ability to predict whether the user will reject the system output. Altogether, our empirical case study demonstrates the feasibility of predicting user rejection using deployment-specific context, opening the door to targeted guardrails.

22.
arXiv (CS.CV) 2026-06-18

Characterizing Brazilian Atlantic Forest Restoration Outcomes with Geospatial AlphaEarth Embeddings

作者:

The Atlantic Forest in Brazil is a critical biodiversity hotspot, yet less than 12-15% of its original cover remains. Although monitoring forest restoration on a large scale is essential, traditional methods are limited by the impracticality of on-the-ground reporting on such a scale and by the saturation of remote-sensing indices such as NDVI. Furthermore, reforestation is a gradual process as opposed to the rapid spectral changes caused by deforestation. In this study, we examine 1,729 restoration sites in S\~ao Paulo, using satellite embeddings from the AlphaEarth Foundation's model to evaluate their effectiveness in characterising early restoration success. We introduce the concept of a 'Reference Trajectory Embedding', defining a metric of restoration success based on cosine similarity to reference sites of mature secondary forest. We observe distinct clusters in embedding space according to different land use and land cover (LULC) types, and we can identify sites with clear change vectors. However, the signal can be noisy, and embeddings may require further fine-tuning to capture and predict site metadata beyond LULC.

23.
medRxiv (Medicine) 2026-06-22

Exploring the association of Obesity on Cold and Warm Autoimmune Hemolytic Anemia in San Joaquin Valley: A Retrospective Cross-Sectional Study

The relationship between obesity and specific autoimmune diseases haas been well-established, specifically due to obesity's role in promoting pro-inflammatory states. Although not much literature has been documented regarding obesity association with AIHA. As such, this study aims to assess any correlations in patients with elevated body mass index (BMI) and autoimmune hemolytic anemia (AIHA). Here we present a retrospective cross-sectional study conducted over a four-year period, across four medical centers during which a new electronic medical record was implemented. The study included 25 patients who had a previously documented history of AIHA from another facility, DAT positive with indicators of hemolysis, or DAT positive with monomer specific antisera. The patients BMI was recorded at the time of presentation to the hospital. However, for patients with a prior history of AIHA or those transferred from another facility, the BMI that was closest to the time period of when the patient was diagnosed with AIHA was used as an adjunct. Our results show that there is an association of patients with elevated BMI (>25) and AIHA; however, various other confounding variables should be taken into consideration, and further research should be done to establish a causal relationship.

24.
PLOS Computational Biology 2026-06-16

Evolution and the ultimatum game: An agent-based model with interbirth intervals and population structure

by Jeffrey C. Schank, Matt L. Miller The ultimatum game (UG) is widely used to study mutually beneficial exchanges, fairness, and prosocial behavior across different societies. However, human behavior in UG experiments does not align with the game-theoretical prediction that proposers should offer the least positive amount and responders should accept such offers. Instead, proposers make generous offers that are greater than the minimum responders are willing to accept, resulting in generous offers with wide offer-acceptance gaps. Numerous evolutionary models of the UG have been created and studied to explain human behavior, particularly generous offers made in UG experiments. These models have recently faced criticism for lacking biological realism and not adequately explaining the data. Here, we present an agent-based model inspired by our hunter-gatherer ancestors and with a biologically more realistic selection process. We assume that (1) agents exist in group-structured and group-clustered populations, where reproduction (2) depends on resource accumulation, but (3) is limited by interbirth intervals. We ran simulations to assess whether this biologically more realistic model evolves patterns of behavior consistent with patterns in the data from meta-analyses of human behavior in the UG. For the proposed model, we show that generous offers robustly evolve, as well as the difficult-to-explain offer-acceptance gaps, only in group-structured populations with interbirth intervals. We demonstrate that these results are robust and may help explain variation in data across societies. We discuss how interbirth intervals interact with group structure to modulate offer and rejection costs, favoring the evolution of generous offers, offer-acceptance gaps, and other patterns in the data on human behavior in the UG. We also discuss why weak selection and/or high mutation rate models cannot explain all the patterns in UG experimental data. We discuss biological realism and conclude that group structure and interbirth intervals may be essential for explaining prosocial behavior across societies.

25.
arXiv (quant-ph) 2026-06-15

Tantalum as a base material for superconducting integrated circuits

arXiv:2606.13750v1 Announce Type: new Abstract: The performance of superconducting integrated circuits for quantum applications is fundamentally limited by material-related losses. Tantalum, as an emerging material for next-generation quantum circuits, has attracted considerable attention in recent years after demonstrating breakthrough performance in both superconducting microwave resonators and qubits. Concurrently, a growing body of work is devoted to the operation of tantalum-based circuits and related fabrication techniques. This interest is further stimulated by tantalum thin films polymorphism resulting in a variety of its crystalline structure, superconducting properties, coherence, etc. Furthermore, tantalum circuits exhibit distinctive features in cryogenic experiments, which have not been observed in aluminum- or niobium-based ones. In this review, we summarize the recent research of tantalum thin films growth and phase selection mechanisms on various substrates, key aspects of fabrication and performance of superconducting circuit, including a material first-principles theoretical study. In conclusion, we address a number of open issues, including the role of \b{eta}-phase impurities, the effect of hydrofluoric acid solutions on chain characteristics, and the anomalous behavior of {\alpha}-tantalum chains at cryogenic temperatures.