Academic Intelligence · Curated Daily

Explore the Frontiers of Global Academic Research

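AcademicHub aggregates real-time literature from top journals and preprint platforms. Customize your own research radar and use large language models to automatically generate cross-disciplinary literature analysis briefs.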

01.
arXiv (CS.CV) 2026-06-11

Benchmarking Cross-Domain Audio-Visual Deception Detection

Automated deception detection is crucial for assisting humans in accurately assessing truthfulness and identifying deceptive behavior. Conventional contact-based techniques, like polygraph devices, rely on physiological signals to determine the authenticity of an individual's statements. Nevertheless, recent developments in automated deception detection have demonstrated that multimodal features derived from both audio and video modalities may outperform human observers on publicly available datasets. Despite these positive findings, the generalizability of existing audio-visual deception detection approaches across different scenarios remains largely unexplored. To close this gap, we present the first cross-domain audio-visual deception detection benchmark, which enables us to assess how well these methods generalize to real-world scenarios. We use widely adopted audio and visual features and different architectures for benchmarking, comparing single-to-single and multi-to-single domain generalization performance. To further explore the impact of using data from multiple source domains for training, we investigate three types of domain sampling strategies (domain-simultaneous, domain-alternating, and domain-by-domain) for multi-to-single domain generalization evaluation. We also propose an algorithm, named "MM-IDGM", that enhances generalization performance by maximizing the gradient inner products between modality encoders. Furthermore, we propose the Attention-Mixer fusion method to improve performance. We believe that this new cross-domain benchmark will facilitate future research in audio-visual deception detection.
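The three sampling strategies named in the abstract are not defined there, so the following is only a minimal sketch of one plausible reading: "simultaneous" mixes all source domains in every step, "alternating" rotates through domains step by step, and "domain-by-domain" finishes one domain before starting the next. The function names and the schedule representation are assumptions made for illustration, not the authors' implementation.

```python
from itertools import cycle, islice


def domain_simultaneous(domains, steps):
    """Every training step draws a batch from all source domains at once."""
    return [tuple(domains) for _ in range(steps)]


def domain_alternating(domains, steps):
    """Each training step draws from a single domain, rotating round-robin."""
    return list(islice(cycle(domains), steps))


def domain_by_domain(domains, steps_per_domain):
    """All steps for one domain are completed before the next domain starts."""
    return [d for d in domains for _ in range(steps_per_domain)]


if __name__ == "__main__":
    sources = ["domain_A", "domain_B", "domain_C"]  # hypothetical source domains
    print(domain_simultaneous(sources, 2))
    print(domain_alternating(sources, 5))
    print(domain_by_domain(sources, 2))
```

Each function returns only the order in which domains are visited; a real training loop would draw the actual audio-visual batches according to that schedule.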

02.
PLOS Medicine 2026-05-27

Sequential chemo-immunotherapy followed by standard versus reduced thoracic radiotherapy for older and/or frail stage III non-small-cell lung cancer: A randomized open-label cohort trial

Authors: Wei-Xiang Qi, Shuyan Li, Mengdi Wang, Huan Li, Feifei Xu, Lei Yao, Biao Yu, Linlin Chen, Gang Cai, Cheng Xu, Xianwen Sun, Zhiyao Bao, Jiayi Chen, Yi Xiang, Shengguang Zhao

Background: The appropriateness of concurrent chemoradiotherapy (cCRT) for older or clinically vulnerable stage III unresectable non-small-cell lung cancer (NSCLC) patients remains contentious. Furthermore, the survival implications of de-escalating thoracic radiotherapy (RT) intensity in this population have not been conclusively elucidated.

Methods and findings: We conducted a phase II randomized, open-label, two-cohort (non-comparative) trial at a tertiary hospital in China (NCT05557552). Between September 30, 2022 and April 30, 2024, we enrolled 56 older and/or frail patients with stage III NSCLC who were ineligible for cCRT. The primary endpoint was the 1-year progression-free survival (PFS) rate estimated using the Kaplan–Meier method. Secondary endpoints included objective response rate (ORR), overall survival (OS), and safety. In the intention-to-treat (ITT) set, which included all 56 randomized patients who received at least one dose of study treatment, the 1-year PFS was 84.3% (95% confidence interval [CI] [70.3%, 98.3%]) in the standard RT group and 70.7% (95% CI [54.3%, 87.1%]) in the reduced RT group. In the per-protocol set (53 patients), the 1-year PFS was 82.9% (95% CI [68.9%, 98.8%]) in the standard RT group and 73.4% (95% CI [58.3%, 92.4%]) in the reduced RT group, with a median follow-up of 24 months. Among 56 patients in the safety analysis set, 71.4% of patients experienced grade 3/4 adverse events (AEs) in the standard RT group and 53.6% in the reduced RT group. One patient (3.6%) in the reduced RT group and three patients (10.7%) in the standard RT group experienced grade 5 AEs. The main limitations are the non-comparative design, small sample size, and lack of power to establish non-inferiority or superiority.

Conclusion: The current study suggests that reduced RT combined with sequential chemo-immunotherapy might be feasible for older/frail patients intolerant to cCRT, showing numerically similar survival outcomes. These exploratory findings warrant confirmation in larger, adequately powered randomized trials.

Trial registration: The trial was registered on ClinicalTrials.gov on Sep 30, 2022 (ClinicalTrials.gov NCT05557552).
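The primary endpoint above is a Kaplan–Meier estimate of PFS at a fixed horizon. As a rough illustration of how such an estimate is formed, here is a minimal product-limit sketch on synthetic follow-up data. The toy times and event flags are invented for the example and have no relation to the trial's patient data; confidence intervals are omitted.

```python
def kaplan_meier_survival(times, events, horizon):
    """Product-limit estimate of the survival probability at `horizon`.

    times  : follow-up time per patient (same unit as `horizon`)
    events : 1 if progression/death was observed at that time, 0 if censored
    """
    survival = 1.0
    for t in sorted(set(times)):
        if t > horizon:
            break
        # Patients with an observed event exactly at time t.
        n_events = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        # Patients still under observation just before time t.
        at_risk = sum(1 for ti in times if ti >= t)
        if n_events:
            survival *= 1.0 - n_events / at_risk
    return survival


# Synthetic example: follow-up in months, 1 = progression observed, 0 = censored.
months = [3, 5, 8, 12, 14, 20, 24, 24]
progressed = [1, 0, 1, 0, 1, 0, 0, 0]
pfs_12 = kaplan_meier_survival(months, progressed, horizon=12)
print(f"12-month PFS estimate: {pfs_12:.3f}")
```

Censored patients leave the risk set without lowering the survival estimate, which is why the method suits trials where follow-up lengths differ between patients.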

03.
arXiv (CS.LG) 2026-06-17

Exposing the Illusion of Fairness: Auditing Vulnerabilities to Distributional Manipulation Attacks

arXiv:2507.20708v3

The rapid deployment of AI systems in high-stakes domains, including those classified as high-risk under the EU AI Act (Regulation (EU) 2024/1689), has intensified the need for reliable compliance auditing. For binary classifiers, regulatory risk assessment often relies on global fairness metrics such as the Disparate Impact ratio, widely used to evaluate potential discrimination. In typical auditing settings, the auditee provides a subset of its dataset to an auditor, while a supervisory authority may verify whether this subset is representative of the full underlying distribution. In this work, we investigate to what extent a malicious auditee can construct a fairness-compliant yet representative-looking sample from a non-compliant original distribution, thereby creating an illusion of fairness. We formalize this problem as a constrained distributional projection task and introduce mathematically grounded manipulation strategies based on entropic and optimal transport projections. These constructions characterize the minimal distributional shift required to satisfy fairness constraints. To counter such attacks, we formalize representativeness through distributional-distance-based statistical tests and systematically evaluate their ability to detect manipulated samples. Our analysis highlights the conditions under which fairness manipulation can remain statistically undetected and provides practical guidelines for strengthening supervisory verification. We validate our theoretical findings through experiments on standard tabular datasets for bias detection. Code is publicly available at https://github.com/ValentinLafargue/Inspection.
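To make the Disparate Impact ratio concrete, the sketch below computes it as the positive-prediction rate of one group divided by that of another, then shows how a hand-picked subsample can change the value. The data, group labels, and the naive row selection are invented for illustration; the paper's entropic and optimal-transport projections are far more careful than this and are not reproduced here.

```python
def disparate_impact(predictions, groups, unprivileged, privileged):
    """Ratio of positive-prediction rates: unprivileged group over privileged group."""

    def positive_rate(group):
        selected = [p for p, g in zip(predictions, groups) if g == group]
        return sum(selected) / len(selected)

    return positive_rate(unprivileged) / positive_rate(privileged)


# Synthetic full dataset: binary predictions and a two-valued sensitive attribute.
preds = [1, 0, 0, 0, 1, 1, 1, 0]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]
full_di = disparate_impact(preds, group, unprivileged="a", privileged="b")

# A hand-picked subsample (rows 0, 1, 4, 7) that equalizes the two rates.
keep = [0, 1, 4, 7]
sub_di = disparate_impact(
    [preds[i] for i in keep], [group[i] for i in keep], "a", "b"
)
print(f"full data DI = {full_di:.3f}, subsample DI = {sub_di:.3f}")
```

A ratio of 1 means equal positive rates; a threshold of 0.8 (the "four-fifths rule") is a commonly cited cutoff, though whether the paper uses that exact value is not stated in the abstract.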

04.
arXiv (CS.LG) 2026-06-18

Self-Driving Datasets: From 20 Million Papers to Nuanced Biomedical Knowledge at Scale

arXiv:2605.07022v3

Manually curated biomedical repositories – spanning bioactivity, genomics, and chemistry – are expensive to maintain, lag behind primary literature, and discard experimental context, obscuring nuances needed to assess data correctness and coverage. We show that PubMed itself can be autonomously and cost-effectively turned into structured datasets that are larger, more nuanced, and more accurate than the curated databases they replace. We present three coupled contributions: (1) an LLM-based entity-tagging pipeline, grounded in nine biomedical ontologies, that tags 4.5B entities across 19 categories in a 22.5M-paper, 2.5T-token PubMed corpus; (2) hybrid sparse-dense retrieval supporting entity-filtered semantic queries over the tagged corpus; and (3) Starling, a multi-agent deep research system that, given only a natural-language task description, designs precision- and recall-targeted retrieval filters, induces an extraction schema, and emits structured records with nuance-rich fields and supporting passages. Across six tasks – blood-brain barrier permeability, oral bioavailability, acute toxicity (LD50), gene-disease associations, protein subcellular localization, and chemical reactions – Starling produces ~6.3M records (91K-3M per task); several are, to our knowledge, the largest public datasets for their property. Frontier-model rejection of our extractions is 0.6-7.7% across tasks, far below error rates we measure on widely used curated counterparts (e.g., 16.5% on BBB_Martins, 7.3% on Bioavailability_Ma). Beyond scale and accuracy, the supporting passages carry nuance tabular databases discard – e.g., oral bioavailability may depend on fed vs. fasted state. Together, the corpus, retrieval, and agent establish a foundation for AI-driven therapeutic design. Code and datasets: https://github.com/starling-labs/starling.

05.
arXiv (quant-ph) 2026-06-17

Fabless Quantum Chip Design and Commercial Production

arXiv:2606.17956v1

This paper proposes a fabless quantum-chip design and production architecture for superconducting quantum computing, centered on the SPICE-Q multiphysics simulation framework. The proposed ecosystem connects process-certified quantum PDKs, parameterized device cells, traceable model cards, SPICE-Q physical modeling languages, unified Q-EDA flows, foundry sign-off rules, cryogenic test feedback, and reusable quantum IP. In this model, design firms do not merely outsource fabrication; they prepare verified tape-outs under standardized process constraints and calibrated physical models. Its economic value lies in reducing repetitive device debugging, process exploration, and low-level layout effort, while its feasibility depends on PDK maturity, foundry yield, cryogenic test throughput, model-prediction accuracy, data-feedback mechanisms, and IP licensing boundaries. We argue that superconducting quantum chips can move from the current largely vertically integrated development model toward a fabless-foundry ecosystem only when hardware design is supported by standardized, verifiable, and reusable software and process interfaces. The required pillars are certified PDKs, PCell-based parameterized design, SPICE-Q cross-physics simulation, end-to-end Q-EDA automation, and a tradable quantum-IP market. By adapting lessons from the classical semiconductor industry to quantum hardware, this framework defines a path toward scalable, manufacturable, and commercially reusable superconducting quantum-chip design.

06.
arXiv (CS.AI) 2026-06-11

Robust Instruction Compliance in Cooperative Multi-Agent Reinforcement Learning

arXiv:2605.12655v3

Multi-agent reinforcement learning (MARL) in real-world use cases may need to adapt to external natural language instructions that interrupt ongoing behavior and conflict with long-horizon objectives. However, conditioning rewards on instructions introduces a fundamental failure mode as Bellman updates couple value estimates across instruction contexts, leading to inconsistent values when instructions interrupt macro-actions. We propose Macro-Action Value Correction for Instruction Compliance (MAVIC), which corrects Bellman backups at instruction boundaries by correcting the incoming instruction objective and restoring the continuation value under the current objective. Unlike reward shaping, MAVIC modifies the bootstrapping target itself, enabling consistent value estimation under stochastic instruction switching within a unified policy. We provide theoretical analysis and an actor-critic implementation, and show that MAVIC achieves high instruction compliance while preserving base task performance in increasingly complex cooperative multi-agent environments.

07.
arXiv (CS.LG) 2026-06-16

p-PSO: A Penalized Particle Swarm Optimization Technique for Finding D-Optimal Designs with Mixed Factors in Generalized Linear Models

arXiv:2606.15962v1

Finding D-optimal designs for generalized linear models (GLMs) is challenging due to the dependence of the Fisher information matrix on unknown parameters and the lack of closed-form solutions, particularly when input factors include both discrete and continuous variables. Although classical algorithms and recent metaheuristic approaches have offered partial solutions, there remains a need for robust and computationally efficient methods. In this paper, we propose a penalized Particle Swarm Optimization (PSO) approach, named $p$-PSO. Here we introduce a new, general-purpose penalty formulation for constrained optimization and demonstrate its effectiveness in optimal design problems. The formulation is algorithm-agnostic and applicable to a broad class of black-box optimization methods. Results show that the method is highly efficient, with its primary contribution being a penalty formulation that enables the direct use of an off-the-shelf PSO algorithm and extends naturally to more general constrained optimization tasks.
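The abstract does not spell out the penalty formulation, so the sketch below substitutes a generic quadratic penalty to show the general idea: an unconstrained, black-box objective that any off-the-shelf optimizer (PSO included) could maximize. It evaluates the log-determinant D-criterion for a two-parameter logistic model with one continuous factor. The model, the penalty weight `rho`, and all function names are assumptions for illustration and are not the $p$-PSO formulation.

```python
import math


def fisher_information(points, weights, beta0, beta1):
    """Entries (m00, m01, m11) of the 2x2 Fisher information for logistic regression."""
    m00 = m01 = m11 = 0.0
    for x, w in zip(points, weights):
        p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))
        v = w * p * (1.0 - p)  # design weight times the GLM variance function
        m00 += v
        m01 += v * x
        m11 += v * x * x
    return m00, m01, m11


def penalized_log_det(points, weights, beta0, beta1, rho=1e3):
    """log det(M) minus a quadratic penalty for infeasible design weights.

    Feasible weights are non-negative and sum to one; violations are charged
    rho times their squared size (a generic penalty, assumed for this sketch).
    """
    m00, m01, m11 = fisher_information(points, weights, beta0, beta1)
    det = m00 * m11 - m01 * m01
    log_det = math.log(det) if det > 0 else float("-inf")
    violation = (sum(weights) - 1.0) ** 2 + sum(min(0.0, w) ** 2 for w in weights)
    return log_det - rho * violation


feasible = penalized_log_det([-1.0, 1.0], [0.5, 0.5], beta0=0.0, beta1=1.0)
infeasible = penalized_log_det([-1.0, 1.0], [0.8, 0.5], beta0=0.0, beta1=1.0)
print(feasible, infeasible)
```

Because the penalty is folded into the objective, the optimizer never needs to know about the constraints; the trade-off is that `rho` must be large enough to make infeasible designs unattractive.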

08.
medRxiv (Medicine) 2026-06-18

Excess mortality in Germany during 2020-2023: A descriptive age-stratified analysis

This study investigates excess mortality in Germany in the years from 2020 to 2023 and its temporal alignment with reported COVID-19 deaths. The analysis uses annual and weekly all-cause mortality data and linear baseline trends derived from pre-pandemic years. Possible effects of demographic and population changes on baseline trends were also examined. Excess mortality was analysed over time and across age groups. Excess mortality was observed in all investigated years, rising from 2020 to its highest value in 2022. In absolute terms, the age group ≥80 years accounted for the largest proportion of excess deaths throughout the study period. After 2021, elevated mortality relative to baseline was also observed in younger age groups down to 15 years of age, although absolute numbers remained substantially lower than in older groups. No evidence of excess mortality was observed for individuals younger than 15 years. Periods of excess mortality were temporally aligned with waves of reported COVID-19 deaths. In 2020, cumulative excess mortality after calendar week 11 closely matched reported COVID-19 deaths (43 876 vs. 41 835 deaths). Weekly excess mortality, reported COVID-19 deaths and wastewater viral load, when available, showed strong temporal synchrony, although excess mortality increasingly exceeded reported COVID-19 deaths during later pandemic waves. Temporal patterns differed from the typical seasonal mortality peaks commonly associated with influenza epidemics during the early months of the year. In 2023, excess mortality declined substantially, possibly indicating a return to mortality levels before the emergence of SARS-CoV-2.
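The core calculation described above, a linear baseline fitted to pre-pandemic years and extrapolated forward, can be sketched in a few lines. The yearly death counts below are synthetic placeholders chosen to make the arithmetic easy to follow, not German mortality data; only the two 2020 figures in the last lines are taken from the abstract.

```python
def fit_linear_baseline(years, deaths):
    """Ordinary least-squares line through (year, deaths); returns a predictor."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(deaths) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, deaths)) / sum(
        (x - mean_x) ** 2 for x in years
    )
    intercept = mean_y - slope * mean_x
    return lambda year: intercept + slope * year


# Synthetic pre-pandemic reference period (NOT real data).
baseline_years = [2016, 2017, 2018, 2019]
baseline_deaths = [900_000, 910_000, 920_000, 930_000]
expected = fit_linear_baseline(baseline_years, baseline_deaths)

observed_2020 = 985_000  # synthetic observed deaths
excess_2020 = observed_2020 - expected(2020)
print(f"expected {expected(2020):.0f}, excess {excess_2020:.0f}")

# Figures quoted in the abstract for 2020 after calendar week 11.
reported_gap = 43_876 - 41_835
print(f"excess minus reported COVID-19 deaths: {reported_gap}")
```

The choice of reference years and the linear form both shape the baseline, which is presumably why the study also examines demographic and population effects on the trend.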

09.
arXiv (CS.CL) 2026-06-19

Reliability without Validity: A Systematic, Large-Scale Evaluation of LLM-as-a-Judge Models Across Agreement, Consistency, and Bias

LLM-as-a-Judge has become the dominant evaluation paradigm for language models, but judge validation in practice relies on exact-match agreement, a metric that does not correct for chance and systematically overstates discriminative ability. We present the largest systematic evaluation of LLM-as-a-Judge to date: 21 judges from nine providers across MT-Bench, JudgeBench, and RewardBench, evaluated under three protocols (agreement, consistency, bias audit) over 118 runs and approximately 541,000 individual judgments. Four findings emerge, consistent across the full cohort, including the April 2026 frontier: kappa deflation between exact match and Cohen's kappa is universal (33–41 pp on MT-Bench), judge rankings shift by up to 14 positions across benchmarks, high test–retest reliability (>0.95) coexists with severe position bias (>0.10) in two production-deployed judges (instantiating a consistency–bias paradox), and verbosity bias is small (
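The gap between exact-match agreement and Cohen's kappa mentioned above comes from chance correction: kappa subtracts the agreement two raters would reach by guessing according to their own label frequencies. A minimal sketch on invented labels (not data from the paper) follows.

```python
from collections import Counter


def exact_match(a, b):
    """Fraction of items on which the two raters give the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_observed = exact_match(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement if each rater labeled independently at their own base rates.
    p_chance = sum(counts_a[label] * counts_b[label] for label in set(a) | set(b)) / (n * n)
    return (p_observed - p_chance) / (1.0 - p_chance)


# Synthetic pairwise verdicts ("A" or "B" wins) with a skewed label distribution.
human = ["A", "A", "A", "A", "A", "A", "A", "A", "B", "B"]
judge = ["A", "A", "A", "A", "A", "A", "A", "B", "A", "B"]
print(f"exact match = {exact_match(human, judge):.3f}")
print(f"Cohen's kappa = {cohens_kappa(human, judge):.3f}")
```

With skewed label frequencies, chance agreement is already high, so a judge can post a strong exact-match score while its kappa stays modest.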

10.
arXiv (CS.CV) 2026-06-11

SDQM: Synthetic Data Quality Metric for Object Detection Dataset Evaluation

The performance of machine learning models depends heavily on training data. The scarcity of large-scale, well-annotated datasets poses significant challenges in creating robust models. To address this, synthetic data generated through simulations and generative models has emerged as a promising solution, enhancing dataset diversity and improving the performance, reliability, and resilience of models. However, evaluating the quality of this generated data requires an effective metric. We introduce the Synthetic Dataset Quality Metric (SDQM) to assess data quality for object detection tasks without requiring model training to converge. This metric enables more efficient generation and selection of synthetic datasets, addressing a key challenge in resource-constrained object detection tasks. In our experiments, SDQM demonstrated a strong correlation with the mean average precision (mAP) scores of YOLO11, a leading object detection model, whereas previous metrics only exhibited moderate or weak correlations. In addition, it provides actionable insights into improving dataset quality, minimizing the need for costly iterative training. This scalable and efficient metric sets a new standard for evaluating synthetic data. The code for SDQM is available at https://github.com/ayushzenith/SDQM

11.
arXiv (CS.CL) 2026-06-12

Detecting Functional Memorization in Code Language Models

Large language models (LLMs) are increasingly used to generate code at scale. Meanwhile, prior work has investigated whether training data may be recoverable from model outputs, by auditing the textual overlap between training examples and model generations. Code, however, can be functionally equivalent while textually dissimilar. In this work, we study functional memorization: extraction of functional logic beyond what verbatim metrics detect. We construct a counterfactual setup for Olmo-3-32B, comparing a midtrained model (exposed to target code) against a pretrained reference (not exposed). We prompt both models with Python function signatures and measure both textual and functional similarity (i.e., LLM-as-a-judge, execution-based). Our results show clear evidence of functional memorization, highlighting the need for auditing metrics that go beyond textual overlap.
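The distinction between textual and functional similarity can be illustrated with two tiny functions that share behavior but not text. The sketch below scores textual overlap with `difflib` and functional agreement by executing both snippets on the same inputs. The snippets, the helper names, and the use of `exec` are choices made for this example only (and `exec` should never be used on untrusted code); the paper's actual metrics are not described in enough detail to reproduce.

```python
import difflib

# Two hypothetical generations for the same function signature.
SOURCE_A = "def total(xs):\n    return sum(xs)\n"
SOURCE_B = (
    "def total(values):\n"
    "    acc = 0\n"
    "    for v in values:\n"
    "        acc += v\n"
    "    return acc\n"
)


def textual_similarity(a, b):
    """Character-level similarity ratio in [0, 1] from difflib."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def functional_agreement(src_a, src_b, name, test_inputs):
    """Fraction of inputs on which both implementations return the same value."""
    ns_a, ns_b = {}, {}
    exec(src_a, ns_a)  # acceptable only because the snippets are trusted toys
    exec(src_b, ns_b)
    matches = sum(ns_a[name](*args) == ns_b[name](*args) for args in test_inputs)
    return matches / len(test_inputs)


inputs = [([1, 2, 3],), ([],), ([-1, 1],)]
print(f"textual similarity   = {textual_similarity(SOURCE_A, SOURCE_B):.2f}")
print(f"functional agreement = {functional_agreement(SOURCE_A, SOURCE_B, 'total', inputs):.2f}")
```

A pair like this would look only partly similar to a text-overlap audit even though the two functions agree on every tested input.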

12.
arXiv (quant-ph) 2026-06-16

Simulation of Non-Hermitian Hamiltonians with Bivariate Quantum Signal Processing

arXiv:2605.12450v2

We achieve query-optimal quantum simulations of non-Hermitian Hamiltonians $H_{\mathrm{eff}} = H_R + iH_I$, where $H_R$ is Hermitian and $H_I \succeq 0$, using a bivariate extension of quantum signal processing (QSP) with non-commuting signal operators. The algorithm encodes the interaction-picture Dyson series as a polynomial on the bitorus, implemented through a structured multivariable QSP (M-QSP) circuit. A constant-ratio condition guarantees scalar angle-finding for M-QSP circuits with arbitrary non-commuting signal operators. A degree-preserving sum-of-squares spectral factorization permits scalar complementary polynomials in two variables. Angles are deterministically calculated in a classical precomputation step, running in $\mathcal{O}(d_R \cdot d_I)$ classical operations. Operator norms $\alpha_R$, $\beta_I$ contribute additively with query complexity $\mathcal{O}((\alpha_R + \beta_I)T + \log(1/\varepsilon)/\log\log(1/\varepsilon))$ matching an information-theoretic lower bound in the separate-oracle model, where $H_R$ and $H_I$ are accessed through independent block encodings. The postselection success probability is $e^{-2\beta_I T}\|e^{-iH_{\mathrm{eff}}T}|\psi_0\rangle\|^2\cdot (1 - \mathcal{O}(\varepsilon))$, decomposing into a state-dependent factor $\|e^{-iH_{\mathrm{eff}}T}|\psi_0\rangle\|^2$ from the intrinsic barrier and an $e^{-2\beta_I T}$ overhead from polynomial block-encoding.

13.
arXiv (CS.AI) 2026-06-12

Creating and Evaluating K-12 GenAI Assessment Graders Through Context Engineering

arXiv:2606.12422v1

The integration of large language models (LLMs) into educational assessment represents a transformative shift in classroom grading practices. While automated scoring systems and machine learning techniques have existed for decades, generative AI (GenAI) now enables educators to implement standards-based grading (SBG) with unprecedented efficiency and scale. This paper examines the theoretical foundations and evaluates an LLM grader that uses commercially available foundation models with context and prompt engineering to score student work against a rubric. Drawing on an empirical interrater agreement study using Massachusetts Comprehensive Assessment System (MCAS) data, we observed the Quadratic Weighted Kappa (QWK) and Proportional Reduction in Mean-Squared Error (PRMSE) across mathematics, science, and ELA, using Claude Sonnet 4, Haiku 4.5, GPT-5, and GPT-5 Mini. The results demonstrate that LLM graders, especially when based on foundation models with more parameters, achieve substantial agreement with human raters in mathematics and science assessments, while performance varies in ELA, suggesting generic foundation models can be effective at scoring in given contexts. Additional analysis of teacher and student feedback reveals strong acceptance of AI-generated narrative feedback but skepticism toward numerical scores, suggesting that LLMs function most effectively as formative tools rather than summative evaluators. Our findings indicate that thoughtfully designed hybrid models that combine AI efficiency with teacher judgment can reduce workload, enhance feedback quality, and support equitable assessment practices without displacing professional expertise.
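Quadratic Weighted Kappa, one of the two agreement statistics named above, penalizes disagreements by the squared distance between rubric levels, so a two-point miss costs four times as much as a one-point miss. The sketch below implements the standard definition on invented scores; the rubric range and the data are assumptions for illustration, not MCAS results.

```python
def quadratic_weighted_kappa(human, model, num_levels):
    """QWK for integer scores in 0..num_levels-1 (1.0 means perfect agreement).

    Raises ZeroDivisionError in the degenerate case where both raters use a
    single identical score level for every item.
    """
    n = len(human)
    observed = [[0] * num_levels for _ in range(num_levels)]
    for h, m in zip(human, model):
        observed[h][m] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(num_levels)) for j in range(num_levels)]

    weighted_observed = weighted_expected = 0.0
    for i in range(num_levels):
        for j in range(num_levels):
            weight = (i - j) ** 2 / (num_levels - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count under independence of the two raters' marginals.
            weighted_expected += weight * row_totals[i] * col_totals[j] / n
    return 1.0 - weighted_observed / weighted_expected


# Synthetic rubric scores on a 0-3 scale.
human_scores = [0, 1, 2, 3]
model_scores = [0, 1, 2, 2]
print(f"QWK = {quadratic_weighted_kappa(human_scores, model_scores, 4):.3f}")
```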

14.
arXiv (CS.AI) 2026-06-17

Fixed-Point Reasoners: Stable and Adaptive Deep Looped Transformers

arXiv:2606.18206v1

Looped architectures provide an inductive bias toward learning step-by-step procedures for tasks that require compositional reasoning. The number of effective layers reached by looping determines the quality of the solution these models find. Like deep architectures, looped architectures are prone to a signal propagation problem induced by depth as the halting decision is postponed. In this paper, we address this signal propagation issue using pre-norm layers and residual scaling. Building on these architectural modifications, we propose FPRM, a Transformer-based Fixed-Point Reasoning Model that uses fixed-point convergence as an end-to-end halting mechanism in a looped architecture. We show that fixed-point halting allows FPRM to adapt its compute to task difficulty. FPRM is effective on common reasoning benchmarks, namely Sudoku, Maze, state-tracking, and ARC-AGI.
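Fixed-point halting can be pictured with a toy loop: apply the same block repeatedly and stop once the state stops changing. In the sketch below a scalar contraction map stands in for the looped Transformer block, and the tolerance and loop cap are arbitrary assumptions. It is not FPRM, only an illustration of why inputs that start further from the fixed point consume more iterations.

```python
def iterate_to_fixed_point(step, state, tol=1e-6, max_loops=100):
    """Apply `step` until successive states differ by less than `tol`.

    Returns the final state and the number of loop iterations used.
    """
    for loops in range(1, max_loops + 1):
        new_state = step(state)
        delta = max(abs(a - b) for a, b in zip(new_state, state))
        state = new_state
        if delta < tol:
            return state, loops
    return state, max_loops


def toy_block(state):
    """Stand-in for a looped layer: a contraction with fixed point x = 2."""
    return [0.5 * x + 1.0 for x in state]


far_state, far_loops = iterate_to_fixed_point(toy_block, [0.0])
near_state, near_loops = iterate_to_fixed_point(toy_block, [1.999])
print(f"start far:  {far_loops} loops, state {far_state[0]:.6f}")
print(f"start near: {near_loops} loops, state {near_state[0]:.6f}")
```

The loop count is decided by the data rather than fixed in advance, which is the sense in which such a halting rule adapts compute to difficulty.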

15.
arXiv (CS.LG) 2026-06-15

Machine-learned particle flow as a foundation model for collider physics

arXiv:2606.14373v1

The workflow from particle collision to physics analysis passes through a series of reconstruction steps that are traditionally modular and disconnected, with no shared representation linking low-level detector data to high-level analysis tasks. We show that casting event reconstruction as a machine learning problem naturally produces such a shared representation. We repurpose a machine learning model trained for particle-flow reconstruction (MLPF) to perform three distinct analysis tasks: jet flavor identification, jet energy regression, and missing momentum regression. By appending the per-particle latent representations learned during reconstruction as additional input features, we substantially improve over baselines that use kinematic features alone. We further demonstrate that a single linear layer trained using only the latent representations achieves competitive performance against state-of-the-art baseline architectures, and outperforms the baseline for missing momentum regression with approximately 35 times fewer parameters. These results demonstrate that the latent representations learned during reconstruction encode essential physics information needed for downstream analysis, establishing MLPF as a foundation model and offering a concrete step toward an end-to-end pipeline from detector data to physics analysis.

16.
arXiv (CS.CL) 2026-06-12

EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments

Large language model (LLM) agents have achieved strong performance on a wide range of benchmarks, yet most evaluations assume static environments. In contrast, real-world deployment is inherently dynamic, requiring agents to continually align their knowledge, skills, and behavior with changing environments and updated task conditions. To address this gap, we introduce EvoArena, a benchmark suite that models environment changes as sequences of progressive updates across terminal, software, and social domains. We further propose EvoMem, a patch-based memory paradigm that records memory evolution as structured update histories, enabling agents to reason about environmental evolution through changes in their memory. Experiments show that current agents struggle on EvoArena, achieving an average accuracy of 39.6% across evolving terminal, software, and social-preference domains. EvoMem consistently improves performance, yielding an average gain of 1.5% on EvoArena and also improving standard benchmarks such as GAIA and LoCoMo by 6.1% and 4.8%. Beyond individual tasks, EvoMem further improves chain-level accuracy by 3.7% on EvoArena, where success requires completing a consecutive sequence of related evolutionary subtasks. Mechanistic analysis shows that EvoMem improves evidence capture in the memory, indicating better preservation of complete evolving environment states. Our results highlight the importance of modeling evolution in both evaluation and memory for reliable agent deployment.

17.
arXiv (quant-ph) 2026-06-16

Scalable generation of heralded single photons via active feed-forward switching of a fiber delay line

arXiv:2606.16741v1

Quasi-deterministic single-photon generation is a key requirement for many photonic quantum technologies. Photon sources based on spontaneous parametric down-conversion (SPDC) are widely used for producing high-quality photons; however, the probabilistic nature of the process limits the generation of synchronized multi-photon states. Here, we demonstrate temporal synchronization of multiple photon-generation events using a free-space-fiber hybrid delay line with feed-forward control, enabling fast and efficient switching and scalable operation. Narrow-band, telecom-wavelength photons compatible with fiber transmission are heralded from a monolithic cavity SPDC source and synchronized across 20 time bins. This yields a sixfold enhancement in synchronized rates and enables multi-photon synchronization, with only a marginal increase of higher-order photon-number contributions.

18.
medRxiv (Medicine) 2026-06-16

Optimal Clinical Trials Platform for Progressive Multiple Sclerosis (OCTOPUS): protocol for an international, multi-arm, multi-stage, platform, randomized controlled, double-blind, phase 3 clinical trial.

Introduction Current treatments for multiple sclerosis (MS) do not address the pathological processes of neurodegeneration and chronic demyelination. This, coupled with the significant challenges of translating promising phase 2 results to phase 3 trial success, highlights the need for more efficient trial designs, such as platform multi-arm multi-stage (MAMS) trial approaches. MAMS trials have demonstrated success in areas such as oncology and infectious diseases. They are typified by a statistically robust core trial design that allows the addition of further treatment arms and utilisation of interim outcome analyses at pre-defined timepoints, to determine whether to terminate a treatment arm early or proceed to the final outcome analysis. To address the challenges in progressive multiple sclerosis (PMS) treatment discovery, the Optimal Clinical Trials Platform for PMS (OCTOPUS) trial was developed. It currently utilises MRI whole-brain atrophy as its interim outcome measure and the clinically relevant composite Expanded Disability Status Scale Plus (EDSS-Plus) as its final outcome measure. A rigorous and systematic drug selection process that assessed preclinical in vitro and animal model evidence, along with additional human data, led to the prioritisation of R/S-alpha lipoic acid (R/S-ALA) and metformin for testing against placebo, targeting pathobiological mechanisms relevant to PMS. All participants will be eligible to receive the current standard of care, including disease-modifying treatments (DMTs). Method and analysis OCTOPUS will be a multi-centre, randomised, placebo-controlled, double-blind, phase 3, MAMS trial of participants aged 25 to 70 years (inclusive) with PMS and an EDSS score of 4.0 to 8.0 (inclusive). Steady progression must be the major cause of increasing disability rather than relapse in the preceding 2 years. In the trial s first candidate drug cycle, participants will be allocated to R/S-ALA, metformin, or placebo in a 1:1:1 ratio. 
Cycle 1 active treatments will start as R/S-ALA 600 mg once daily, increased after 4 weeks to 600 mg twice daily, or metformin 1 g once daily, increased after 4 weeks to 1 g twice daily. The trial will be multinational, with participation from 28 hospitals across the UK and 10 hospitals in Australia. Clinician-reported measures will include: the EDSS-Plus and the individual components: EDSS, Timed 25 Foot Walk (T25FW); 9 Hole Peg Test (9HPT); Symbol Digit Modalities Test (SDMT); Sloan Low Contrast Visual Acuity (SLCVA); and Relapse assessment. Patient-reported outcomes include MS specific walking, fatigue, pain, and impact scales. We will include a health economic analysis. Analysis stage 1 will require randomisation of 125 participants per arm and utilise MRI percentage brain volume change (PBVC) with the Structural Image Evaluation using Normalisation of Atrophy (SIENA) technique from baseline to 78 weeks. A positive outcome in analysis stage 1 will detect a 0.15% per year whole brain atrophy difference with a one-sided alpha of 0.35 and power of 95%, ensuring a low probability of erroneously rejecting a treatment arm at this stage. Any arms that show a positive effect will proceed to final analysis stage 2. Analysis stage 2 will require 600 participants per arm. Participants included in stage 1 will also be included in the stage 2. Analysis stage 2 will evaluate time to 6-month confirmed disability progression in the EDSS-Plus, in order to detect a 25% hazard ratio reduction with 90% power and an alpha of 0.05. Assuming one treatment arm proceeds to analysis stage 2, the trial will recruit approximately 1,200 participants and last about 6 years. This is approximately two-thirds the size and half the duration of separately conducted two-arm phase 2 and 3 trials. Ethics and dissemination The protocol was approved by the London Hampstead REC (22/LO/0622). This manuscript is based on protocol version 8.0, 28th August 2025. 
The findings of this trial will be disseminated through peer-reviewed publications and conference presentations. There will be a close communication strategy developed with the UK MS Society (MSS) and full patient and public involvement and engagement (PPIE). Trial registration ISRCTN: 14048364 EudraCT number: 2021-003034-37 CTA 20363/0445 IRAS number: 1003943 Secondary identifying numbers: ND001, CPMS 54274 Strengths and limitations - The OCTOPUS trial will be the first platform multi-arm multi-stage phase 3 trial in PMS, offering the potential to significantly expedite clinical trial processes with advantages in cost- and time-efficiency, focusing specifically on the poorly treated pathobiological processes of chronic neurodegeneration and demyelination - It will begin by assessing two promising drug candidates, immediate-release metformin and R/S-ALA, and will expand over the duration of the trial to include more drug arms under the same trial master protocol - The flexible and statistically robust trial design means that several components of the design (such as the early analysis stage 1 interim outcome) can be updated in line with evolving scientific knowledge - It will ultimately be the largest ever investigator-initiated phase 3 trial in PMS - It will include a range of national and international trial sites, including neuroscience centres and district general hospitals - It will have a high inclusion limit for age (up to 70 years) and disability (up to EDSS 8.0) - Several components (the telephone EDSS and virtual patient-reported outcome measures) will be amenable to remote collection increasing inclusivity and thus addressing public and participant suggestions, while minimising the risk of missing data - The main challenges in this trial design are the statistical and methodological complexity involved in design and implementation, and interpretation of interim trial results. Conclusion The trial launched cycle 1 in January 2023. 
Analysis stage 1 recruitment of 375 participants was achieved in November 2024, enabling the planned interim analysis stage 1 to be conducted by late 2026 (Figure 1). As of 1 June 2026, 24 UK sites are active, with a further 4 in set-up as part of stage 2; in the Australian extension, Platform Adaptive Trial for Remyelination and Neuroprotection in Multiple Sclerosis (PLATYPUS), 1 site is active, with 9 additional sites in set-up.
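
The recruitment figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not the trial's own sample-size calculation: the arm counts (three arms at stage 1, two at stage 2) are inferred from the stated totals, the "25% hazard ratio reduction" is read as a hazard ratio of 0.75, the alpha of 0.05 is assumed to be two-sided, and the event count uses the standard Schoenfeld approximation for a 1:1 comparison. The function names are hypothetical.

```python
from math import ceil, log
from statistics import NormalDist


def total_recruitment(per_arm: int, n_arms: int) -> int:
    """Total participants when every arm recruits the same number."""
    return per_arm * n_arms


def schoenfeld_events(hazard_ratio: float, alpha: float = 0.05,
                      power: float = 0.90, two_sided: bool = True) -> int:
    """Approximate number of events for a 1:1 log-rank comparison
    (Schoenfeld approximation). Assumption: not the trial's own method."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    return ceil(4 * (z_alpha + z_beta) ** 2 / log(hazard_ratio) ** 2)


# Stage 1: 125 per arm; three arms assumed (two active drugs plus a comparator).
stage1_total = total_recruitment(per_arm=125, n_arms=3)   # 375, as reported

# Stage 2: 600 per arm; two arms assumed (one surviving active arm plus comparator).
stage2_total = total_recruitment(per_arm=600, n_arms=2)   # 1200, as reported

# Events needed to detect HR = 0.75 under the assumptions stated above.
events_needed = schoenfeld_events(hazard_ratio=0.75)
print(stage1_total, stage2_total, events_needed)
```

The two totals reproduce the 375 and approximately 1,200 participants given in the abstract; the event count is only a rough guide, since a multi-arm multi-stage design applies its own operating characteristics.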

19.
arXiv (CS.CL) 2026-06-17

From Parasocial Scripts to Dyadic Persistence in Autonomous AI-Agent Communities

While parasocial interactions (PSIs) and parasocial relationships (PSRs) have been studied in conventional media settings, we investigate whether PSI- (colloquial) relational cues also exist in online communities where both sides are autonomous AI agents. We analyze 4,434 posts and 50,338 comments from Moltbook through three theory-based textual indicators: attachment/intimacy language, reciprocity bids, and self-identification with the original poster (OP). The combined results across methods based on keyword matching, few-shot large language model (LLM) annotation, and grouped-context LLM annotation reveal that PSI colloquial cues prevail and are strongly associated with OP re-engagement and a reciprocal reply structure. These results are robust across negative controls, nullification, clustered-standard-error re-estimation, and multiple-testing correction. A dyadic persistence test further affirms that reciprocity bids align with sustained OP-involving mutual recurrence, providing empirical evidence for bridging interaction-level PSI scripts with PSR-consistent repeated dyadic patterns. We interpret the evidence as a behavioral structure in discourse by LLM-enabled agents.
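
As a rough illustration of the keyword-matching method named above, the following sketch tags a comment with the three indicators and computes their prevalence. The lexicons are hypothetical placeholders invented for this example; the abstract does not list the keywords the authors used.

```python
import re

# Hypothetical lexicons (assumption): the paper's actual keyword lists are not given.
INDICATOR_PATTERNS = {
    "attachment_intimacy": [r"\bmiss(ed)? you\b", r"\bfeel close\b", r"\bmy friend\b"],
    "reciprocity_bid": [r"\bwhat do you think\b", r"\blet me know\b", r"\bcurious to hear\b"],
    "self_identification": [r"\bme too\b", r"\bsame here\b", r"\bjust like you\b"],
}

COMPILED = {
    name: [re.compile(p, re.IGNORECASE) for p in patterns]
    for name, patterns in INDICATOR_PATTERNS.items()
}


def tag_comment(text: str) -> dict:
    """Return, per indicator, whether any of its patterns occurs in the text."""
    return {
        name: any(p.search(text) for p in patterns)
        for name, patterns in COMPILED.items()
    }


def prevalence(comments: list) -> dict:
    """Fraction of comments that carry each indicator."""
    tags = [tag_comment(c) for c in comments]
    n = max(len(tags), 1)  # avoid division by zero on an empty corpus
    return {name: sum(t[name] for t in tags) / n for name in COMPILED}


sample = ["Same here, what do you think?", "Interesting benchmark result."]
print(prevalence(sample))
```

Keyword matching of this kind is cheap but brittle, which is presumably why the abstract pairs it with two LLM-based annotation methods.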

20.
medRxiv (Medicine) 2026-06-22

T Cell Receptor repertoire analysis reveals antigenic convergence and immunotherapeutic opportunities in Prostate Cancer

Background: The T-cell receptor β (TCRβ) repertoire reflects antigen-driven adaptive immune responses and provides insight into tumor-immune interaction. In prostate cancer (PCa), the immunosuppressive tumor microenvironment limits effective T-cell activation, and the antigenic drivers shaping intratumoral TCR repertoires remain poorly defined. This study aimed to characterize matched tumor and peripheral TCRβ repertoires from treatment-naive PCa patients and to identify shared clonotypes and antigenic specificities associated with disease severity. Methods: Next-generation sequencing was used to profile TCRβ repertoires from matched tumor biopsies and peripheral blood mononuclear cells obtained from treatment-naive PCa patients. Repertoire clonality and diversity were assessed using established metrics. Antigenic convergence was evaluated using GLIPH2 to identify shared CDR3β motifs and predicted tumor-associated antigen (TAA) recognition, followed by functional validation using IFN-γ ELISpot and T-cell expansion assays. Results: Tumor-derived TCRβ repertoires displayed reduced richness and increased clonality compared with peripheral blood mononuclear cells, consistent with local antigen-driven expansion. High-grade tumors demonstrated greater interpatient clonotype sharing and motif-level convergence, indicative of recognition of common TAAs. GLIPH2 analysis associated expanded clonotypes with epitopes derived from prostate-specific G-protein coupled receptor (PSGR), prostate-specific membrane antigen (PSMA), and prostate-specific antigen (PSA). Functional validation confirmed that peptide pools containing PSGR- and PSMA-derived epitopes induced IFN-γ production and antigen-specific T-cell proliferation in vitro. Conclusions: These findings reveal an oligoclonal, antigen-driven intratumoral TCRβ landscape and identify PSGR and PSMA as immunogenic, potentially actionable targets.
Integration of TCR profiling with antigen discovery pipelines may support the development of TCR-based biomarkers and precision immunotherapeutic strategies in prostate cancer.
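
The abstract refers to "established metrics" for richness, diversity and clonality without naming them. A minimal sketch of one widely used set follows, assuming richness is the number of distinct clonotypes, diversity is Shannon entropy, and clonality is one minus normalized Shannon entropy; the authors may have used different definitions. The CDR3β strings and counts are made up for illustration.

```python
import math


def richness(counts: dict) -> int:
    """Number of distinct clonotypes with at least one read."""
    return sum(1 for c in counts.values() if c > 0)


def shannon_entropy(counts: dict) -> float:
    """Shannon entropy (natural log) of clonotype frequencies."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)


def clonality(counts: dict) -> float:
    """1 - normalized entropy: 0 for an even repertoire, near 1 when one clone dominates."""
    n = richness(counts)
    if n <= 1:
        return 1.0  # a single clonotype is maximally clonal by convention
    return 1.0 - shannon_entropy(counts) / math.log(n)


def shared_clonotypes(a: dict, b: dict) -> set:
    """Clonotypes (keyed by CDR3 sequence) present in both repertoires."""
    return set(a) & set(b)


# Invented example repertoires (assumption): sequences and counts are illustrative.
tumor = {"CASSLGQGNTEAFF": 97, "CASSPDRGYEQYF": 1, "CASSQETQYF": 1, "CASRTGELFF": 1}
blood = {"CASSLGQGNTEAFF": 10, "CASSPDRGYEQYF": 10, "CASSIRSSYEQYF": 10, "CASSFGTEAFF": 10}

print(clonality(tumor), clonality(blood), shared_clonotypes(tumor, blood))
```

In this toy example the tumor repertoire scores high on clonality and the blood repertoire scores zero, mirroring the direction of the reported finding.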

21.
arXiv (CS.LG) 2026-06-17

Geodesic Calculus on Implicitly Defined Latent Manifolds

arXiv:2510.09468v3. Latent manifolds of autoencoders provide low-dimensional representations of data, which can be studied from a geometric perspective. We propose to describe these latent manifolds as implicit submanifolds of some ambient latent space. Based on this, we develop tools for a discrete Riemannian calculus approximating classical geometric operators. These tools are robust against inaccuracies of the implicit representation often occurring in practical examples. To obtain a suitable implicit representation, we propose to learn an approximate projection onto the latent manifold by minimizing a denoising objective. This approach is independent of the underlying autoencoder and supports the use of different Riemannian geometries on the latent manifolds. The framework in particular enables the computation of geodesic paths connecting given end points and shooting geodesics via the Riemannian exponential maps on latent manifolds. We evaluate our approach on various autoencoders trained on synthetic and real data.
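
To make the idea of a discrete geodesic on a projection-defined manifold concrete, here is a toy sketch. It is not the authors' algorithm: the learned denoising projection is replaced by the closed-form projection onto the unit sphere, the metric is the Euclidean one inherited from the ambient space, and the solver is plain projected gradient descent on the discrete path energy. All function names are hypothetical.

```python
import numpy as np


def project_to_sphere(x: np.ndarray) -> np.ndarray:
    """Stand-in for a learned projection: nearest point on the unit sphere."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def discrete_geodesic(a, b, n_points: int = 16, n_iters: int = 500,
                      step: float = 0.25) -> np.ndarray:
    """Minimize the discrete path energy sum_k |x_{k+1} - x_k|^2 with fixed
    end points, projecting interior points back onto the manifold each step."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    t = np.linspace(0.0, 1.0, n_points)[:, None]
    path = project_to_sphere((1 - t) * a + t * b)  # projected straight line
    for _ in range(n_iters):
        # Descent direction of the energy at interior points (discrete Laplacian).
        lap = path[:-2] + path[2:] - 2.0 * path[1:-1]
        path[1:-1] = project_to_sphere(path[1:-1] + step * lap)
    return path


def path_length(path: np.ndarray) -> float:
    """Length of the polygonal path in the ambient space."""
    return float(np.linalg.norm(np.diff(path, axis=0), axis=1).sum())


geo = discrete_geodesic([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(path_length(geo))  # close to the great-circle distance pi/2
```

On the sphere the result can be checked against the known great-circle distance; with a learned projection no such closed form exists, which is where the robustness claims of the abstract become relevant.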

22.
arXiv (CS.CL) 2026-06-12

Keep Policy Gradient in Charge: Sibling-Guided Credit Distillation for Long-Horizon Tool-Use Agents

Long-horizon tool-use reinforcement learning can learn from outcome verification, but its trajectory-level advantage is broadcast across many reasoning, API, and answer tokens. Self-distillation promises a denser signal by reusing a policy's own rollouts or a privileged teacher. We show, however, that direct token-level self-distillation can silently destroy tool use: it rehearses teacher behavior without knowing which actions the verifier rewards, so useful skills and harmful shortcuts are amplified together. We introduce Sibling-Guided Credit Distillation (SGCD), which uses distillation for credit assignment rather than as a competing actor loss. Dynamic sampling produces mixed successful and failed sibling rollouts; an external LLM summarizes their contrast into a training-only stepwise credit reference; dense teacher/student divergence drives credit reassignment; and bounded detached credit weights reshape GRPO token advantages. The deployed student sees no external LLM, sibling evidence, or oracle. Across AppWorld and $\tau^3$-airline, SGCD improves over matched GRPO comparators: AppWorld TGC $42.9 \to 45.6$ on test_normal and $24.7 \to 27.0$ on test_challenge, and $\tau^3$-airline pass@1 $0.583 \to 0.602$.
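
The final step described above, where bounded credit weights reshape GRPO token advantages, can be sketched as follows. This is a simplified reading under stated assumptions: the group-normalized advantage follows the usual GRPO form, the per-token credit scores are taken as given inputs (the abstract does not specify how they are derived from teacher/student divergence), and the bounds `low` and `high` are hypothetical values chosen for illustration.

```python
import numpy as np


def grpo_advantages(rewards, eps: float = 1e-8) -> np.ndarray:
    """Group-relative advantage: standardize each rollout's reward within its group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)


def reshape_token_advantages(advantage: float, credit_scores,
                             low: float = 0.5, high: float = 1.5) -> np.ndarray:
    """Broadcast one trajectory-level advantage across tokens, scaled by bounded
    credit weights. The weights are plain constants here, which mirrors
    'detached' weights that carry no gradient. Bounds are assumptions."""
    weights = np.clip(np.asarray(credit_scores, dtype=float), low, high)
    return advantage * weights


# Four sibling rollouts of one task: two verified successes, two failures.
adv = grpo_advantages([1.0, 0.0, 1.0, 0.0])

# Hypothetical per-token credit scores for the first (successful) rollout.
token_adv = reshape_token_advantages(adv[0], [0.0, 1.0, 3.0])
print(adv, token_adv)
```

Because the weights are clipped to a positive interval, the sign of every token advantage still comes from the verifier outcome, which is one way to read the title's "keep policy gradient in charge".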

23.
arXiv (CS.AI) 2026-06-12

Fantastic Scientific Agents and How to Build Them: AgentBuild for Rietveld Refinement

arXiv:2606.12834v1. As scientific workflows shift from deterministic executables to LLM-based agents, the development practices on offer, such as fine-tuning, reinforcement learning, and prompt-and-go, bury the scientist's judgment. We propose treating agent construction as a workflow stage and introduce AgentBuild, which builds a scientific agent from a contract the scientist authors. The contract is a version-controlled rubric, a difficulty-graded curriculum, and a curated external knowledge base. A rubric-driven judge gates a meta-optimizer coding agent that edits the agent within a declared boundary, so the build compiles the agent, not the scientist's judgment. We instantiate this for Rietveld refinement of X-ray diffraction data through GSAS-II behind MCP and A2A, where a blank-harness construction run progresses through a lithium lanthanum zirconium oxide (LLZO) signal-to-noise ladder, reaches the 4 hour scan as a frontier case, and exposes the workflow-scope limits that remain. The same rubric that rewards credible fits also scores trajectory scope, making the frontier a contract failure rather than a pattern-fitting failure. As base models evolve, re-running AgentBuild is a re-tune, not a rebuild, and the scientist's authored contract remains the durable asset.

24.
arXiv (CS.CV) 2026-06-12

Context-Aware Feature-Fusion for Co-occurring Object Detection in Autonomous Driving

Object detection in autonomous driving requires precise localization and an inherent understanding of the relational context between co-occurring objects. In extremely complex, heterogeneous environments, rare classes, small-scale objects, and frequently appearing objects are difficult for standard object detection frameworks to handle. In this paper, we propose a novel framework called Context-Centric Feature Fusion (CCFF), which utilizes two attention-based modules. The Local Context Fusion Module (LCFM) uses a RoI-to-RoI self-attention mechanism to resolve spatial interactions, mainly targeting small and partially obscured objects, while the Global Context Attention Module (GCAM) converts object co-occurrence priors into a global context attention token by pooling top-K RoI features, avoiding the computational overhead of pixel-level global pooling. This fusion of local and object-centric global features yields contextualized embeddings that enhance classification results and co-occurring object detection. Our method is evaluated on two datasets, Cityscapes and BDD100K, where it demonstrates significant improvement in relational consistency, achieving a Category-level Consistency Strategy (CCS) of 0.973 and 0.969, respectively. Furthermore, our approach produces substantial gains in small object detection (AP_S: 14.1%) and successfully recovers rare classes such as "Train" that are typically lost in large distributions. Our efficiency report shows that the framework processes images in real time with a 0.2 FPS overhead. The code is available at https://github.com/BinayKSingh/CCFF.
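
The two context mechanisms described in this abstract can be pictured with a small NumPy sketch. It is a loose illustration and not the released CCFF code: the self-attention has a single head and no learned projections, the global token is a simple mean over the top-K RoIs ranked by a confidence score, and fusion is done by concatenation. All of these choices, and the function names, are assumptions.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def local_context(roi_feats: np.ndarray) -> np.ndarray:
    """RoI-to-RoI self-attention (single head, no learned weights)."""
    d = roi_feats.shape[1]
    attn = softmax(roi_feats @ roi_feats.T / np.sqrt(d), axis=-1)
    return attn @ roi_feats


def global_context_token(roi_feats: np.ndarray, scores: np.ndarray, k: int) -> np.ndarray:
    """Pool the K highest-scoring RoI features into one global token."""
    top = np.argsort(scores)[::-1][:k]
    return roi_feats[top].mean(axis=0)


def fuse(roi_feats: np.ndarray, scores: np.ndarray, k: int = 4) -> np.ndarray:
    """Concatenate each RoI's local-context feature with the shared global token."""
    local = local_context(roi_feats)
    token = global_context_token(roi_feats, scores, k)
    return np.concatenate([local, np.broadcast_to(token, local.shape)], axis=1)


rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 8))      # 6 RoIs with 8-dimensional features
scores = rng.uniform(size=6)         # hypothetical detection confidences
fused = fuse(feats, scores, k=3)
print(fused.shape)
```

Pooling over K RoIs rather than over every pixel keeps the global branch's cost independent of image resolution, which is the efficiency argument the abstract makes.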

25.
medRxiv (Medicine) 2026-06-22

MinderCare: protocol for a mixed-methods evaluation of a digitally enabled dementia care service.

Introduction and aims Dementia is a growing public health challenge affecting millions of people worldwide. It is a progressive condition that increases the risk of infections, falls, hospital admissions, dependence in activities of daily living, safety issues such as wandering, care home transfers, and death. New ways of supporting people living with dementia (PLWD) at home are urgently needed. We describe the MinderCare study which evaluates a digitally enabled care model that integrates low-burden sensor-based remote monitoring within a nurse-led clinical service. Methods and analysis In this mixed-methods study, we will recruit 100 people with confirmed or suspected dementia living at home and deploy the Minder remote monitoring system for at least 12 months. A detailed characterisation of the cohort will be obtained, including cognition, frailty, participant and carer wellbeing, functioning, and quality of life. The feasibility, acceptability, sustainability, and resource requirements of the service will also be assessed. Low-cost sensors provide information about behaviour, environment and physiology from the home. Machine-learning algorithms have been used to develop digital biomarkers of infection, sleep, night-time behaviours, daily activities and routines, and the effects of clinical events and treatment. These will be assessed through clinical reports of sensor-derived data that include anomaly alerts provided to the clinical teams. Algorithms will be assessed for their clinical utility and acceptability. The comparative-effectiveness component will be designed as a target trial emulation using linked electronic health-record data to construct a time-indexed external usual-care control cohort. The primary comparative outcome will be Days Alive and Out of Hospital (DAOH) over 12 months from the activation-index date, with healthcare utilisation, costs, institutionalisation and mortality assessed as secondary outcomes. 
DAOH and estimated MinderCare effects will also be examined across prespecified strata of baseline inpatient utilisation. Ethics and dissemination Ethical approval has been granted by the North East Newcastle and North Tyneside 2 Research Ethics Committee, and the study has received confirmation of capacity and capability by the Imperial College Healthcare NHS Trust. Study findings will be disseminated to patients, health and social care professionals, and policymakers through peer-reviewed publications and conference presentations. Study registration number: ISRCTN14997677 and NIHR portfolio CPMSID 63023.
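
For readers unfamiliar with the primary comparative outcome, the sketch below shows one way Days Alive and Out of Hospital could be computed for a single participant. The conventions used are assumptions, since the abstract does not define them: days are counted as integer offsets from the index date, admissions are half-open intervals (admit day inclusive, discharge day exclusive), overlapping admissions are counted once, and days after death count as zero. The 365-day window stands in for "12 months".

```python
from typing import List, Optional, Tuple


def days_alive_and_out_of_hospital(
    window_days: int,
    death_day: Optional[int],
    admissions: List[Tuple[int, int]],
) -> int:
    """DAOH within [0, window_days). `death_day` is the day offset of death,
    or None if the participant survives the window."""
    alive_days = window_days if death_day is None else max(0, min(death_day, window_days))
    in_hospital = set()
    for admit, discharge in admissions:
        # Only count hospital days that fall inside the window and before death.
        for day in range(max(admit, 0), min(discharge, alive_days)):
            in_hospital.add(day)
    return alive_days - len(in_hospital)


# Survived the window, one 5-day admission.
print(days_alive_and_out_of_hospital(365, None, [(10, 15)]))    # 360
# Died on day 100 during an admission that began on day 90.
print(days_alive_and_out_of_hospital(365, 100, [(90, 120)]))    # 90
```

Because death truncates the count, DAOH combines mortality and hospital use in one number, which suits a comparison against an external usual-care cohort.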