Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
medRxiv (Medicine) 2026-06-18

Automated Airways Characterization and Assessment of Cystic Fibrosis from CT Imaging

Background Advancements in medical imaging have enabled non-invasive diagnosis and staging of cystic fibrosis (CF) using CT scans, revealing dilated airways, an increased number of visible airways, and airway generation splits in these patients. However, manual characterization of airways remains time-consuming and challenging due to the numerous structural changes, thereby limiting clinical feasibility. This study aims to develop an automated algorithm to characterize airways from segmented lung CT scans and apply this to a retrospective population. This approach reduces the time required to analyze images and obtain disease-staging results. Methods This framework consists of two stages. The first stage extracts and skeletonizes the airway tree from lung CTs, while the second stage measures lung features, including airway volumes, branch counts, generation splits, diameters, and cross-sectional areas. This permits comprehensive characterization for use in clinical assessment. Results The airways analysis was performed on 169 CT volumes ranging in age from 6 to 18 years of age, revealing substantial differences in detected airway branches, generation splits, and normalized airway volume between the control and CF groups. The framework also measures airway diameters and cross-sectional areas, revealing an increase in the number of small airways in cystic fibrosis patients, due to early bronchiectasis. These findings align with previous research and demonstrate the framework's ability to accurately quantify airway changes in patients with CF. Discussion The framework extracts entire airway trees, facilitating measurements of volume, branch count, diameters, and cross-sectional areas, which change with CF severity and/or treatment. However, partial lung atelectasis can limit the accuracy of airway detection in moderate-to-severe cases. Funding NIA U54 AG054345 and NIA R21 AG07857501

02.
arXiv (CS.CV) 2026-06-24

TSegAgent: Zero-Shot Tooth Segmentation via Geometry-Aware Vision-Language Agents

Automatic tooth segmentation and identification from intra-oral scanned 3D models are fundamental problems in digital dentistry, yet most existing approaches rely on task-specific 3D neural networks trained with densely annotated datasets, resulting in high annotation cost and limited generalization to scans from unseen sources. Thus, we propose TSegAgent, which addresses these challenges by reformulating dental analysis as a zero-shot geometric reasoning problem rather than a purely data-driven recognition task. The key idea is to combine the representational capacity of general-purpose foundation models with explicit geometric inductive biases derived from dental anatomy. Instead of learning dental-specific features, the proposed framework leverages multi-view visual abstraction and geometry-grounded reasoning to infer tooth instances and identities without task-specific training. By explicitly encoding structural constraints such as dental arch organization and volumetric relationships, the method reduces uncertainty in ambiguous cases and mitigates overfitting to particular shape distributions. Experimental results demonstrate that this reasoning-oriented formulation enables accurate and reliable tooth segmentation and identification with low computational and annotation cost, while exhibiting strong generalization across diverse and previously unseen dental scans.

03.
PLOS Medicine 2026-05-14

Antibody fine specificity correlates with protection from malaria for the RTS,S vaccine in young African children: A post hoc analysis of a phase IIb randomised controlled trial

Authors:

by Alessia Hysa, D. Herbert Opi, Joshua Waterhouse, Sandra Chishimba, Jessica L. Horton, Natalie Kingston, Hans J. Netter, David Wetzel, Michael Piontek, Gaoqian Feng, Jahit Sacarlal, Carlota Dobaño, Liriye Kurtovic, James G. Beeson Background The RTS,S/AS01 malaria vaccine was recently approved for implementation in children, but only provides modest and short-lived efficacy against malaria. RTS,S targets a portion of the Plasmodium falciparum (Pf) circumsporozoite protein (CSP), comprising the central NANP-repeat region and C-terminal domain. Mechanisms of immunity and correlates of protection for the RTS,S vaccine are not well defined, hindering progress towards generating highly effective CSP-based vaccines. Methods and findings We investigated epitope specificity and cross-reactivity of vaccine-induced antibodies to six peptides representing CSP epitopes in the N-terminal and central NANP-repeat region. We evaluated antibody reactivity in preclinical mouse vaccine studies, among CSP-specific monoclonal antibodies (mAbs), and in a large RTS,S phase IIb clinical trial in young children 1–4 years old (n = 735).The preclinical mouse vaccine studies and CSP-specific mAbs were used to initially evaluate IgG responses to the six peptides. Mice immunised with the central NANP-repeat region had IgG with cross-reactivity to an epitope in the N-terminal region. Additionally, we demonstrated that a single CSP-specific mAb could display cross-reactivity to several CSP epitopes. Through post hoc quantification and analysis of antibody responses in the RTS,S phase IIb clinical trial, we found that a subset of children generated IgG with specificity for a short NANP-repeat epitope (NANP2; amino acid sequence: NANPNANP) and cross-reactivity to an N-terminal epitope (J1; amino acid sequence: KQPADGNPDPNANPN). Notably, children with high IgG responses to NANP2 and J1 had a significantly reduced risk of clinical malaria, compared to children with low responses (IgG to NANP2 (aHR: 0.838 (95% CI [0.716, 0.981]; p = 0.028)) and J1 (aHR: 0.718 (95% CI [0.611, 0.844]; p 

04.
arXiv (CS.CV) 2026-06-12

Acquisition state behaves as a structured, measurable variable governing lung-nodule AI: kernel-driven measurement instability and noise-driven detection fragility, invisible to DICOM metadata

AI governance for medical imaging is formalizing: the 2026 ACR-SIIM Practice Parameter recommends local acceptance testing and ongoing drift monitoring, and the ACR Assess-AI registry monitors AI outputs using DICOM metadata for context. We argue that a necessary, currently unmonitored layer sits beneath output metrics: whether incoming studies remain within the acquisition envelope a model was validated on. Using a LUNA16-trained MONAI RetinaNet lung-nodule detector, we test whether acquisition state behaves as a structured, measurable variable. On real paired CT differing only in reconstruction kernel (NLST B30f vs B80f), kernel alone shifted AI-measured diameter and flipped a Fleischner size category in 5.2% (8 of 155) of nodules at fixed patient and acquisition, while detection confidence was unchanged (Wilcoxon p=0.22). Under controlled LIDC-IDRI perturbations the effects dissociated by axis: the noise axis degraded detection confidence (p=5.9e-32, concentrated in nodules under 6 mm) but not measurement, while the frequency/kernel axis corrupted measurement (p=8.6e-13) but not detection. A 4-feature pixel fingerprint recovered reconstruction identity (patient-level AUC about 0.95 on real CT, 0.995 on a QIBA phantom) where the ConvolutionKernel DICOM tag was uninformative (identical labels across reconstructions). The kernel axis transported across four manufacturers (leave-one-vendor-out AUC 0.94-0.98, matching the within-vendor ceiling). Acquisition state thus maps to distinct AI failure modes, frequency content to measurement reliability and noise to detection sensitivity, and is not recoverable from metadata. Acquisition-aware, input-side validation is the missing layer for the acceptance-testing and drift-monitoring requirements now entering imaging-AI accreditation.

05.
arXiv (CS.CV) 2026-06-12

Surflo: Consistent 3D Surface Flow Model with Global State

Geometry is invariant to viewpoint, which makes any collection of images a redundant encoding of a single 3D state. Existing feed-forward reconstruction models fail to exploit this: per-view methods emit overlapping, unaligned pointmaps that grow linearly with input count, while global-latent methods commit to a fixed, low-resolution output. We introduce Surflo, which compresses a variable number of unposed RGB views into K latent tokens-one global state-and decodes oriented 3D surface points by independently transporting them from noise onto the surface via flow matching. This frees the output from any fixed grid or token budget: the same latent yields from a few thousand to a million points in a single forward pass. To suppress the local inconsistencies inherent to independent per-point decoding, an inference-time guidance term correlates nearby points by injecting a photometric gradient during ODE integration. Surflo matches or surpasses feed-forward baselines on surface metrics, runs an order of magnitude faster than optimization-based methods that require hundreds of views, and is the only feed-forward approach to combine a global latent with arbitrary-resolution decoding.

06.
arXiv (CS.AI) 2026-06-12

Pythagoras-Prover: Advancing Efficient Formal Proving via Augmented Lean Formalisation

arXiv:2606.12594v1 Announce Type: new Abstract: Modern Lean theorem provers achieve strong performance only with substantial training and inference compute, driven in part by scarce verified proof data and the long reasoning traces of formal proof search, making both supervised fine-tuning (SFT) and sampling expensive. We introduce Pythagoras-Prover, a compute-efficient open-source family of Lean theorem provers built for practical compute budgets. The family spans two generation paradigms: autoregressive models at 4B and 32B parameters, and a first proof-of-concept diffusion-based prover (4B) that iteratively refines Lean proofs at inference time. For training efficiency, we build a Lean-verified corpus stratified into easy, medium, and hard problems for curriculum SFT, so models acquire proof skills progressively from shorter, simpler proofs to longer, harder ones. During SFT, a dynamic proof-reasoning filtering scheme preserves informative proof traces while keeping each instance within an 8k-token context budget. We also introduce Augmented Lean Formalisation (ALF), which expands scarce verified corpora into variants of formal statements, populated via self-distillation for extra training signal without formally verifying every mutated instance. By perturbing known problems while preserving their formal character, ALF reduces reliance on any statement's surface form. Empirically, Pythagoras-Prover-4B surpasses DeepSeek-Prover-V2-671B at pass@32 on MiniF2F-Test (86.1% vs 82.4%) with ~167x fewer parameters, while Pythagoras-Prover-32B sets the open-source state of the art at 93.0% on MiniF2F-Test and solves 93 of 672 PutnamBench problems. We release MiniF2F-ALF, an ALF-mutated contamination-sensitive benchmark on which every evaluated model loses accuracy; here our 32B remains strongest and our 4B matches the prior state of the art, Goedel-Prover-V2-32B.

07.
arXiv (CS.CV) 2026-06-16

Closed-Loop Triplet Synergistic Generation for Long-Form Video

Multi-shot long-form video generation remains challenging due to identity drift and compounding inconsistencies across shots. While storyboard-driven pipelines improve controllability, they are often executed in a feed-forward manner, with limited mechanisms to incorporate generated visual evidence back into subsequent conditioning. We propose CoTriSyGen, an agentic framework that formulates multi-shot long video generation as a closed-loop visual-text-memory synergy process, where planned intent, persistent memory, and generated visuals are jointly leveraged for iterative correction and long-range coherence. A vision-language-model-based analyzer reasons over this triplet and produces updates to both prompts and memory along two pathways: (i) intra-shot refinement, which triggers targeted regeneration when semantic or compositional violations are detected and refines image-to-video prompt for coherent motions; and (ii) inter-shot refinement, which rewrites subsequent-shot prompts to propagate newly manifested entities or attributes and improve prompt quality (e.g., compositional grounding and cinematic fluency) based on generated evidence. The loop is grounded in an entity-centric memory modeled as a mutable visual state that evolves as the story progresses, which is continuously updated by both the generator and the analyzer by adding new and evolved entities to reflect appearance changes, accumulated multi-view evidence, and multi-entity compositions. Experiments on our curated StoryBench benchmark demonstrate substantial improvements in cross-shot consistency, prompt adherence, and cinematic continuity over representative methods.

08.
PLOS Medicine 2026-06-23

Multi-omics biomarkers of endothelial dysregulation preceding chronic lung allograft dysfunction: A prospective cohort study

Authors:

by Giulia Iacono, Christina Begka, Bailey Cardwell, Carmel Daunt, Roxanne Chatzis, Celine Pattaroni, Alana Butler, Matthew Macowan, Bronwyn Levvey, Gregory I. Snell, Glen P. Westall, Benjamin J. Marsland Background Long-term survival of lung transplant recipients remains limited by chronic lung allograft dysfunction (CLAD). CLAD is only diagnosed following a persistent and substantial decline in lung function, after which irreversible damage to the lungs has occurred, limiting opportunities to effectively intervene at an early stage. There is a critical need for earlier detection prior to its clinical manifestation. The immunological drivers of CLAD remain unclear, limiting the development of predictive biomarkers and new therapies. Methods and findings In this hypothesis-generating, prospective cohort study, we profiled the microbial, metabolic, lipidomic, and gene expression dynamics of longitudinally collected broncho-alveolar lavages (BALs) from 56 CLAD-free lung transplant recipients up to 30 months post-transplant, and compared BALs from 13 CLAD-free patients to BALs from 13 patients who developed CLAD. In CLAD-free patients, the first 6 months post-transplant were hallmarked by diminished microbial diversity and increased abundance of Staphylococcus and Candida, coupled with upregulated innate and adaptive immune responses, and elevated nitric oxide metabolism (FDR 

09.
arXiv (CS.AI) 2026-06-11

From Architecture to Output: Structural Origins of Hallucination in Large Language Models and the Amplifying Role of Data

arXiv:2606.07537v1 Announce Type: cross Abstract: Large language models hallucinate–producing fluent, confident, factually wrong outputs–with a consistency that persists across generations and scales. Existing taxonomies classify hallucination by output type, distinguishing intrinsic from extrinsic failures and faithfulness from factuality divergence. These frameworks are descriptively rigorous but do not identify which internal mechanism produced a given instance. This paper analyses hallucination as a structural consequence of three architectural decisions that together form a compound failure system. Self-attention's co-occurrence learning substitutes statistical proximity for semantic meaning and produces entity confusion, fact misattribution, and semantic drift. The maximum likelihood estimation training objective optimises next-token probability without factual constraint, rewarding statistically plausible outputs regardless of their truth value. Autoregressive decoding's permanent left-to-right commitment under exposure bias ensures that a single wrong token cascades forward through the entire output sequence without revision. Dataset pathologies–long-tail deficiencies, training bias, and synthetic pollution–amplify these vulnerabilities but do not independently cause them. We make three contributions. First, we map each mechanism to a specific output category in the Alansari and Luqman taxonomy, locating intrinsic hallucination in self-attention, extrinsic hallucination in MLE, and logical inconsistency in autoregressive decoding. Second, we show that each commonly cited dataset pathology exploits one of these mechanisms rather than originating hallucination independently. Third, we identify the diagnostic limitation of output-type-only classification and contrast it with inference-layer mitigation approaches.

10.
arXiv (quant-ph) 2026-06-16

Diagonal-Budgeted Trotterization for Efficient Quantum Hamiltonian Simulation

arXiv:2606.16959v1 Announce Type: new Abstract: Efficient classical simulation of quantum Hamiltonian dynamics is often bottlenecked by exponential state growth and the overhead of generic sparse linear algebra. We introduce diagonal-budgeted Trotterization, a structure-aware strategy that decomposes Hamiltonians into factors preserving diagonal sparsity while tightly controlling fidelity loss. Our implementation, HamSim, utilizes a compact diagonal-sparse data layout and specialized C++/CUDA kernels to bypass the overheads of generic formats like CSR. By leveraging SIMD vectorization, multithreading, and GPU acceleration, HamSim achieves high performance across heterogeneous architectures. Benchmarks on the HamLib suite show that HamSim significantly outperforms Qiskit-Aer. On CPUs, HamSim attains speedups of $182$–$1,269\times$ on optimization instances (TSP, MaxCut) and $4.8$–$841\times$ on physical models (TFIM, Heisenberg). On GPUs, it achieves up to $178\times$ speedup for $12$–$16$ qubit problems. Unlike traditional Trotterization, HamSim maintains near-perfect fidelity without requiring exponential steps. This demonstrates that diagonal-aware numerical kernels provide a scalable foundation for high-fidelity classical Hamiltonian simulation.

11.
medRxiv (Medicine) 2026-06-23

Isolation And Characterization Of Bacteria Associated With Urethritis In Women Within Child Bearing Age Attending Local African Health Clinics

Background: Urethritis in women of childbearing age constitutes a significant but underreported burden of reproductive morbidity in Sub-Saharan Africa, where diagnostic constraints often necessitate suboptimal syndromic management. Methods: To identify the localized etiological profile, mid-stream urine and urethral swab specimens were prospectively collected from symptomatic women attending local clinics, subjected to standard microbiological culture, and characterized using rigorous phenotypic and biochemical diagnostic protocols. Results: Microbiological analysis successfully isolated a high prevalence of both Gram-negative and Gram-positive uropathogens, predominantly Escherichia coli, Staphylococcus aureus, and Klebsiella pneumoniae, demonstrating distinct phenotypic traits characteristic of the regional microbial ecology. Conclusion: The pronounced isolation of these specific bacterial agents highlights the critical inadequacy of generalized empirical treatments and underscores the urgent need for tailored diagnostic criteria in resource-limited African healthcare settings.

12.
arXiv (CS.LG) 2026-06-15

Conformal calibration and look-elsewhere effect in anomaly detection for new-physics searches

arXiv:2606.13780v1 Announce Type: cross Abstract: Machine-learned anomaly detection is reshaping searches for new physics, but it has outrun the statistics used to interpret it. A raw anomaly score has no calibrated meaning, a model that scans many regions inflates the look-elsewhere effect, and the asymptotic significances the field relies on are blind to the background mismodelling that anomaly detectors are especially prone to. We propose a calibration layer, built on conformal prediction, that turns any anomaly score into a defensible significance with distribution-free, finite-sample guarantees. Conformal prediction converts scores into valid local p-values, weighted and Mondrian variants repair the sideband-to-signal-region exchangeability failures that resonant searches suffer, and a Gross-Vitells step carries the result through to a look-elsewhere-aware global significance. The layer does two things at once. It exposes miscalibration that the standard pipeline cannot see, and it corrects it without retraining the detector. On public LHC Olympics data, a classifier develops a substructure-mass correlation that makes sideband-calibrated background p-values anti-conservative. Taken at face value, this manufactures a $\sim 46\sigma$ excess from background sculpting alone, which the label-free weighted correction removes, restoring an honest null. When run as a blind wide-mass bump hunt, the standard asymptotic and unweighted procedures fabricate $\gtrsim10\sigma$ excesses and $\approx5\sigma$ excesses even in signal-free windows, while the conformal layer raises no false alarms and its global false-positive rate is verified on background-only pseudoexperiments. The result is an auditable, detector-agnostic path from an uncalibrated score to a trials-factor-aware significance, ready to be folded into experimental anomaly searches.

13.
arXiv (CS.CV) 2026-06-11

ParseFixer: An Agentic Framework for Document Parsing via Selective Multimodal Correction

In this report, we present our third-place solution for the DataMFM Challenge Track 1: Document Parsing. This track requires models to recover structured Markdown documents from document page images while preserving textual content and document structure. To address the complementary requirements of accurate content recovery and faithful structure reconstruction, we propose ParseFixer, an agentic framework for backbone parsing and selective correction. ParseFixer consists of two key modules: Full-Page Backbone Parsing (FBP) and Agentic Selective Correction (ASC). FBP produces stable initial Markdown outputs with MinerU2.5 Pro, while ASC detects high-value parsing failures and repairs them through a verify-and-rollback correction process. By placing selective multimodal correction after open-source backbone parsing, ParseFixer improves the recovery of key document elements without rewriting reliable backbone predictions. On the test set, our final system achieves an overall score of 61.78 and ranks third in Track 1, demonstrating its effectiveness for accurate document parsing. Our code will be released at: https://github.com/iLearn-Lab/CVPRW26-ParseFixer.

14.
arXiv (CS.AI) 2026-06-16

Open-SWE-Traces: Advancing Dual-Mode Multilingual Distillation for Software Engineering Agents

arXiv:2606.16038v1 Announce Type: cross Abstract: The path toward autonomous software engineering is currently bottlenecked by a severe deficit of diverse, large-scale trajectory data. We address this by introducing \ourdataset, an expansive dataset of 207,489 agentic trajectories spanning nine programming languages (Python, Go, TS, JS, Rust, Java, PHP, C, C++). Sourced from 20,000 real-world PRs via OpenHands and SWE-agent harnesses, the dataset utilizes a hybrid-reasoning synthesis: Minimax-M2.5 generates trajectories with explicit "thinking" processes, while Qwen3.5-122B provides high-quality "non-thinking" traces. Filtered for permissive licenses (MIT, Apache, BSD) from SWE-rebench-V2, this data facilitates the training of models capable of long-horizon reasoning. We validate the dataset by fine-tuning the Qwen3-30B-A3B series (Thinking, Instruct, and Coder). The best performing model achieves resolve rates of 61.7% on SWE-bench Verified, 57.1% on SWE-bench Multilingual, and 36.8% on SWE-bench Pro. These results establish Open-SWE-Traces as a premier resource for distilling human-level software engineering capabilities into efficient, open-source agentic LLMs.

15.
arXiv (CS.CL) 2026-06-16

LLM-Assisted Stance Detection in Scientific Discourse: A Test Case in Bayesian Cognitive Science

Qualitative coding is central to social science, but expert annotation is difficult to scale. LLMs offer a possible extension, yet require careful validation when the target construct is interpretive, theoretically loaded, and only indirectly expressed. We study this problem in a difficult case: detecting whether authors treat Bayesian models as descriptions of mental and neural mechanisms (realism) or as useful mathematical tools (instrumentalism). Our method combines a theory-driven codebook, expert-coded reference annotations, a diagnostic-gated prompt-optimization search yielding a shared zero-shot prompt for three frontier LLMs (GPT-5.1, Claude Sonnet 4.6, Gemini 3 Pro Preview), and multi-rater reliability analysis. The final prompt achieved a held-out combined reliability score of 0.76 (harmonic mean of ICC = 0.79 and $\alpha$ = 0.74), with all diagnostics satisfied. Deployed on 6,858 quotes from 210 articles, the three LLMs reached substantial quote-level agreement (ICC = 0.80; $\alpha$ = 0.76; combined = 0.78) and near-perfect article-level rank stability ($r$ = 0.96-0.97 across rater pairs). The corpus was predominantly weakly realist, but article-level stances were rarely uniform: only 1.4% of articles used a single band, while 59.5% spanned four or more. Low-level perception/motor articles scored 8.8 Realism points higher than high-level cognition articles ($p < .001$, $d = 0.60$), quantifying a long-held qualitative intuition. We present this as an expert-led case study; the framework is intended to generalize to similar theoretically demanding tasks, not to all qualitative analysis.

16.
Nature (Science) 2026-06-19

Daily briefing: Human detritus remakes geology

Authors:

What, exactly, is a rock? Plus, a stem-cell success for a severe autoimmune disease and evidence that ‘AI deskilling’ is real. Researchers have tracked the electrical activity of individual brain cells during conversation in real time. Plus, the history of GPS and a cross-species transplant that could reveal clues about the origin of animals.

17.
arXiv (CS.CV) 2026-06-11

SpikeTAD: Spiking Neural Networks for End-to-End Temporal Action Detection

Video understanding is a crucial part of computer vision, with numerous application scenarios. With the increasing popularity of mobile devices, an increasing number of efforts are trying to deploy video understanding models on them. However, existing video understanding models are difficult to deploy due to their large size and prohibitive power consumption. Spiking Neural Networks (SNNs) have shown bioplausibility and low power advantages over Artificial Neural Networks (ANNs), especially on neuromorphic chips which are regarded as essential components of future mobile devices. However, excessively long conversion time-steps and severe performance degradation problems limit their application. To solve the problems above, we explore the application of SNNs on temporal action detection (TAD), which is an important task in video understanding, and propose the first SNN-based end-to-end TAD architecture coined as SpikeTAD. While maintaining extremely low power consumption, SpikeTAD achieves an average mAP of 67.2% in THUMOS14 and 37.42% in ActivityNet-1.3, demonstrating the feasibility of a low-power TAD model. Our code is available at https://github.com/MCG-NJU/SpikeTAD.

18.
arXiv (CS.CV) 2026-06-11

OpenMedReason: Scientific Reasoning Supervision for Medical Vision-Language Models

High-stakes clinical use of large vision-language models (LVLMs) requires reasoning that is grounded in visual evidence and clinical knowledge, not just correct final answers. We introduce OpenMedReason, a large-scale, open multimodal medical reasoning corpus comprising approximately 450K image-question-answer instances whose reasoning traces are primarily derived from curated biomedical, human-authored scientific articles. OpenMedReason provides high-fidelity supervision beyond synthetic chains of thought, covering diverse medical domain vision modalities such as radiological scans, microscopic images, visible light photographs, charts, and others. We complement it with OpenMedReason-Bench, a held-out benchmark that allows fine-grained evaluation of LVLMs along three complementary axes of capability, including perception, medical knowledge, and rationale, enabling diagnostic evaluation beyond final-answer accuracy. OpenMedReason is a rich training resource that exhibits its effectiveness in both supervised fine-tuning (SFT) and reinforcement-based alignment. Training with OpenMedReason yields a 20% average improvement in VQA accuracy over the base model and achieves performance within 4.2% of the strongest comparable-scale medical LVLMs. Fine-grained performance analysis confirms that the gains are not concentrated in any single axis: OpenMedReason improves perception, medical knowledge, and rationale jointly, and its reasoning traces are preferred over those of the base model in 86.1% of pairwise comparisons. We release the code and dataset at huggingface.co/datasets/neginb/OpenMedReason.

19.
arXiv (CS.AI) 2026-06-11

DuoBench: A Reproducible Benchmark for Bimanual Manipulation in Simulation and the Real World

arXiv:2606.11901v1 Announce Type: cross Abstract: Bimanual robot systems substantially expand manipulation capabilities, but coordinating two arms introduces additional control complexity and failure modes that are not well captured by existing benchmarks. We introduce DuoBench, an extensible benchmarking framework for bimanual manipulation policies on the FR3 Duo platform. DuoBench comprises eleven tasks spanning four coordination categories, implemented in simulation and partially reproduced in the real world through reproducible task recipes with 3D-printable assets. In addition, we propose a stage-based evaluation scheme that supports fine-grained semantic failure analysis beyond binary success and provide human-teleoperated datasets for all benchmark tasks. We benchmark several dual-arm imitation-learning and vision-language-action policies in simulation and on real hardware. Our results show that current policies remain challenged by bimanual manipulation, particularly in early interaction stages, parallel arm execution, and transfer between simulation and real-world settings. DuoBench provides a reproducible testbed for diagnosing these failure modes and studying future methods for dual-arm policy learning. Code, datasets, and videos are available at https://duobench.github.io/

20.
medRxiv (Medicine) 2026-06-15

Nocturnal Respiratory Rate and Variability Predict Long-term Mortality in Stable Outpatients with Cardiovascular Disease

Background: Respiratory rate (RR) predicts short-term mortality in acute care settings, yet its prognostic significance in clinically stable outpatients remains poorly defined. Objectives: To determine whether the median and variability of nocturnal respiratory rate (NRR) are independently associated with long-term cardiovascular and all-cause mortality in outpatients with cardiovascular disease. Methods: We analyzed overnight chest belt waveforms from elective polysomnography in 5,679 older adults with cardiovascular disease enrolled in the Sleep Heart Health Study (SHHS). NRR was quantified at 30-second resolution, and per-subject median NRR and within-night variability (standard deviation) were derived. Kaplan-Meier survival analysis and Cox proportional hazards models were used to evaluate associations with cardiovascular and all-cause mortality over 3-year and 15-year follow-up periods, adjusting for demographic characteristics, cardiopulmonary comorbidities, and sleep apnea severity. Results: Higher median NRR and greater NRR variability were each associated with increased cardiovascular and all-cause mortality. Combining these metrics identified a high-risk group characterized by elevated median and high variability of NRR, with approximately five-fold higher 3-year all-cause mortality compared with a low-risk group; this association remained significant in Cox models (unadjusted HR: 2.61; 95% CI: 1.65, 4.14; p

21.
arXiv (CS.CL) 2026-06-18

PEC-Home: Interpretation of Progressively Elliptical Commands in Smart Homes

Recent advancements in Large Language Models (LLMs) have empowered home assistants with natural language interaction capabilities. However, current assistants overlook the progressive omission that occurs in human dialogue as shared context accumulates, leading to more elliptical expressions for efficient communication. Thus, current assistants still struggle to interpret such elliptical expressions accurately, which limits their effectiveness in real-world applications. In practical smart home scenarios, assistants face two major challenges caused by elliptical commands: (1) referential ambiguity caused by different environmental expectations among multiple users; and (2) intention ambiguity resulting from user preferences that evolve over time or change with the environment. To address these challenges, we introduce PEC-Home, the first simulated home dataset specifically designed for interpreting progressively elliptical commands in smart homes. Extensive experiments on various LLMs, including GPT-4o, show that existing home assistants struggle to execute user-intended operations based solely on elliptical commands. Even when equipped with tools for storing and retrieving user dialogue history, execution accuracy remains below that achieved with complete commands.}.

22.
arXiv (quant-ph) 2026-06-16

Benchmarking Quantum Extreme Learning based on Gaussian Boson Sampling

arXiv:2606.15230v1 Announce Type: new Abstract: Reservoir models offer a hardware-efficient learning paradigm for noisy intermediate-scale quantum devices by exploiting untrained quantum dynamics as a fixed feature map and restricting optimization to a simple classical readout layer. We propose a quantum extreme learning machine implemented using gaussian boson sampling and an encoding strategy that achieves high classification accuracy while reducing optical resource requirements. Classical inputs are jointly encoded in the squeezing parameters and in the interferometer unitary, enabling sampling-based, highly nonlinear feature maps while leveraging large-scale GBS output statistics, which are conjectured to be classically intractable. We systematically compare multiple families of quantum features accessible in the same setup and find that photon-number sampling probabilities provide the best performance, consistent with their higher effective feature dimensionality. Finally, we benchmark against classical nonlinear baselines and analyse robustness under noisy scenarios, showing competitive performance with fewer trainable parameters and indicating practical promise for near-term photonic implementations.

23.
arXiv (CS.CL) 2026-06-16

From Affect Prediction to Affect Forecasting: Evidence for Distinct Information Sources in Longitudinal Text

Modeling dimensional affect in longitudinal text requires distinguishing current affect estimation from future affective change forecasting. Existing approaches often treat each text as an independent observation and apply similar assumptions to both tasks, without testing whether they rely on different information sources. This paper investigates that distinction using longitudinal self-reported ecological essays and feeling-word entries. We propose the Trait–State Affective Prediction (TSAP) framework and its temporal extension E-TSAP for per-text valence and arousal prediction, evaluated on a held-out prediction test set of 1,737 entries from 91 users. We further propose the Affective Change Forecaster Hybrid (ACF-Hybrid) for next-step affective change forecasting, evaluated on a held-out forecasting test set of 46 users. For prediction, E-TSAP achieves composite Pearson correlations of 0.670 for valence and 0.449 for arousal. For forecasting, textual representations perform worse than compact numeric trajectory baselines: the text-inclusive model achieves only r=0.316 for valence and r=0.284 for arousal, whereas a simple prior-state baseline reaches r=0.615 and r=0.670, respectively. ACF-Hybrid, using dimension-specific numeric trajectory features, achieves r=0.659 for valence and $r=0.658$ for arousal. These results show that textual semantics support current affect prediction, whereas future affective change is better captured through prior numeric trajectory dynamics.

24.
medRxiv (Medicine) 2026-06-16

Care Delivery Gap framework: a proof-of-concept patient-reported measure of guideline-referenced care-process omissions in sickle cell disease

Abstract Background:Sickle cell disease (SCD) is concentrated in sub-Saharan Africa, where delivery of guideline-referenced care remains challenging. Current evaluation approaches rely largely on access indicators and clinical outcomes, which do not directly measure care delivery. We developed the Care Delivery Gap (CDG) framework, a patient-reported approach for identifying care-process omissions, and conducted a proof-of-concept study to assess feasibility and explore variation across income strata. Methods: We conducted a cross-sectional framework-development study involving a proof-of-concept sample of 52 individuals with SCD or caregivers recruited through clinics and moderated SCD communities across Africa, North America, and Europe between June 2025 and March 2026. The CDG framework assessed patient-reported omissions in specialist involvement, follow-up continuity, cardiovascular screening, and biochemical surveillance. Analyses were descriptive. Results: Substantial multi-domain care-process omissions were identified despite high reported healthcare engagement. Across geographic income strata, cardiovascular screening was reported by 4/35 (11%) LMIC versus 16/17 (94%) HIC participants, and regular follow-up within the preceding 12 months by 14/35 (40%) versus 16/17 (94%), respectively. High CDG scores, representing 1 omissions across three or four domains, occurred in 20/35 (57%) LMIC compared with 1/17 (6%) HIC participants. Similar disparities were observed across specialist review and vitamin B12 surveillance domains. Conclusion: A structured patient-reported framework identified multi-domain omissions in guideline-referenced SCD care, including among individuals reporting healthcare access. The divergence between access indicators and reported care delivery suggests that service contact alone may not reflect care quality. The framework provides a feasible foundation for future process-level quality measurement in high-burden settings.

25.
arXiv (CS.LG) 2026-06-24

Project Ariadne: Prompt-Conditioned Route Generation for Synthesis Planning

arXiv:2606.24184v1 Announce Type: new Abstract: Retrosynthetic planning seeks to connect a target molecule to commercially available starting materials through a multistep route. Classical planners construct such routes by iteratively applying single-step reaction models within a search procedure; constrained variants often require specialized algorithms or architectural changes. Direct route generation reframes retrosynthesis as sequence generation, but existing direct-generation methods still train separate models for different planning specifications. We introduce Ariadne, a decoder-only route generator that represents the target, optional constraints, and route in one prompt-completion sequence. On the RetroCast/PaRoutes mkt-cnv-160 benchmark family, one 24-layer checkpoint follows route-depth and required-starting-material prompts: adding the corresponding prompt fields raises Solv-0 by 13.7 points for depth constraints and 31.2 points for required-leaf constraints. Ariadne also improves over DESP, a bidirectional search planner, on required-leaf Top-10 and Solv-0 in 24 GPU-minutes versus 6.8 GPU-hours. On standard reconstruction, Ariadne is comparable to DMS Explorer XL at about half the reported inference time. Across additional target-only benchmarks, Ariadne's clearest gains are on route-holdout reconstruction, whereas AiZynthFinder MCTS remains stronger on several Solv-0 comparisons. These results extend sequence generation from specialist retrosynthesis models to prompt-conditioned structural route generation. We release the codebase and training scripts to support further work, but do not introduce Tier-1–3 route checkers; those remain the main bottleneck before models of this kind can become useful to experimental chemists.