Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-15

STREAM: Multi-Tier LLM Inference Middleware with Dual-Channel HPC Token Streaming

arXiv:2606.13968v1 Announce Type: cross Abstract: Researchers and practitioners working with large language models face a fragmented landscape: local models are free and private but hardware limits the model size and context windows a researcher can use; institutional HPC centers offer powerful GPU resources at no marginal cost and keep data within institutional boundaries, but operate behind firewalls and are designed for batch jobs rather than interactive use; commercial cloud APIs provide frontier-model quality on demand but impose significant cost and data retention policies unsuitable for sensitive research data. No existing system unifies all three. STREAM (Smart Tiered Routing Engine for AI Models) addresses this gap with four contributions: (1) a three-tier routing architecture combining local, HPC, and cloud inference with a local LLM-based complexity judge; (2) a dual-channel HPC streaming architecture that separates the Globus Compute control plane (authentication and job dispatch) from a WebSocket relay data plane (token delivery), enabling sub-second TTFT (0.54 s median, 21.1x over batch mode's 11.40 s) through institutional firewalls without VPN or firewall rule changes, with end-to-end AES-256-GCM encryption ensuring the relay operator cannot read token payloads; (3) tier-aware context summarization that prevents long conversations from forcing simple queries onto expensive tiers; and (4) an HPC-as-API proxy mode that exposes HPC inference as an OpenAI-compatible endpoint callable from any standard client with no HPC expertise, a deployment pattern made practical only by the sub-second TTFT of contribution (2). Llama 3.2 3B achieves 85.1% free-tier retention on a 1,200-query benchmark spanning ten domains. Measured TTFT: 0.26 s local, 0.54 s HPC (relay), 1.68 s cloud.

02.
arXiv (CS.CV) 2026-06-12

Camera and LiDAR BEV Fusion for Cooperative 3D Object Detection on TUMTraf V2X

We describe a Camera and LiDAR fusion detector developed for the TUMTraf V2X cooperative 3D object detection track of the DriveX 2026 challenge. The detector fuses three roadside cameras with a fused infrastructure-plus-vehicle point cloud in a shared bird's-eye-view space and predicts boxes through a CenterPoint-style head with a generalized IoU regression loss and an IoU quality re-ranking head. Trained on the provided train and validation splits, the model reaches a 3D mAP of 0.85 on the public Codabench test split. While iterating on the system, we observed that 44 of the 50 test frames are also present in the released train (40) and validation (4) splits with their labels. We therefore conducted two additional studies to quantify how this overlap affects the final score: (1) a finetuning run that oversamples the 44 overlapping frames, reaching 0.89 mAP, and (2) a post-processing run that replaces predictions on those frames with the released ground truth, reaching 0.99 mAP (uploaded to our Codabench account for testing but not published on the leaderboard). All three configurations and their per-class results are reported.

03.
arXiv (quant-ph) 2026-06-17

Approximately Decoding the Colour Code

作者:

arXiv:2606.18035v1 Announce Type: new Abstract: Recently we showed that minimum weight decoding in the (6.6.6 planar) colour code is NP-hard. However, it remained an open question as to whether it was possible to approximate the minimum weight decoding arbitrarily closely in polynomial time. In this paper we prove that it is possible: for any $\varepsilon>0$ there is an polynomial time algorithm that, given a syndrome, can find an error-set generating that syndrome whose weight is at most $1+\varepsilon$ times the weight of the minimum weight decoding. As a consequence we see that, for any $\varepsilon>0$, there is a polynomial time algorithm that can correct all errors of weight up to $(1-\varepsilon)d/2$ in the distance $d$ colour code (so almost up to the theoretical $d/2$ limit). The polynomial we give is impractically large, but it does open the door for sensible polynomial time algorithms that approximate minimum weight decoding and, in particular, shows that approximate decoding is not NP-hard.

04.
arXiv (CS.AI) 2026-06-24

CineCap: Structured Reasoning with Spatio-Temporal Anchors for Cinematographic Video Captioning

arXiv:2606.24636v1 Announce Type: new Abstract: Cinematographic captioning aims to describe how a video is filmed using professional film-language concepts such as camera movement, shot size, depth of field, composition, and shooting angle. This capability is important for fine-grained video understanding and controllable movie-quality video generation, yet remains underexplored in existing multimodal large language models. Unlike question-answering-based evaluation of cinematic understanding, cinematographic captioning requires a unified open-form description over multiple cinematographic dimensions. This task is challenging for two main reasons: the model must infer professional cinematographic concepts from subtle visual evidence, and it must generate captions that are both comprehensive and accurate. Accordingly, we propose CineCap, a framework that combines structured reasoning with spatio-temporal anchors and reinforcement learning with comprehensiveness, accuracy, and gated coverage rewards. The former grounds professional cinematographic descriptions in explicit visual evidence and organizes them into compact atomic reasoning for supervised fine-tuning, while the latter improves the balance between descriptive completeness and factual correctness. In addition, we construct CineCap Bench, a benchmark of 472 manually annotated video-caption pairs for systematic evaluation. Extensive experiments show that CineCap consistently outperforms strong proprietary and open-source baselines, establishing a new state of the art for cinematographic captioning. The code, model checkpoint, and benchmark are publicly available in https://github.com/Hectormxy/CineCap.git.

05.
arXiv (CS.LG) 2026-06-25

Neural operator-based digital twins for modeling amyloid-$\beta$ and tau propagation and treatment optimization in Alzheimer's disease

arXiv:2606.25185v1 Announce Type: new Abstract: Accurately predicting the spatiotemporal evolution of amyloid-$\beta$ and tau proteins at the individual level is critical for improving the diagnosis and treatment of Alzheimer's disease. We consider the problem of constructing patient-specific digital twins that model the propagation of these biomarkers on the cortical surface using reaction–diffusion dynamics. A major challenge is that the underlying nonlinear aggregation mechanisms are unknown and must be inferred from sparse, noisy, and heterogeneous longitudinal PET imaging data. To address this, we develop a data-driven framework that learns biomarker dynamics directly from clinical observations. The approach combines operator learning with reduced-order representations to infer governing equations of disease progression from data. Using this framework, we achieve predictive accuracies of 87\% for amyloid-$\beta$ and 81\% for tau. Building on the learned dynamics, we further formulate a PDE-constrained optimal control problem to design personalized therapeutic strategies that regulate pathological protein propagation. By integrating data-driven dynamical modeling with treatment optimization, the proposed digital twin framework provides an interpretable and predictive platform for understanding disease progression and enabling precision interventions in neurodegenerative disorders.

06.
arXiv (quant-ph) 2026-06-25

Closed Quantum Boltzmann Bridges: Coherent Revivals, Hidden Microstates, and the Emergence of Classical Two-Time Entropy Conditioning

arXiv:2606.25260v1 Announce Type: new Abstract: The classical Boltzmann Bridge describes entropy histories conditioned on both an initial low-entropy macrostate and a later macrostate. Unlike the usual past-only formulation of the thermodynamic arrow, this two-time conditioning can produce entropy profiles that rise above the final entropy and then decrease toward the imposed endpoint. In this work, we formulate closed quantum analogues of the Boltzmann Bridge using macro-subspace projectors, unitary time evolution, and Boltzmann entropy defined by the dimension of coarse-grained macroscopic sectors. We first study a minimal coherent chamber-qubit model, in which each particle has only a two-state chamber degree of freedom. Although this model is the most direct quantization of the classical two-box system, its bridge entropy profile is dominated by coherent oscillations and revivals rather than classical relaxation. We then introduce a hidden-microstate bridge, in which each chamber sector contains unresolved internal degrees of freedom while the full dynamics remain unitary. Numerical experiments show that increasing the internal Hilbert-space dimension suppresses sample-dependent revival behavior and produces bridge entropy profiles whose sign structure and coarse-grained shape increasingly agree with the classical Boltzmann Bridge. We further use a Random Forest classifier to explore the parameter regime separating revival-dominated quantum behavior from classical-like coarse-grained bridge behavior. These results suggest that classical two-time-conditioned entropy behavior is not recovered by quantizing the chamber variable alone, but can emerge statistically from closed quantum.

07.
arXiv (CS.AI) 2026-06-12

Multi-Modal Agents for Power Distribution Defect Detection: An Evaluation of Foundation Models

作者:

arXiv:2606.12969v1 Announce Type: new Abstract: The power distribution network is critical to reliable electricity delivery, yet traditional inspection methods face limitations in semantic understanding, generalization, and closed-loop automation. To address these challenges, this paper proposes a Multi-Modal Agent framework specifically for power distribution defect detection. Central to this study is the systematic evaluation of multimodal foundation models as unified cognitive engines. We rigorously assess their integrated performance across three critical capabilities: (1) Perception, where the model must accurately identify equipment and generate expert-level descriptions of defects; (2) Reasoning, where the model interprets visual findings to diagnose causes, assess severity, and plan maintenance strategies based on domain knowledge; and (3) Tool Usage, where the model acts as an autonomous operator to execute actions – such as querying knowledge bases or generating work orders – to achieve closed-loop maintenance. To support this evaluation, a domain-specific evaluation dataset and a comprehensive benchmark are developed. Experimental results demonstrate the strengths and limitations of current foundation models in these three dimensions, providing empirical evidence for deploying autonomous agents in high-stakes industrial environments.

08.
arXiv (quant-ph) 2026-06-12

A Robust Strontium Tweezer Apparatus for Quantum Computing

arXiv:2601.16564v2 Announce Type: replace-cross Abstract: Neutral atoms for quantum computing applications show promise in terms of scalability and connectivity. We demonstrate the realization of a versatile apparatus capable of stochastically loading a 5x5 array of optical tweezers with single $^{88}$Sr atoms featuring flexible magnetic field control and excellent optical access. A custom-designed oven, spin-flip Zeeman slower, and deflection stage produce a controlled flux of Sr directed to the science chamber. In the science chamber, featuring a vacuum pressure of $3 \times 10^{-11}$ mbar, the Sr is cooled using two laser cooling stages, resulting in $\sim 3 \times 10^5$ atoms at a temperature of 5(1) $\mu$K. The optical tweezers feature a $1/e^2$ waist of 0.81(2) $\mu$m, and loaded atoms can be imaged with a fidelity of $\sim 0.997$ and a survival probability of $0.99^{+0.01}_{-0.02}$. The atomic array presented here forms the core of a full-stack quantum computing processor targeted for quantum chemistry computational problems.

09.
arXiv (CS.CL) 2026-06-16

Rapid Poison: Practical Poisoning Attacks Against the Rapid Response Framework

The Rapid Response (RR) framework, deployed in production systems, including Anthropic's ASL-3 safeguards, continuously improves jailbreak-detection classifiers. When new jailbreaks emerge that bypass these classifiers, Rapid Response generates synthetic variants for training, helping the model generalize from the new attacks and quickly adapt. We reveal that prompt injection can infiltrate this pipeline to deliver poisoned samples into the classifier's training set, enabling two attack objectives: (I) targeted poisoning attacks that create false positives on harmless samples by categorizing them as a jailbreak, with a specific desired feature (e.g., certain formatting, subject, or keyword), (II) concept-based backdoor attacks that induce false negatives on jailbreak inputs, generalizing even to jailbreaks from attack strategies the defender explicitly trained against, when the backdoor trigger is present. Importantly, our threat model restricts adversaries to modifying only jailbreak samples (not benign data or labels), a constraint unexplored by prior work that makes the second objective particularly challenging. We address this with Omission Attack, which exploits a new phenomenon: when training on concept-absent unsafe samples, the classifier misassociates that concept's presence with the safe label. Both attacks cause substantial and in some cases near-complete label flipping at only a 1% poisoning rate, achieving up to 100% false positive rates and up to 96% false negative rates.

10.
arXiv (CS.CL) 2026-06-17

Self-Generated Error Training for Token Editing in Diffusion Language Models

作者:

Token-to-token (T2T) editing lets LLaDA2.1 revise committed tokens during block-diffusion decoding. The released recipe trains this editor on random vocabulary corruptions, but at inference the editor sees the model's own fluent, high-confidence draft errors instead. We study this training-inference mismatch and propose self-generated T2T, which performs a no-gradient draft pass, fills masked positions with predicted tokens, and supervises recovery in a second pass under these self-generated corruptions. We implement the update as a short LoRA continued-pretraining pass on LLaDA2.1-mini and evaluate on several benchmarks under the official Q-Mode T2T procedure with unchanged inference parameters. The method generally improves accuracy while reducing T2T edit intensity, mitigating failure modes such as final-digit transcription errors after otherwise correct reasoning and excessive self-correction before short factual answers.

11.
arXiv (quant-ph) 2026-06-24

Quantum Metric Bound State of Light

arXiv:2606.22479v2 Announce Type: replace-cross Abstract: The spatial confinement of defect-induced bound states is conventionally governed by the effective mass in dispersive bands. More recently, Compact Localized States (CLSs) arising from exact destructive interference have been utilized to achieve confinement in flat bands. However, CLSs rely on pristine lattice symmetries and fine-tuned defect profiles. The introduction of a generic local impurity inevitably breaks these strict phase-matching conditions, resulting in extensive bound states whose fundamental length scale has remained an open question. Here, we establish a third regime of confinement: the quantum metric bound state. We provide a rigorous mathematical proof demonstrating that in the absence of kinetic energy and CLS protection, the exponential decay length of these states is lower-bounded by the quantum metric of the unperturbed flat band. We demonstrate the tightness of this geometric limit by constructing a family of highly tunable flat-band generators, and we verify its universality across diverse realistic architectures. Ultimately, this classification establishes the independently measurable quantum metric as a predictive design principle for engineering confined modes in synthetic wave platforms.

12.
arXiv (CS.CV) 2026-06-16

SP$^3$: Spherical Priors for Plug-and-Play Restoration

In this paper, we introduce SP$^3$, a novel Plug-and-Play algorithm that accelerates maximum a posteriori image restoration by replacing denoisers with Spherical Encoders (SE) as generative priors. SP$^3$ approximates the intractable proximal prior step by utilizing the SE tightly structured latent space as a robust projection onto the natural image manifold. Alternating this projection with a closed-form data-consistency step, via Half-Quadratic Splitting, achieves stable convergence without requiring gradient computation during inference. This unique formulation unlocks "anytime" restoration capabilities, producing sharp, plausible images from the first iteration. Evaluations across a variety of image restoration tasks demonstrate that SP$^3$ achieves perceptual quality comparable to state-of-the-art zero-shot diffusion and flow methods while being $3$-$630\times$ faster.

13.
medRxiv (Medicine) 2026-06-23

Associations Between Social Responsiveness and Sleep Disruption are Modulated by Chronotype in Early Adolescence: Cross-Sectional and Prospective Findings from 10,108 Participants of the Adolescent Brain and Cognitive Development (ABCD) Study

Background: Sleep disruption is prevalent in people with neurodevelopmental disorders such as autism but is not clear whether it occurs as an endophenotype or secondary to other behaviours. The ABCD Study is a population-based longitudinal study that monitors the health, demography and lifestyle of over 11,000 children in the US. In this study we leverage these data to investigate whether traits consistent with autism (social responsiveness) are associated with sleep disruption independent of lifestyle and other behavioural measures. Methods: Autistic traits were assessed using the Social Responsiveness Scale at age 11, and sleep disruption and behavioural outcomes were assessed at ages 11 and 13 years using the Sleep Disturbance Scale, and the Child Behaviour Check List, respectively. Demographic, health and lifestyle-related variables were assessed by caregiver questionnaires. Regression models were applied to investigate associations between autistic traits and sleep outcomes. Results: There was a significant cross-sectional association between sleep disturbance and SRS at age 11 years old that was independent of sex, ethnicity, socioeconomic position, physical activity, sedentary behaviour and anxiety/depression ({beta} = 0.12, 95% CI (0.07, 0.17); p < 0.001), that persisted at age 13, and that was modulated by chronotype, with evening types showing a stronger association. Discussion: Social responsiveness assessed in early adolescence (age 11) were associated with sleep disruption independent of multiple confounding factors and were prospectively associated with sleep disruption at age 13 years. These findings contribute to the evidence that disruption of sleep and circadian timing may have a primary role in the neurobiological mechanisms that mediate autistic traits.

14.
arXiv (CS.AI) 2026-06-17

Decidable By Construction: Design-Time Verification for Trustworthy AI

arXiv:2603.25414v4 Announce Type: replace-cross Abstract: A prevailing assumption in machine learning is that model correctness must be enforced after the fact. We observe that the properties determining whether an AI model is numerically stable, computationally correct, or consistent with a physical domain do not necessarily demand post hoc enforcement. They can be verified at design time, before training begins, at marginal computational cost, with particular relevance to models deployed in high-leverage decision support and scientifically constrained settings. These properties share a specific algebraic structure: they are expressible as constraints over finitely generated abelian groups $\mathbb{Z}^n$, where inference is decidable in polynomial time and the principal type is unique. A framework built on this observation composes three prior results (arXiv:2603.16437, arXiv:2603.17627, arXiv:2603.18104): a dimensional type system carrying arbitrary annotations as persistent codata through model elaboration; a program hypergraph that infers Clifford algebra grade and derives geometric product sparsity from type signatures alone; and an adaptive domain model architecture preserving both invariants through training via forward-mode coeffect analysis and exact posit accumulation. We believe this composition yields a novel information-theoretic result: Hindley-Milner unification over abelian groups computes the maximum a posteriori hypothesis under a computable restriction of Solomonoff's universal prior, placing the framework's type inference on the same formal ground as universal induction. We compare four contemporary approaches to AI reliability and show that each imposes overhead that can compound across deployments, layers, and inference requests. This framework eliminates that overhead by construction.

15.
arXiv (CS.CV) 2026-06-17

RAIGen: Rare Attribute Identification in Text-to-Image Generative Models

Text-to-image diffusion models achieve impressive generation quality but inherit and amplify training-data biases, skewing coverage of semantic attributes. Prior work addresses this in two ways. Closed-set approaches mitigate biases in predefined fairness categories (e.g., gender, race), assuming socially salient minority attributes are known a priori. Open-set approaches frame the task as bias identification, highlighting majority attributes that dominate outputs. Both overlook a complementary task: uncovering rare or minority features underrepresented in the data distribution (social, cultural, or stylistic) yet still encoded in model representations. We introduce RAIGen, the first framework, to our knowledge, for label-free rare-attribute discovery in diffusion models, requiring no predefined minority categories. RAIGen leverages Matryoshka Sparse Autoencoders and a novel minority metric combining neuron activation frequency with semantic distinctiveness to identify interpretable neurons whose top-activating images reveal underrepresented attributes. Experiments show RAIGen discovers attributes beyond fixed fairness categories in Stable Diffusion, scales to larger models such as SDXL, supports systematic auditing across architectures, and enables targeted amplification of rare attributes during generation. The project page is available at https://vssilpa.github.io/RAIGen_webpage/ .

16.
arXiv (CS.AI) 2026-06-25

Power-Flexible AI Data Centers: A New Paradigm for Grid-Responsive Compute

arXiv:2606.25098v1 Announce Type: cross Abstract: The rapid expansion of artificial intelligence (AI) infrastructure is driving unprecedented growth in electricity demand from data centers. Traditional power-system planning treats large computing facilities as inflexible peak loads, leading to costly infrastructure upgrades and long delays in grid interconnection. Recent work has shown that AI clusters can reduce electricity consumption during peak demand through software-based workload orchestration. This article explores how modern GPU-based AI data centers can operate as grid-interactive assets that respond dynamically to power system conditions. We describe an architecture integrating grid signals, workload scheduling, and power telemetry for fine-grained cluster power control. Experimental results from a real-world deployment on a 130 kW GPU cluster demonstrate multiple forms of flexibility, including rapid load reduction, sustained curtailment, and carbon-aware operation while preserving service levels for priority jobs. We further demonstrate performance-aware load shifting across geographically distributed clusters, enabling workloads to migrate toward regions with lower grid stress. Together, these capabilities transform AI infrastructure from static electricity consumers into flexible resources that support grid reliability, accelerate interconnection, and improve computing sustainability.

17.
arXiv (CS.AI) 2026-06-25

Reasonable Motion: A General ASP Foundation for Environment Constrained Movement Trajectory Computation

arXiv:2606.25626v1 Announce Type: new Abstract: We present a general answer set programming based hybrid quantitative-qualitative method for computing constrained branching trajectory modes for moving objects in real-world settings. The method performs constrained traversal of an environment graph, enumerating geometrically admissible motion behaviours as stable models, each constituting a distinct trajectory mode characterised by both domain-dependent and independent factors such as derived event sequence, map topology, and domain norms. The hybrid trajectory computation method is generally applicable across motion characteristics typically encountered in diverse dynamic domains with moving objects, e.g., autonomous driving. We demonstrate applicability and highlight how computed trajectories are traceable to their underlying stable model, thereby affording verifiable interpretability that purely learned approaches cannot provide. We also perform an empirical evaluation with Argoverse 2, a large-scale real-world autonomous driving benchmark representative of the class of dynamic domains within the scope of the proposed method.

18.
medRxiv (Medicine) 2026-06-15

Bidirectional associations between cannabis use, oddball performance, and P3 event-related potential

Importance: Cannabis use remains prevalent in youth despite concerns regarding its potential impact on cognitive function. Unraveling whether the association between cannabis use and cognition is partially due to preexisting differences or primarily related to use is vital to understanding underlying mechanisms. Objective: To estimate the longitudinal association between cannabis initiation and cognitive trajectories, indexed by task performance and P3 event-related potential (ERP), and to estimate whether baseline cognition is associated with cannabis initiation. Design: Data were analyzed from the ongoing longitudinal Collaborative Study on the Genetics of Alcoholism (COGA) cohort, which was followed up approximately every 2-5 years from 2004 to 2025. Setting: 6 sites across the United States. Participants: Adolescent and young adult offspring of past COGA participants and control families who reported on their cannabis use and who had Visual Oddball (VOP) performance and P3 ERP data (N=4814; 52.4% female, 68.4% white) were grouped based on the timing of cognitive data collection relative to cannabis initiation into Pre-onset (n=2,449; [&ge;]1 assessment) and Post-onset (n=998; [&ge;]3 assessments) subsamples. Main Outcomes and Measures: VOP measures include performance accuracy (%), reaction times (ms), and P3 amplitude (V) and latency (ms) during target trials. Cannabis measures included lifetime use of cannabis (i.e., ever used) and age at first use. Results: High P3 amplitude, and prolonged P3 latency and reaction time were associated with a reduced hazard of cannabis initiation (All Hazards Ratio, [H.R.s]< 0.91, p's

19.
arXiv (CS.LG) 2026-06-19

EQPO: Equitable Group Relative Policy Optimization for Clinical Reasoning

arXiv:2510.19893v2 Announce Type: replace Abstract: Medical AI systems demonstrated impressive diagnostic performance, yet they routinely show uneven accuracy across demographic groups, disadvantaging underrepresented populations. Although multimodal reasoning foundation models have pushed clinical diagnosis forward, reinforcement learning-based post-training tends to absorb and magnify the biases present in majority-dominated training corpora. We propose Equitable Group Relative Policy Optimization (EQPO), a hierarchical reinforcement learning method that encourages balanced learning across heterogeneous clinical populations by adaptively reweighting samples according to subgroup representation, task difficulty, and data source. As demographic annotations are frequently missing in real-world clinical data, EQPO additionally applies unsupervised clustering to recover latent subpopulations when they are unavailable. On 7 diagnostic benchmarks covering 5 modalities (X-ray, CT, dermoscopy, mammography, ultrasound), EQPO reduces F1 standard deviation by 43.9% and the maximum cross-group F1 gap by 42.7% on QoQ-Med3-8B over vanilla GRPO, and narrows predictive parity gaps by 27.2% on MedGemma-4B over bias-mitigated RL baselines while raising F1 by 12.5% even without any demographic labels. Examining the training trajectory shows that EQPO steadily improves fairness over the course of optimization, in contrast to baseline methods whose fairness degrades as training proceeds, and the discovered implicit groups remain stable and align with masked demographic attributes. We further release EquiMedGemma-4B and EquiQoQ-Med3-8B, equitability-aware clinical VLLMs that attain state-of-the-art accuracy with markedly smaller demographic gaps.

20.
arXiv (CS.LG) 2026-06-24

Machine Learning Classification and Portfolio Construction: Does the Loss Function Matter?

arXiv:2108.02283v3 Announce Type: replace-cross Abstract: Classification outperforms regression across matched machine learning models in portfolio construction. A stacking ensemble of gradient boosted tree, random forest, and neural network yields a value-weighted annualized Sharpe ratio of 1.83 for classification and 1.11 for regression. This outperformance persists in multiclass settings, across subsamples, and after transaction costs. Spanning tests show that classification retains economically large alphas after we control for regression, whereas regression alphas shrink substantially once we control for classification. These results indicate that classification extracts more return information than matched regression. Our diagnostics trace classification's advantage to sharper and more precise separation of return deciles.

21.
Nature Biotechnology 2026-06-09

Hybrid solid−liquid optics enable scalable, high-resolution light-sheet microscopy across diverse immersion media

作者:

Many data-driven approaches rely on scalable and affordable three-dimensional (3D) imaging across subcellular to organ scales. Although advances in tissue clearing, expansion microscopy and light-sheet microscopy (LSM) have enabled high-resolution imaging of intact specimens, scalability in sample size, throughput and accessibility remains fundamentally limited by detection optics. Here we introduce hybrid solid−liquid optics (HySIL), a flexible refractive design framework in which a solid optical element and a refractive index (RI)-matched liquid function as a continuous optical system for wavefront correction and numerical aperture enhancement. We implement this framework as SCOPE and Super-SCOPE, enabling submicron-resolution, aberration-corrected LSM using long-working-distance air objectives. We demonstrate high-resolution volumetric imaging across diverse biological contexts, including cleared and expanded mouse, salamander and cavefish brains, human induced pluripotent stem cell (iPSC)-derived brain organoids and large intact human tissues for 3D histopathology. By combining enhanced optical performance with low-cost, long-working-distance and multi-immersion compatibility, HySIL provides an accessible and scalable foundation for next-generation volumetric imaging and data-driven biological discovery. Hybrid solid–liquid optics improve light-sheet imaging of intact biological samples.

22.
arXiv (CS.CL) 2026-06-16

Let LLMs Judge Each Other: Multi-Agent Peer-Reviewed Reasoning for Medical Question Answering

Objective: To enhance the accuracy, interpretability, and robustness of large language models (LLMs) in medical question answering (MedQA). Method: We designed a multi-agent peer-reviewed reasoning method in which multiple LLM agents independently generate chain-of-thought reasoning with candidate answers, then act as peer reviewers to evaluate each other's reasoning for factual correctness and logical soundness. The highest-rated reasoning chain is selected to produce the final answer. Experiments were conducted with five state-of-the-art LLMs (Llama-3.1-8B, Qwen2.5-7B, Phi-4, DeepSeek-LLM-7B, GPT-oss-20B) on three benchmark datasets: HeadQA, MedQA-USMLE, and PubMedQA. Performance was compared against single-model chain-of-thought reasoning and chain-of-thought-based majority voting. Results: Peer-reviewed reasoning consistently outperformed both baselines. The best model combination achieved an average accuracy of 0.820 across datasets, exceeding the strongest single model (0.777) and majority voting ensembles (up to 0.789). The method also scaled effectively with more participating models, while peer assessments reliably distinguished high- from low-quality reasoning chains. Conclusion: The proposed multi-agent peer-reviewed reasoning method enables LLMs to act as both solvers and evaluators, yielding superior performance in MedQA. By emphasizing reasoning quality rather than answer agreement alone, this approach improves accuracy, interpretability, and robustness, offering a promising direction for trustworthy biomedical AI systems.

23.
arXiv (CS.AI) 2026-06-25

RARM: Confidence-Gated Progress Reward Modeling for RL in Manipulation

arXiv:2606.22027v2 Announce Type: replace-cross Abstract: Reinforcement learning for robot manipulation is often bottlenecked by reward design, especially in long-horizon tasks: sparse success rewards provide weak supervision, while hand-crafted dense rewards are tedious to design and generalize poorly across tasks. Progress-based reward models offer a promising alternative by estimating how far an observation has advanced toward task completion, but existing approaches often require task-specific demonstrations or progress labels, and can assign high rewards to visually plausible but physically incorrect states. We introduce the Reference-Anchored Reward Model (RARM), a lightweight visual comparator that converts a single successful demonstration into a dense, progress-aware reward. RARM is trained once on general-purpose videos with a contrastive temporal objective, requiring no robot-specific data, task-specific reward labels, or per-task reward engineering. At deployment, RARM matches rollout clips to reference clips and rewards only confident forward progress, suppressing uncertain matches that may otherwise produce false-positive rewards. Across 9 simulated manipulation tasks from LIBERO and MetaWorld and 4 real-world tasks, RARM achieves the best overall success rates in subsequent RL training, with particularly large gains on long-horizon tasks such as cloth folding, where unreliable progress estimates are especially harmful.

24.
arXiv (CS.CV) 2026-06-18

Confidence is Not Reliability: Rethinking MC Dropout in Brain Tumour Segmentation

Glioma segmentation in multiparametric MRI is a critical component of treatment planning. A segmentation model that fails silently on treatment-critical sub-regions represents a patient safety risk that overlap-based metrics such as Dice scores cannot expose. We ask whether voxel-level uncertainty estimation via Monte Carlo (MC) Dropout can reliably identify segmentation errors in clinically critical sub-regions, and whether calibration failure modes are detectable from standard reporting metrics alone. In an empirical two-model case study on 126 BraTS21 patients, we evaluate a high-performance pretrained SegResNet and a locally trained UNet with residual units (UNet-Res). MC dropout preserved segmentation accuracy ($|\Delta Dice|$ $

25.
arXiv (CS.CL) 2026-06-16

A Self Consistency Based Reranking for Narrative Question Answering

Narrative question answering (NQA) is a challenging task in natural language processing that requires models to understand long textual contexts, capture relationships across events, and generate coherent responses. Despite recent advances in pretrained language models, most existing approaches rely on a single decoding output during inference, making them sensitive to generation variability and often resulting in incomplete or inconsistent answers .To address this limitation, we propose a self-ensemble Self-Consistency-Based reranking framework for narrative question answering. The proposed method generates multiple candidate answers for each story-question pair and selects the final answer based on semantic agreement among the generated responses. This allows the model to explore diverse answer formulations while improving robustness through consensus-based selection without requiring modifications to the underlying architecture .The framework combines pretrained and fine-tuned language generation with multi-answer inference and similarity-based reranking. We evaluate the proposed approach on the NarrativeQA dataset using multiple models, including FLAN-T5 (Base and Small) and Pegasus-Large, under both baseline and fine-tuned settings .Experimental results demonstrate that the proposed method consistently improves performance across all models. In particular, FLAN-T5-Base achieves the best overall performance, improving from 82.32% to 86.66% (+4.34%) when combined with self-ensemble inference. Additionally, the largest improvement is observed with Pegasus-Large, which increases from 72.50% to 87.07% (+14.57%), highlighting the effectiveness of the proposed strategy.