Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-11

Noise-Guided Transport for Imitation Learning

arXiv:2509.26294v2 Announce Type: replace-cross Abstract: We consider imitation learning in the low-data regime, where only a limited number of expert demonstrations are available. In this setting, methods that rely on large-scale pretraining or high-capacity architectures can be difficult to apply, and efficiency with respect to demonstration data becomes critical. We introduce Noise-Guided Transport (NGT), a lightweight off-policy method that casts imitation as an optimal transport problem solved via adversarial training. NGT requires no pretraining or specialized architectures, incorporates uncertainty estimation by design, and is easy to implement and tune. Despite its simplicity, NGT achieves strong performance on challenging continuous control tasks, including high-dimensional Humanoid tasks, under ultra-low data regimes with as few as 20 transitions.

02.
arXiv (CS.AI) 2026-06-16

BRIDGE: Biological Evidence Refinement and Heterogeneous Dynamic Gating for Gene Regulatory Networks

arXiv:2606.14734v1 Announce Type: cross Abstract: Motivation: Gene regulatory network inference from single-cell RNA sequencing (scRNA-seq) data is important for uncovering cell-state-specific transcriptional programs. However, scRNA-seq measurements are sparse and noisy, and experimentally validated TF-target interactions remain limited, making reliable inference challenging. Although graph neural networks have advanced GRN prediction, existing methods often rely on biologically unconstrained graph augmentation, such as random edge perturbation, and insufficiently control information transfer between genes and cells. These limitations may distort regulatory structures and weaken robustness under noisy and weakly supervised settings. Results: To address these issues, we propose an innovative framework named Biological Evidence Refinement and Heterogeneous Dynamic Gating for Gene Regulatory Networks (BRIDGE). BRIDGE extracts gene and cell representations from the expression matrix and its matrix dual, and performs contrastive learning in the gene space and cell space between self and neighbors across the co-expression-refined regulatory view and the original graph. It then applies heterogeneous gated encoding to adaptively regulate information transfer between genes and cells, enabling robust transcription factor-to-target gene prediction. Experiments on benchmark datasets spanning three network types and seven cell types show that BRIDGE achieves state-of-the-art AUROC and AUPRC in most settings. In particular, on Specific networks, BRIDGE improves average AUPRC by 5% over the second-best baseline, GCLink. In cross-cell-type few-shot transfer, BRIDGE consistently outperforms GCLink and GENELink across all six target cell types. A case study on hESC further supports the biological relevance of the predictions, with 9 of the top 10 and 46 of the top 100 novel TF-target interactions validated by ChIPBase.

03.
arXiv (quant-ph) 2026-06-16

Decoherence-free algebras in quantum dynamics

arXiv:2403.12926v2 Announce Type: replace Abstract: In this Article we analyze the algebraic properties of the asymptotic dynamics of finite-dimensional open quantum systems in the Heisenberg picture. In particular, a natural product (Choi-Effros product) can be defined in the asymptotic regime. Motivated by this structure, we introduce a new space called the Choi-Effros decoherence-free algebra. Interestingly, this space is both a C*-algebra with respect to the composition product, and a B*-algebra with respect to the Choi-Effros product. Moreover, such space admits a direct-sum decomposition revealing a clear relationship with the attractor subspace of the dynamics. In particular, the equality between the attractor subspace and the Choi-Effros decoherence-free algebra is a necessary and sufficient condition for a faithful dynamics. Finally, we show how all the findings do not rely on complete positivity but on the much weaker Schwarz property.

04.
arXiv (CS.LG) 2026-06-16

Bayesian Optimization for Learning Nonlinear MPC in Autonomous Agent Navigation

arXiv:2606.14763v1 Announce Type: cross Abstract: Real-time autonomous navigation in dynamic, unknown environments remains a fundamental challenge for mobile robotics. We propose a map-free framework that tightly integrates reactive rolling-horizon planning with nonlinear Model Predictive Control (MPC). At each control cycle, a LiDAR-based Gaussian occupancy representation is constructed and used to generate collision-free trajectories via A* search, which are then tracked by a CasADi/IPOPT MPC formulation incorporating a smooth sigmoid obstacle barrier. To improve robustness to parameter sensitivity, we adopt an offline Bayesian optimization scheme based on Tree-structured Parzen Estimators (TPE), which identifies near-optimal controller parameters with respect to a composite navigation objective. In addition, a Gaussian Process surrogate is used to analyze parameter sensitivity and provide insight into the optimization landscape. The proposed framework is robot-agnostic and is evaluated on the Unitree Go2 quadruped in simulation using Gazebo, followed by deployment on the physical robot. Experimental results show that parameters tuned in simulation transfer effectively to hardware, maintaining comparable performance without additional tuning. The full system achieves up to a 90.0\% navigation success rate when deployed, along with a 38.9\% average improvement in the evaluation metrics across simulated environments.

05.
arXiv (CS.CL) 2026-06-15

SuperThoughts: Reasoning Tokens in Superposition

Long Chain-of-Thought (CoT) reasoning improves LLM problem-solving but is computationally expensive due to sequential token generation. While recent works explore reasoning in continuous latent spaces to bypass discrete token generation, they often struggle with training stability and fail to scale to complex, long-horizon tasks due to lack of supervision signal. We propose SuperThoughts, which compresses pairs of consecutive CoT tokens into single latent representations and decodes two tokens per step via a lightweight Multi-Token Prediction (MTP) module. This preserves discrete token supervision at training time while doubling throughput at inference time. We finetune Qwen2.5-Math-1.5B-Instruct, Qwen2.5-Math-7B-Instruct, Qwen2.5-Math-14B-Instruct, and evaluate on MATH500, AMC, OlympiadBench, and GPQA-Diamond. With a confidence-based adaptive mechanism that falls back to standard decoding when uncertain, SuperThoughts achieves $\sim$20–30\% CoT length reduction while maintaining accuracy with minimal degradation (1-2 points accuracy drop on most tasks).

06.
arXiv (CS.CV) 2026-06-16

Sub-Semantic Image Segmentation

Images can be segmented based on visual cues (i.e., texture segmentation) or into objects (i.e., semantic segmentation). We propose a new category of sub-semantic image segmentation that blurs the line between the two. In sub-semantic image segmentation, language is not used to name whole objects. Instead, it is used to partition an image into stable appearance patterns that can be described by language. To do that, we couple a general-purpose vision-language model to SAM 3, a promptable segmentation backbone whose native text pathway can ground rich descriptions into masks. Simple coupling fails for a number of reasons that we identify in the paper, and we overcome them by introducing DETECTURE that resolves three concrete failure modes – language leakage between texture regions, prompt competition inside the segmentation backbone, and semantic distortion at the language-to-mask interface. Since there is no dataset of sub-semantic image segmentation, we introduce one, termed TextureADE. The new dataset is derived from the ADE20K dataset using a system we designed. We compare DETECTURE to a number of baselines and find that it achieves the strongest performance on several datasets using different metrics. Code is available at https://github.com/Scientific-Computing-Lab/TextureDetecture.

07.
arXiv (CS.CL) 2026-06-12

Structuring The Future: Diffusion LLM Speculative Decoding via Calibrated Draft Graphs

Diffusion LLMs (dLLMs) have recently emerged as a powerful alternative to autoregressive LLMs (AR-LLMs) with the potential to operate at significantly higher token-generation rates. To unlock this potential, we present Spiffy, a speculative decoding algorithm to accelerate dLLM inference while provably preserving the model's output distribution. This work addresses the unique challenges involved in applying ideas from speculative decoding of AR-LLMs to dLLMs. Spiffy performs auto-speculation to eliminate the overheads of an independent draft model, structuring draft states in the form of a novel directed draft graph to take advantage of the bidirectional, blockwise nature of dLLM generation. These draft graphs are calibrated offline to maximize acceptance rates and are dynamically pruned during inference for improved computational efficiency. We present a detailed formulation of Spiffy and demonstrate its ability to accelerate LLaDA, Dream, and SDAR models in combination with KV caching and threshold-based dynamic unmasking leading to up to $8.6\times$ reduction in model inferences and $6.3\times$ acceleration in token rate.

08.
arXiv (CS.CL) 2026-06-18

Learning User Simulators with Turing Rewards

Learning to simulate human users in interactive settings could advance the training of agent assistants, evaluation of personalization systems, research in the social sciences, and more. Existing approaches generally do so by training a large language model (LLM) to match a single ground truth response, either by maximizing the log probability or by using a similarity reward. We instead propose {Turing-RL}: a Turing-Test-based reinforcement learning approach for training user simulator models. {Turing-RL} uses a discriminative Turing reward with an LLM judge to score how indistinguishable a generated response is from the real user's given the user's history, and the user simulator LLM learns to produce responses indistinguishable from what the user could have said with such rewards. Across two different domains–conversational chat and Reddit forum discussion–we find that {Turing-RL} consistently outperforms baseline methods on both LLM and human evaluation metrics. Our study suggests that optimizing for indistinguishability, rather than response matching, is effective for learning user simulators.

09.
arXiv (CS.CL) 2026-06-12

EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments

Large language model (LLM) agents have achieved strong performance on a wide range of benchmarks, yet most evaluations assume static environments. In contrast, real-world deployment is inherently dynamic, requiring agents to continually align their knowledge, skills, and behavior with changing environments and updated task conditions. To address this gap, we introduce EvoArena, a benchmark suite that models environment changes as sequences of progressive updates across terminal, software, and social domains. We further propose EvoMem, a patch-based memory paradigm that records memory evolution as structured update histories, enabling agents to reason about environmental evolution through changes in their memory. Experiments show that current agents struggle on EvoArena, achieving an average accuracy of 39.6% across evolving terminal, software, and social-preference domains. EvoMem consistently improves performance, yielding an average gain of 1.5% on EvoArena and also improving standard benchmarks such as GAIA and LoCoMo by 6.1% and 4.8%. Beyond individual tasks, EvoMem further improves chain-level accuracy by 3.7% on EvoArena, where success requires completing a consecutive sequence of related evolutionary subtasks. Mechanistic analysis shows that EvoMem improves evidence capture in the memory, indicating better preservation of complete evolving environment states. Our results highlight the importance of modeling evolution in both evaluation and memory for reliable agent deployment.

10.
arXiv (CS.LG) 2026-06-18

Learning from Your Own Mistakes: Constructing Learnable Micro-Reflective Trajectories for Self-Distillation

arXiv:2606.18844v1 Announce Type: new Abstract: Self-distillation improves reasoning in large language models by using the model's own rollouts as training signal, typically through implicit logit-level alignment that minimizes KL divergence toward a privileged target distribution. However, because this supervision is generated via uncontrolled sampling, it provides no diagnostic insight into the model's specific errors or corrective guidance for its individual failure patterns. Consequently, the model learns to imitate a privileged distribution rather than receiving fine-grained corrections that pinpoint where and why its reasoning fails. In this paper, we propose Trajectory-Augmented Policy Optimization (TAPO), which advances self-distillation from implicit distributional alignment to explicit trajectory construction. During RL training, the model produces both correct and incorrect rollouts to the same query, and TAPO leverages this contrastive structure to construct micro-reflective corrections, new training trajectories that retain the model's erroneous reasoning up to the point of failure, then insert a natural-language diagnosis and corrected reasoning guided by a correct reference from the same sampling group. Since each trajectory is anchored in the learner's own prefix and solutions, the corrective signal preserves the model's on-policy distribution to a greater extent than the position-wise alignment imposed by KL-based methods. To integrate these trajectories, TAPO introduces difficulty-aware candidate selection at the model's capability boundary and decoupled advantage estimation to prevent gradient contamination. Experiments on AIME 2024, AIME 2025, and HMMT 2025 show that TAPO achieves consistent improvements over GRPO under the same number of training steps. Further analysis demonstrates that TAPO strengthens both first-pass reasoning and error-correction effectiveness.

11.
arXiv (CS.LG) 2026-06-11

A prior-free blind detection of information leakage from model predictions

arXiv:2606.11267v1 Announce Type: new Abstract: Data leakage – contamination of a model with information unavailable at baseline – is the dominant reproducibility failure in machine-learning-based science, yet detection tools require training code, external data, or domain expertise. None operates on the artifact an auditor most often holds: the model's output. We ask what can be decided about leakage from predictions and outcomes alone. We give a decision-theoretic framework in which leakage diagnostics are functionals of the predicted-risk/outcome law, parameterized by a threshold-weighting linked to proper scoring rules and decision-curve analysis. We prove a sharp impossibility: a recalibrated leak matching an honest model's calibration and discrimination is indistinguishable from honest performance by any function of the predictions, so the broad class is detectable only against an externally supplied ceiling on achievable discrimination. We then prove what leakage cannot hide: a near-deterministic subgroup – the signature of a near-label leak – produces a sustained unit-purity head that no legitimate predictor of a non-deterministic outcome can manufacture, yielding a prior-free test. These results organize leakage into a trichotomy – miscalibrated, broad-calibrated, and deterministic – each with a matched detector and failure mode. We validate on UK Biobank using time-windowed comorbidity leakage with known, graded severity, measuring a detection floor of $\Delta\cstar \approx 0.007$ on this endpoint, below which residual leakage is undetectable from output and too small to alter conclusions. The numerical floor is cohort- and endpoint-specific; the structural lesson is general: output-only detection fails where residual leakage is indistinguishable from an honestly stronger predictor. The test returns a verdict on a prediction vector in under a second on commodity hardware.

12.
arXiv (CS.AI) 2026-06-17

Treatment Response Optimized Clinical Decision Support AI System via Digital Twin Simulation

arXiv:2606.17405v1 Announce Type: new Abstract: Clinical decision support AI systems (CDSASs) must adapt to evolving patient conditions in real-time while adhering to strict safety constraints. We present an online adaptive framework that integrates Treatment Effect (TE) estimation to quantify clinical benefits, a patient Digital Twin (DT) to simulate treatment trajectories, and Reinforcement Learning (RL) for sequential decision-making. The AI system is initially trained on historical medical records and operates in a continuous learning loop. To ensure safety, a rule-based module monitors vital signs and blocks contraindicated treatments. Cases with strong internal model disagreement are flagged for clinician review, simulated in our experiments via a pre-trained outcome model. We validate our framework using both a synthetic clinical simulator and a real-world ovarian cancer dataset from The Cancer Genome Atlas (TCGA). In both simulated and clinical settings, our method demonstrated superior effectiveness and stability in recommending treatments compared to standard computational baselines. Furthermore, the AI system maintains low latency and requires expert consultation for only a minority of cases in our experimental validation, demonstrating its potential as a safe, clinician-supervised tool for personalized medicine that continuously improves through practical use.

13.
arXiv (CS.CL) 2026-06-12

How Fine-Grained Should a RAG Benchmark Be? A Hierarchical Framework for Synthetic Question Generation

Evaluating retrieval-augmented generation (RAG) systems requires benchmarks that capture diverse question characteristics, yet practitioners lack empirical guidance on which dimensions to vary and at what granularity. We present HieraRAG, a hierarchical framework for studying granularity in RAG benchmark construction, defining optimal granularity as the level that maximizes discriminative power (the standard deviation of generation quality across categories) within a given RAG configuration. As a case study, we generate 5,872 synthetic question-answer (QA) pairs from FineWeb-10BT across 3 dimensions (Question Complexity, Answer Type, Linguistic Variation) at 3 granularity levels (2, 4, and 8 categories). With a BM25+Falcon-3-10B pipeline, optimal granularity varies by dimension: complexity benefits from fine-grained distinctions (discriminative power: 0.053) while answer type and linguistic variation peak at medium granularity. We introduce a Coherence Ratio metric to quantify whether fine-grained splits cleanly subdivide parent categories, revealing structural differences across dimensions (Question Complexity: 0.40 vs. Answer Type: 1.44). Human evaluation of 110 stratified QA pairs confirms synthetic quality. While these specific findings reflect a single configuration, HieraRAG provides a portable procedure and validation metric for practitioners to determine evaluation granularity within their own RAG settings.

14.
bioRxiv (Bioinfo) 2026-06-11

DeePEn - A Depth sensitive benchmark for Protein Engineering

Recent progress in modeling techniques and high-throughput screening has significantly enhanced the accessibility of protein engineering. Nevertheless, further progress gets hindered by the lack of robust benchmarks that capture the practical challenges for real-world protein engineering. Here, we introduced DeePEn, a Depth-sensitive benchmark for Protein Engineering that quantifies a models generalization capabilities when predicting protein fitness at increasing mutational distance from the wildtype or training data. We defined distance as the number of simultaneous point mutations, i.e., single amino acid variants (SAVs), moving from wild-type to mutant (edit distance in computer science jargon). Specifically selecting four deep mutational scanning (DMS) datasets with sufficient multi-mutation data points from ProteinGym, we assessed recent predictive models, including general and biophysics-informed protein Language Models (pLMs), and a non-transformer neural network. Our results highlight how the performance of all models deteriorates with increasing mutational distance and that no single metric sufficiently captures the diverse requirements of protein engineering. To overcome these shortcomings, DeePEn provides a readily available resource for multi-metric benchmarking that focuses on the prediction of distant variants.

15.
arXiv (CS.CV) 2026-06-17

SierpinskiCam: Camera-Controlled Video Retaking with Sierpinski Triangle Pattern Cues

Generating novel renderings of a scene along user-defined camera trajectories from a single monocular video, dubbed video retaking, is a compelling but difficult problem in content creation and visual effects. Existing geometry-guided approaches reconstruct a 4D representation from the source video and render it along the target trajectory to condition video diffusion models. However, this guidance degrades as the target camera departs from the source trajectory, leaving newly revealed regions sparse or entirely missing. We propose SierpinskiCam, which addresses this limitation by augmenting geometry-based guidance with Sierpinski dome texture cues that contains rich trackable features even under large viewpoint changes. We further introduce a reference video conditioning mechanism that appends source-video tokens to the target-token sequence and separates the two streams with negative RoPE indices, enabling appearance grounding without architectural modification or per-video adaptation. Extensive experiments show that SierpinskiCam achieves significant gains in camera controllability, geometric consistency, and video quality across diverse and challenging retaking scenarios. Project page: https://hyelinnam.github.io/SierpinskiCam/.

16.
arXiv (CS.CV) 2026-06-16

CausalDrive: Real-time Causal World Models for Autonomous Driving

World models have emerged as a promising paradigm for scaling autonomous driving (AD) data, yet existing video generative models fall short as interactive simulators. Layout-conditioned renderers rely on "oracle" future trajectories of all background agents, rendering them strictly non-reactive. Conversely, pure action-conditioned predictors lack semantic control over complex interactions and suffer from prohibitive diffusion latencies, hindering closed-loop policy learning. To bridge this gap, we present CausalDrive, a controllable, real-time foundation driving world renderer. CausalDrive operates solely on the initial front-view frame, the ego-vehicle's trajectory, and a macroscopic text prompt. By excluding future NPC layouts, we compel the model to intrinsically predict causal interactions, enabling text-driven control over Driving Sociology, allowing users to dynamically orchestrate diverse counterfactual reactions to identical ego-actions. To overcome the efficiency bottleneck and address the covariate shift in autoregressive generation, we propose a novel Context-Forced DMD architecture. This combines continuous flow-matching with a self-correcting distillation objective, achieving interactive speeds of 12 FPS. This breakthrough transforms the passive video generator into a playable neural simulator. We demonstrate its versatility across three downstream applications: (1) generative closed-loop evaluation with significantly mitigated collision artifacts, (2) large-scale Reinforcement Learning (RL) post-training driven by a Video2Reward module, and (3) real-time human-in-the-loop simulation. Extensive experiments validate that policies trained within CausalDrive's reactive scenarios exhibit superior interaction capabilities in the real world.

17.
arXiv (quant-ph) 2026-06-15

Quantum Entanglement of Bethe States

arXiv:2606.14140v1 Announce Type: cross Abstract: We investigate the quantum entanglement of Bethe states across a family of integrable spin chains, including the XXX$_{\frac{1}{2}}$ model, its higher-spin generalizations (XXX$_s$), and the non-compact $SL(2,\mathbb{R})$ chain. For on-shell eigenstates, we perform a comprehensive scan of the bipartite entanglement entropy across the entire spectrum of finite chains with periodic boundary conditions, and identify the Bethe solutions that minimize and maximize the entanglement. These extremal solutions follow systematic, spin-dependent patterns in the Bethe quantum numbers. In the XXX$_{\frac{1}{2}}$ spin chain, for the antiferromagnetic chain, the state with minimal entropy always coincides with the lowest-energy state (the ground state) within a given fixed-magnon sector. For the higher-spin XXX$_s$ model, however, the lowest-entropy state is not always identical to the ground state, and can even be the state of highest energy. By contrast, the Bethe roots that maximize entropy exhibit considerably more intricate structure. Our analysis further reveals how special Bethe root configurations, such as singular and strange solutions, affect entanglement, and it uncovers characteristic entanglement features in the non-compact $SL(2,\mathbb{R})$ chain that are absent from compact spin chains. For off-shell Bethe states, we develop an optimization algorithm that extremizes the entanglement entropy over rapidity distributions, enabling us to explore the maximum entanglement achievable by a Bethe state without imposing the Bethe ansatz equations.

18.
arXiv (CS.AI) 2026-06-12

Free-Placement Optimization of Ground Station Locations for Low-Earth Orbit Satellites

arXiv:2606.12667v1 Announce Type: cross Abstract: Rapidly expanding low Earth orbit satellite constellations are placing increasing demands on terrestrial ground networks, motivating the development of more efficient ground station network designs. Current approaches select sites from predefined locations, limiting optimization to existing infrastructure and constraining performance. In contrast, free-placement optimization operates over a continuous spatial domain on Earth, broadening the search space and allowing higher-throughput configurations at the cost of potentially requiring new infrastructure deployment. In this work, we introduce SCORE (Sequential Cyclic Optimization via Refinement & Evaluation), a two-stage free-placement method for ground station design. SCORE combines sequential coordinate selection with cyclic refinement to manage high-dimensionality, non-convexity, and local minima that challenge global optimizers. We benchmark SCORE against one-shot methods such as differential evolution (DE) and integer programming approaches using locations from Kongsberg Satellite Services and the World Teleport Association. Tests across two commercial Earth observation constellations (Capella Space and ICEYE) and one synthetic Walker-Star constellation show that SCORE requires up to 5x fewer function evaluations to converge relative to DE while improving downlink throughput by up to 13%. Compared to fixed-site methods, unconstrained SCORE achieves up to 15% greater total downlink, establishing a strong empirical performance benchmark for flexible placement; infrastructure-constrained SCORE retains over 92% of this gain while restricting placement to within proximity of existing fiber and power infrastructure. We also explore trade-offs between expanding existing stations and deploying new sites, informing future ground network design for operational constellations.

19.
arXiv (CS.CL) 2026-06-17

DICE: Diffusion Large Language Models Excel at Generating CUDA Kernels

Diffusion large language models (dLLMs) have emerged as a compelling alternative to autoregressive (AR) LLMs, owing to their capacity for parallel token generation. This paradigm is particularly well-suited for code generation, where holistic structural planning and non-sequential refinement are critical. Despite this potential, tailoring dLLMs for CUDA kernel generation remains challenging, obstructed not only by the high specialization but also by the severe lack of high-quality training data. To address these challenges, we construct CuKe, an augmented supervised fine-tuning dataset optimized for high-performance CUDA kernels. On top of it, we propose a bi-phase curated reinforcement learning (BiC-RL) framework consisting of a CUDA kernel infilling stage and an end-to-end CUDA kernel generation stage. Leveraging this training framework, we introduce DICE, a series of diffusion large language models designed for CUDA kernel generation, spanning three parameter scales, 1.7B, 4B, and 8B. Extensive experiments on KernelBench demonstrate that DICE significantly outperforms both autoregressive and diffusion LLMs of comparable scale, establishing a new state-of-the-art for CUDA kernel generation.

20.
arXiv (CS.CV) 2026-06-15

A Lightweight Fiducial-Based Pipeline for 3D Hyperspectral Mapping of ex-vivo Lumpectomy Specimens

Hyperspectral Imaging (HSI) is a promising modality for intraoperative assessment of resection margins in Breast-Conserving Surgery (BCS), but its clinical translation requires aligning the inherently 2D spectral information onto the 3D shape of the excised tissue so that suspicious regions can be precisely localized for targeted follow-up. We present a fully automated, calibration-free pipeline that produces a 3D hyperspectral point cloud of an ex-vivo lumpectomy specimen from a set of consumer-camera RGB images and a single top-down HSI acquisition. The 3D geometry is reconstructed with a deep-learning Structure-from-Motion backbone, stabilized in a metric reference frame by a custom bundle adjustment that enforces consistency on the corners of four ArUco markers placed around the specimen. The HSI cube is then registered to the reconstruction without recovering the HSI camera pose: the markers, visible in both modalities, define 16 corner correspondences that drive a planar homography, and 3D coordinates are recovered by lookup on an orthographically rendered depth map. Evaluated on two ex-vivo lumpectomy specimens, the pipeline achieves a median 3D registration error below 1~mm and a 2D reprojection error below 0.02 mm, with a total per-specimen processing time under 4 minutes on accelerated hardware. These results support the feasibility of integrating HSI-guided spatial localization into intraoperative margin assessment workflows for breast-conserving surgery.

21.
arXiv (math.PR) 2026-06-18

Milstein-type Schemes for Hyperbolic SPDEs

arXiv:2512.19647v4 Announce Type: replace-cross Abstract: This article studies the temporal approximation of hyperbolic semilinear stochastic evolution equations with multiplicative Gaussian noise by Milstein-type schemes. We take the term hyperbolic to mean that the leading operator generates a contractive, not necessarily analytic $C_0$-semigroup. Optimal convergence rates are derived for the pathwise uniform strong error \[ E_h^\infty := \Big(\mathbb{E}\Big[\max_{1\le j \le M}\|U_{t_j}-u_j\|_X^p\Big]\Big)^{1/p} \] on a Hilbert space $X$ for $p\in [2,\infty)$. Here, $U$ is the mild solution and $u_j$ its Milstein approximation at time $t_j=jh$ with step size $h>0$ and final time $T=Mh>0$. For sufficiently regular nonlinearity and noise, we establish strong convergence of order one, with the error satisfying $E_h^\infty\lesssim h\sqrt{\log(T/h)}$ for rational Milstein schemes and $E_h^\infty \lesssim h$ for exponential Milstein schemes. This extends previous results from parabolic to hyperbolic SPDEs and from exponential to rational Milstein schemes. Moreover, root-mean-square error estimates are strengthened to pathwise uniform estimates. Numerical experiments validate the convergence rates for the stochastic Schrödinger equation. Further applications to Maxwell's and transport equations are included.

22.
medRxiv (Medicine) 2026-06-15

Association of Genetic Liability to Psychiatric Disorders with Peripheral Metabolic Dysregulation

Importance: Individuals with psychiatric disorders face elevated cardiometabolic risk which is linked to increased mortality. The extent to which this reflects shared pathogenesis or the downstream effects of illness and treatment remains poorly understood. Objective: To characterize the direct pleiotropic effects of psychiatric genetic liability on circulating metabolites and aggregate cardiometabolic risk, independent of psychiatric diagnosis and psychotropic medication use. Design: Cohort study. Setting: Mass General Brigham Biobank (MGBB). Participants: MGBB participants with metabolomic profiling, genomic data, and linked electronic health records. Exposures: Genetic liability to nine psychiatric disorders quantified using polygenic risk scores (PRS): attention deficit/hyperactivity disorder (ADHD), anorexia nervosa (ANO), anxiety disorder (ANX), autism spectrum disorder (ASD), bipolar disorder (BD), major depressive disorder (MDD), PTSD, schizophrenia (SCZ), and substance use disorder (SUD). Main Outcomes and Measures: 249 circulating metabolites and four metabolomic risk scores (MRS) for type 2 diabetes, myocardial infarction, ischemic stroke, and vascular dementia. PRS-metabolite associations were estimated using nested models adjusting for lifetime psychiatric diagnosis and psychotropic medication use. Results: Across 25,290 participants, we identified 604 significant PRS-metabolite associations (Bonferroni p< 1.36 x 10-4), of which 89% persisted after adjustment for lifetime diagnosis and medication use, suggesting that the direct genetic effects on metabolism are largely independent of illness or treatment. PRS for MDD, PTSD, and ADHD showed the most extensive dysregulation, with a transdiagnostic pattern of elevated lipids and systemic inflammation, specifically triglycerides ({beta} = 0.04 to 0.05, all p< 4.4 x10-13) and glycoprotein acetyls ({beta} = 0.05, all p< 2.2 x10-16). Notably, PRS for SCZ and BD showed minimal metabolite dysregulation despite having the strongest association with their target diagnoses. PRS for MDD, PTSD, ADHD, and SUD were associated with increased MRS across cardiometabolic conditions ({beta} = 0.03 to 0.08, all p< 2.1 x10-4). Sensitivity analyses controlling for BMI or excluding participants without any psychiatric history (N: 21,305 and 11,150, respectively) showed a similar pattern. Conclusions and Relevance: Psychiatric genetic liability is associated with systemic metabolic dysregulation independent of illness onset or treatment, supporting a partially pleiotropic basis for psychiatric-cardiometabolic comorbidity.

23.
medRxiv (Medicine) 2026-06-10

A Three-Tier Operational Benchmark for Evaluating Large Language Models on Hospital Medication Safety

Objective. To introduce PsiBench, a clinically validated medication-safety benchmark for evaluating large language models (LLMs) against the standards used to certify hospital computerized provider order entry (CPOE) and electronic health record (EHR) systems, and a non-overlapping three-tier evaluation framework separating highest-stakes discrimination, the operational CDS regime, and category-correct alerting. Materials and Methods. PsiBench comprises 492 medication-safety scenarios across 11 safety categories, created by clinical pharmacology experts whose work underpins an annualized testing procedure used by more than 2,000 U.S. hospitals. The three-tier framework partitions the scenarios non-overlappingly: Discrimination (98 scenarios, 50 fatal vs 48 deception, near-balanced 51%/49%); Operational (394 scenarios, 261 serious unsafe plus 133 safe including 41 Excessive Alerts reclassified as operational negatives); and Attribution (311 alert-required scenarios). We evaluated 40 frontier LLMs from 10 providers over 3 runs per scenario at temperature 0.2 (or the provider default where temperature is not configurable), yielding 59,040 evaluations conducted April 21-23, 2026. Results. Headline binary performance on the full benchmark spans a wide range across the 40 models: F1 78.5%-92.3%, accuracy 65.4%-89.8%, sensitivity 81.4%-100.0%, specificity 6.1%-81.8%. Leading models by F1 (o4-mini 92.3%; o3 92.2%) pair high sensitivity with meaningful specificity; three models saturate sensitivity at 100% but fall below 25% specificity, indistinguishable from a naive always-alert classifier. The wide spread on a single headline metric motivates tier-specific analyses, developed in a separate clinical paper. Discussion and Conclusion. PsiBench and the three-tier framework operationalize a rigorous evaluation rubric for LLM medication safety, grounded in two decades of national hospital audit experience. The framework generalizes to any binary medication-safety classifier (rule-based, conventional ML, or LLM-driven), supporting tier-aware model selection and post-deployment surveillance.

24.
arXiv (quant-ph) 2026-06-11

Handbook of Error-Correcting Codes

arXiv:2606.11484v1 Announce Type: new Abstract: Barcode scans, clear phone calls, reliable data storage, satellite communication, and large-scale quantum computation are all made possible by error correction. We present a handbook version of The Error Correction Zoo, a curated reference of methods for protecting classical or quantum information from errors during storage and transmission. The handbook includes descriptions of these error-correcting codes and a classification according to the symbols they use. It also catalogues relations among codes and related objects such as sphere packings, lattices, designs, groups, and classical and quantum phases of matter. The collection is intended both as a rigorous reference and as a practical aid for tracing the web of code relationships and uncovering new connections.

25.
arXiv (math.PR) 2026-06-11

Asymptotic analysis of the finite predictor for fractional Gaussian noise

arXiv:2504.01562v2 Announce Type: replace-cross Abstract: This paper proposes a new approach to the asymptotic analysis of the finite predictor for stationary sequences. Our method yields the exact asymptotics of both the relative prediction error and the partial correlation coefficients. The underlying assumptions are analytic in nature, making the approach applicable to processes with long-range dependence. The ARMA-type process driven by fractional Gaussian noise (fGn), which had previously remained elusive, is used as a case study.