Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (quant-ph) 2026-06-25

Tensor network characterization and mitigation of readout errors

arXiv:2606.25974v1 Announce Type: new Abstract: Readout errors are a major bottleneck to extracting reliable information from near-term quantum processors, especially when spatial correlations are non-negligible. We present a unified tensor-network framework that models the readout process as a matrix product operator (MPO), enabling efficient characterization and mitigation beyond uncorrelated approximations. The MPO model is trained via likelihood optimization on calibration data and applies to multiple tasks, including nonlocal observable estimation, random circuit sampling, and random-measurement protocols, such as classical shadows and learning-based tomography. Experiments on a superconducting processor and numerical simulations up to 20 qubits show that the MPO model captures correlated readout errors that uncorrelated models miss, with a sample cost that grows only near-linearly with system size. When extended to two-dimensional systems, the framework can also be integrated with tensor-network quantum error-correction decoders by performing joint inference over data and readout errors. These results establish tensor-network readout error mitigation as a scalable and versatile approach for noise-aware quantum data processing.

02.
arXiv (CS.CV) 2026-06-16

Transformation-driven generation of comparable projection images from multimodal anatomical scenes

This work addresses the computational problem of generating reproducible projection-space observations from heterogeneous anatomical scenes whose components may undergo independent spatial transformations. We propose a transformation-driven framework for synthetic projection imaging from multimodal anatomical data and demonstrate it on mandibular-motion scenarios. In contrast to conventional Digitally Reconstructed Radiograph (DRR) approaches primarily designed for registration, projection realism, or rendering efficiency, the proposed formulation treats projection imaging as an observation process operating on an explicitly represented anatomical scene. Independently transformable volumetric and surface-based anatomical objects are embedded within a shared scene representation and propagated directly into projection space through explicit transformations. Projection geometry, acquisition modelling, material interpretation, and image presentation remain explicitly separated, enabling controlled exploration of methodological assumptions while preserving reproducibility and direct comparability between generated projections. Particular emphasis is placed on transformation-driven anatomical scenarios relevant to craniofacial analysis, including mandibular motion and therapeutic repositioning. Using a shared anatomical reference scene composed of CT/CBCT volumes, segmented structures, surface models, and auxiliary anatomical or therapeutic objects, the framework enables generation of directly comparable VirtualRTG projections from multiple anatomical configurations while preserving identical imaging assumptions. Rather than aiming at fully physically faithful radiographic simulation, the proposed approach provides a controllable and reproducible methodological environment for studying anatomy–projection relationships, motion observability, and transformation-aware imaging workflows.

03.
arXiv (CS.LG) 2026-06-12

Data-driven Lake Water Quality Forecasting for Time Series with Missing Data using Machine Learning

arXiv:2601.15503v2 Announce Type: replace Abstract: Volunteer-led lake monitoring yields irregular, seasonal time series with many gaps arising from ice cover, weather-related access constraints, and occasional human errors, complicating forecasting and early warning of harmful algal blooms. We study Secchi Disk Depth (SDD) forecasting on a 30-lake, data-rich subset drawn from three decades of in-situ records collected across Maine lakes. Missingness is handled via Multiple Imputation by Chained Equations (MICE), and we evaluate performance with a normalized Mean Absolute Error (nMAE) metric for cross-lake comparability. Among six candidates, ridge regression provides the best mean test performance. Using ridge regression, we then quantify the minimal sample size, showing that under a backward, recent-history protocol, the model reaches within 5% of full-history accuracy with approximately 176 training samples per lake on average. We also identify a minimal feature set, where a compact four-feature subset matches the thirteen-feature baseline within the same 5% tolerance. Bringing these results together, we introduce a joint feasibility function that identifies the minimal training history and fewest predictors sufficient to achieve the target of staying within 5% of the complete-history, full-feature baseline. In our study, meeting the 5% accuracy target required about 64 recent samples and just one predictor per lake, highlighting the practicality of targeted monitoring. Hence, our joint feasibility strategy unifies recent-history length and feature choice under a fixed accuracy target, yielding a simple, efficient rule for setting sampling effort and measurement priorities for lake researchers.

04.
arXiv (CS.LG) 2026-06-16

The Complexity of Min-Max Optimization for Quadratic Polynomials

arXiv:2606.17000v1 Announce Type: cross Abstract: We prove that computing approximate stationary points of min-max optimization over the hypercube is PPAD-hard for quadratic polynomials. This holds even when the polynomials are multilinear, each variable appears in at most three monomials, and the approximation factor is inverse polynomial. As a direct consequence, we obtain the first PPAD-hardness results for two-team zero-sum polymatrix games.

05.
arXiv (CS.CL) 2026-06-11

RLCSD: Reinforcement Learning with Contrastive On-Policy Self-Distillation

On-policy self-distillation (OPSD) provides dense, token-level supervision for reasoning models by aligning a model's own distribution with the distribution it produces under privileged context, typically a verified solution. However, we show that the learning signal drawn from this distributional gap concentrates on style tokens rather than task-bearing ones, as the hinted model tends to produce more direct, shorter outputs. We term this pathology privilege-induced style drift, which destabilizes training or causes response length to shrink. To address this, we propose RLCSD (Reinforcement Learning with Contrastive on-policy Self-Distillation), which mitigates this drift by contrasting the teacher-student gap under a correct hint against that under a wrong hint, suppressing the style shift that conditioning on a hint tends to induce regardless of correctness, and yielding a signal that is more concentrated on task-bearing tokens. Experiments on Qwen3 (1.7B/4B/8B) and Olmo-3-7B-Think across mathematical and logical reasoning show that RLCSD consistently outperforms GRPO and prior OPSD methods. We further show that the contrastive principle is general: it plugs into existing OPSD methods to improve them, and its underlying insight extends to the broader cross-model on-policy distillation setting.

06.
arXiv (CS.AI) 2026-06-17

Sustainable Metal-Organic Framework Water Harvesters in the Artificial Intelligence Era

arXiv:2605.29179v2 Announce Type: replace-cross Abstract: Metal-organic frameworks (MOFs) are excellent candidates for water harvesting due to their tunable pore environments, which can be precisely engineered to capture and release water in arid conditions. Integrating artificial intelligence (AI) into MOF discovery can further accelerate the design of high-performance sorbents by identifying structural features that enhance atmospheric water harvesting (AWH), stability, and cycling efficiency. In this Perspective, we examine key MOF design principles, including cooperative adsorption, operational relative humidity (RH), uptake capacity, hysteresis, and scalability. We highlight recent design advancements such as multivariate strategies and long-arm linker extension, and examine how these principles tune pore capacity and hydrophilicity, while preserving stability and crystallinity. Furthermore, we discuss how AI, large language models (LLMs), and data mining can accelerate the discovery process through predictive synthesis, inverse design, and elucidating synthesis-structure-property relationships for the next generation of MOF water harvesters.

07.
arXiv (CS.LG) 2026-06-19

GB-LSR: A Fast Local Spectral Image Representation with a Single Global Bandwidth for Continuous Reconstruction and Super-Resolution

arXiv:2606.19617v1 Announce Type: cross Abstract: We present GB-LSR (Global-Bandwidth Local Spectral Representation), a fixed-grid local spectral representation for continuous image reconstruction. The image domain is partitioned into non-overlapping square patches, each carrying coefficients for a truncated Fourier basis predicted from shared convolutional-encoder features. A single trainable scalar bandwidth is shared globally across all patches and images, and reconstruction at any continuous coordinate is a fixed-size basis contraction whose cost is independent of image size. We study three bandwidth-handling variants: a trainable global scalar (main), a fixed global scalar, and a per-patch bandwidth field. On a standardized native-reconstruction benchmark across Kodak, Set14, and Urban100, the main variant outperforms matched-budget amortized LIIF / LTE / WIRE re-implementations by 2.8-3.6 dB PSNR and 0.11-0.15 LPIPS, while running at roughly one-quarter of the slowest baseline's inference cost. The single global scalar suffices empirically: per-patch adaptive-bandwidth alternatives do not improve over it on either a closed-form locality diagnostic or an end-to-end ablation. In a separate arbitrary-scale super-resolution (ASR) extension, GB-LSR achieves competitive PSNR-Y under a canonical-style SR protocol and runs 1.44x faster than LIIF-RDN and 3.25x faster than LTE-SwinIR at x4; within the same extension, a variant trained and evaluated without 4-corner local-ensemble averaging gives a 1.77x speedup with 35% lower peak memory and negligible PSNR change, while additionally widening the RDN encoder from 64 to 96 channels gives a small positive PSNR shift with a 1.58x speedup and 31% lower peak memory. Native-reconstruction claims are scoped to the matched-budget amortized protocol, and ASR claims are scoped to a separate canonical-style SR protocol.

08.
arXiv (CS.CV) 2026-06-16

PermaVid: Consistent Video Generation Across Edits via Disentangled Context Memory

Consistent video generation under editing operations requires persistence: when edits modify scene appearance or layout, subsequent generations should remain coherent across time and viewpoints. However, existing memory designs struggle to maintain long-term consistency after such modifications, as stored contexts may become outdated or invalid. To address this, we propose PermaVid, a novel framework built upon a multi-modal context memory that disentangles spatial context into semantic appearance and geometric structure, together with an edit-aware memory update and retrieval strategy that keeps memory evolution aligned with subsequent observations. Specifically, we develop two complementary memory banks: an RGB context memory that captures appearance-aware observations while implicitly encoding geometry, and a depth context memory that preserves geometry-only structure disentangled from semantics. Building on this design, we introduce a memory-guided video generation model that performs multi-modal feature fusion under reference conditions drawn from mixed-modality memory contexts. Experiments demonstrate that our method maintains strong long-term semantic and structural consistency after edits, significantly outperforming state-of-the-art methods.

09.
arXiv (CS.CV) 2026-06-15

A Unified Theory of Sinusoidal Activation Families for Implicit Neural Representations

Implicit Neural Representations (INRs) model continuous signals with compact neural networks and have become a standard tool in vision, graphics, and signal processing. A central challenge is accurately capturing fine detail without heavy hand-crafted encodings or brittle training heuristics. Across the literature, periodic activations have emerged as a compelling remedy: from SIREN, which uses a single sinusoid with a fixed global frequency, to more recent architectures employing multiple sinusoids and, in some cases, trainable frequencies and phases. We study this family of sinusoidal activations and develop a principled theoretical and practical framework for trainable sinusoidal activations in INRs. Concretely, we instantiate this framework with Sinusoidal Trainable Activation Functions (STAF), a Fourier-like activation whose amplitudes, frequencies, and phases are learned. Our analysis (i) establishes a Kronecker-equivalence construction that expresses trainable sinusoidal activations with standard sine networks and quantifies expressive growth, (ii) characterizes how the Neural Tangent Kernel (NTK) spectrum changes under trainable sinusoidal parameterization, and (iii) provides an initialization that yields standard normal post-activations without asymptotic central limit theorem (CLT) arguments. Empirically, on images, audio, shapes, inverse problems (super-resolution, denoising) and NeRF, STAF is competitive and often stronger on distortion-oriented reconstruction metrics such as PSNR/SSIM across the evaluated INR tasks, with favorable parameter efficiency under layer-wise sharing. While periodic activations can alleviate practical manifestations of spectral bias, our results indicate they do not eliminate it; instead, trainable sinusoids can improve the observed capacity-optimization trade-off in the evaluated settings.

10.
medRxiv (Medicine) 2026-06-22

Early-life nutritional environment is associated with late-life cognition in the Health and Retirement Study, a pellagra epidemic natural experiment

Early-life exposures are important to several late-life health outcomes. We sought to study the effect of an in utero nutritional environment and its interaction with Alzheimer's disease (AD) genetic risk on late-life cognitive function. We used a natural experiment created by the pellagra epidemic, a nutritional disease caused by a vitamin B3 deficiency, to evaluate the association between in utero pellagra epidemic exposure and late-life cognitive function in the Health and Retirement Study (N = 18,285). We also evaluated whether the in utero exposure could modify the AD polygenic score's (PGS) effect on cognition. In utero pellagra epidemic exposure was significantly associated with cognition ({beta} = -0.025). However, these effects were not isolated to the prenatal period as exposure during childhood periods also had an effect. The interaction between the in utero exposure and the AD PGS was significant, where the genetic effect on cognition was amplified with increasing (progressively worse) in utero exposure levels. These associations imply that the early-life nutritional environment affects late-life cognitive function and that these effects can modify genetic risk.

11.
arXiv (CS.CV) 2026-06-16

CASHEW: Stabilizing Multimodal Reasoning via Iterative Trajectory Aggregation

Vision-language models achieve strong performance across a wide range of multimodal understanding and reasoning tasks, yet their multi-step reasoning remains unstable. Repeated sampling over the same input often produces divergent reasoning trajectories and inconsistent final predictions. To address this, we introduce two complementary approaches inspired by test-time scaling: (1) CASHEW, an inference-time framework that stabilizes reasoning by iteratively aggregating multiple candidate trajectories into higher-quality reasoning traces, with explicit visual verification filtering hallucinated steps and grounding reasoning in visual evidence, and (2) CASHEW-RL, a learned variant that internalizes this aggregation behavior within a single model. CASHEW-RL is trained using Group Sequence Policy Optimization (GSPO) with a composite reward that encourages correct answers grounded in minimal yet sufficient visual evidence, while adaptively allocating reasoning effort based on task difficulty. This training objective enables robust self-aggregation at inference. Extensive experiments on 13 image understanding, video understanding, and video reasoning benchmarks show significant performance improvements, including gains of up to +26.2 percentage points on ScienceQA and +9.1 percentage points on EgoSchema.

12.
arXiv (quant-ph) 2026-06-11

Recirculating Quantum Photonic Networks for Fast Deterministic Quantum Information Processing

arXiv:2602.11033v2 Announce Type: replace Abstract: A fundamental challenge in photonics-based deterministic quantum information processing is to realize key transformations on time scales shorter than those of detrimental decoherence and loss mechanisms. This challenge has been addressed through device-focused approaches that aim to increase nonlinear interactions relative to decoherence rates. In this work, we adopt a complementary architecture-focused approach by proposing a recirculating quantum photonic network (RQPN) that minimizes the duration of quantum information processing tasks, thereby reducing the requirements on nonlinear interaction rates. The RQPN consists of a network of all-to-all connected nonlinear cavities with dynamically controlled waveguide couplings, and it processes information by capturing a photonic input state, recirculating photons between the cavities, and releasing a photonic output state. We demonstrate the RQPN's architectural advantage through two examples: first, we show that processing all qubits simultaneously yields faster operations than single- and two-qubit decompositions of the three-qubit Toffoli gate. Second, we demonstrate implementations of a measurement-free correction for single-photon loss, achieving up to seven-fold speedups and significantly improved hardware efficiency relative to state-of-the-art architecture proposals. Our work shows that a single hardware-efficient recirculating architecture substantially reduces the temporal overhead of multi-qubit gates and quantum error correction, thereby lowering the barrier to experimental realizations of deterministic photonic quantum information processing.

13.
bioRxiv (Bioinfo) 2026-06-11

Pillbox: A Leakage-Aware Foundation-Model Predictor and Lineage-Ceiling Diagnostic for Cancer Drug Response

We present Pillbox, a predictor whose pipeline is audited against the six Asiaee leakage modes with the one residual pathway shown by per-fold ablation to be non-load-bearing on hard splits. Our model combines CpGPT methylation embeddings, CLAMP drug embeddings, and per-fold-fit gene-expression principal components which are fused by Feature-wise Linear Modulation (FiLM)-conditioned graph attention on the STRING v12 protein-protein interaction graph. Then we alpha-ensemble the model against a histogram-based gradient boosting regressor baseline. On GDSC GSE68379 (987 cell lines, 375 drugs) across seeds 42, 7, and 123, the ensemble reaches test R-Squared of 0.78, 0.77, and 0.76 on random, histology-blind, and site-blind splits respectively, with cell-aware lifts above the drug-mean floor of +0.054, +0.060, and +0.037. As a quantitative diagnostic for feature-stack saturation we propose the cross-architecture residual correlation, calibrated against a same-architecture-different-initialization control. On histology-blind splits the cross-architecture value of 0.939 falls short of the same-architecture ceiling of 0.974 by approximately 0.03 in residual correlation, a gap we interpret as the headroom available to architecture choice on top of the current foundation-model representation and consistent with the long-established observation that tissue lineage dominates cell-line drug response. We integrated curated mutation, methylation, and drug-target-expression channels, but these do not improve prediction once foundation-model embeddings are in place. Cross-screen validation against PRISM matches the GDSC-to-PRISM measurement reproducibility ceiling within 0.01 Spearman.

14.
arXiv (quant-ph) 2026-06-25

Forced Gap Post-Selection for Quantum LDPC Codes and their Operations

arXiv:2605.20346v2 Announce Type: replace Abstract: We develop a simple and general post-selection strategy for high-rate quantum codes that is transferrable across decoders. After an initial baseline run, the decoder is re-run once per logical observable, and forced in these latter runs to provide a solution where the given observable has the complementary outcome. Shots are rejected that find logically complementary solutions with similar likelihoods compared to the baseline. Using the Relay-BP decoder, we benchmark the strategy on the $72$-qubit and $144$-qubit bivariate bicycle codes, as well as surgery gadgets for the latter. In comparison to previous post-selection strategies, our results offer an improved logical error rate by over a factor of $4$ on the same circuit and physical error rate, and at the same rate of post-selection. Our strategies are also lightweight, relying only on FPGA-friendly belief propagation, whereas the previous best used repeated rounds of a high-latency BP-OSD decoder.

15.
arXiv (CS.LG) 2026-06-16

Simulation-Augmented Multi-Step Split Conformal Prediction for Aggregated Forecasts

arXiv:2606.16356v1 Announce Type: new Abstract: We study uncertainty quantification for aggregated forecasting tasks such as annual totals and year-over-year growth rates. We propose SA-MSCP, a simulation-augmented multi-step split conformal method that generates future paths from cross-validated residuals using a block bootstrap and constructs prediction intervals from empirical quantiles. Experiments show that SA-MSCP improves empirical coverage over a simulated-path baseline for aggregated and growth-rate targets. Our results demonstrate that simulation-enhanced conformal calibration is an effective and general framework for uncertainty quantification in aggregated time-series forecasting.

16.
arXiv (CS.LG) 2026-06-24

A Robust Model-Based Approach for Continuous-Time Policy Evaluation with Unknown Lévy Process Dynamics

arXiv:2504.01482v3 Announce Type: replace-cross Abstract: This paper develops a model-based framework for continuous-time policy evaluation (CTPE) in reinforcement learning, incorporating both Brownian and Lévy noise to model stochastic dynamics influenced by rare and extreme events. Our approach formulates the policy evaluation problem as solving a partial integro-differential equation (PIDE) for the value function with unknown coefficients. A key challenge in this setting is accurately recovering the unknown coefficients in the stochastic dynamics, particularly when driven by Lévy processes with heavy tail effects. To address this, we propose a robust numerical approach that effectively handles both unbiased and censored trajectory datasets. This method combines maximum likelihood estimation with an iterative tail correction mechanism, improving the stability and accuracy of coefficient recovery. Additionally, we establish a theoretical bound for the policy evaluation error based on coefficient recovery error. Through numerical experiments, including a real-data BTC price experiment, we demonstrate the effectiveness and robustness of our method in recovering heavy-tailed Lévy dynamics and verify the theoretical error analysis in policy evaluation.

17.
arXiv (CS.LG) 2026-06-15

Neural ARFIMA model for forecasting BRIC exchange rates with long memory

arXiv:2509.06697v3 Announce Type: replace-cross Abstract: Exchange rate forecasting remains a challenging problem, particularly for emerging economies, where the observed time series exhibit pronounced long-memory dependence, nonlinear dynamics, and sensitivity to macro-financial drivers. Classical models such as ARFIMA capture long-range persistence but fail to adequately represent nonlinear relationships, while modern machine learning approaches often neglect the underlying long-memory structure in macroeconomic series. To address this gap, we propose a Neural AutoRegressive Fractionally Integrated Moving Average (NARFIMA) model that integrates ARFIMA-based long-memory modeling with neural networks for nonlinear function approximation, while incorporating exogenous macroeconomic and uncertainty indicators. The framework provides a unified approach for capturing persistence, nonlinear dynamics, and external shocks. We establish asymptotic stationarity of the NARFIMA process and develop conformal prediction intervals for distribution-free uncertainty quantification. Empirical results for BRIC exchange rates show that NARFIMA consistently outperforms a broad range of forecasting benchmarks across multiple horizons, underscoring the importance of explicitly modeling long-memory dependence in exchange rate dynamics. The `narfima' R package provides an implementation of our approach.

18.
arXiv (CS.CL) 2026-06-25

Diagnosing and Mitigating Compounding Failures in Agentic Persuasion via Taxonomic Strategy Retrieval

Foundation-model agents in multi-step, open-ended environments frequently suffer from compounding errors, where early mistakes contaminate long-horizon trajectories. While Multi-Agent Debate (MAD) succeeds in deterministic domains, agents in subjective tasks like persuasion experience severe problem drift and sycophantic conformity. We identify semantic leakage in standard Retrieval-Augmented Generation (RAG) as a reproducible trigger for these failures, as standard RAG prioritizes vocabulary overlap over logical necessity. To eliminate this leakage, we introduce Taxonomic Strategy RAG (TS-RAG), a systems intervention that routes strategies through a discrete categorical bottleneck to decouple argumentative structure from topical content. Zero-shot, cross-domain evaluations demonstrate that TS-RAG significantly improves the transfer of abstract logic where standard semantic retrieval collapses. Crucially, TS-RAG acts as a "capability bridge" in asymmetric deployments, empowering lightweight persuaders to consistently defeat parametrically superior opponents (improving win rates from 70.5 to 78.5) and accelerating argumentative efficiency. Finally, we introduce trace-level diagnostics via a turn-by-turn Debate State Representation (DSR), demonstrating the necessity of strict constraints to prevent evaluation collapse via default agentic sycophancy.

19.
arXiv (CS.AI) 2026-06-18

The Long Delay to Arithmetic Generalization: When Learned Representations Outrun Behavior

arXiv:2604.13082v2 Announce Type: replace-cross Abstract: Grokking in transformers trained on algorithmic tasks is characterized by a long delay between training-set fit and abrupt generalization, but the source of that delay remains poorly understood. In encoder-decoder arithmetic models, we argue that this delay reflects limited access to already learned structure rather than failure to acquire that structure in the first place. We study one-step Collatz prediction and find that the encoder organizes parity and residue structure within the first few thousand training steps, while output accuracy remains near chance for tens of thousands more. Causal interventions support the decoder bottleneck hypothesis. Transplanting a trained encoder into a fresh model accelerates grokking by 2.75 times, while transplanting a trained decoder actively hurts. Freezing a converged encoder and retraining only the decoder eliminates the plateau entirely and yields 97.6% accuracy, compared to 86.1% for joint training. What makes the decoder's job harder or easier depends on numeral representation. Across 15 bases, those whose factorization aligns with the Collatz map's arithmetic (e.g., base 24) reach 99.8% accuracy, while binary fails completely because its representations collapse and never recover. The choice of base acts as an inductive bias that controls how much local digit structure the decoder can exploit, producing large differences in learnability from the same underlying task.

20.
arXiv (CS.CL) 2026-06-24

Selective Rotary Position Embedding

Position information is essential for language modeling. In softmax transformers, Rotary Position Embeddings (RoPE) encode positions through fixed-angle rotations, while in linear transformers, order is handled via input-dependent (selective) gating that decays past key-value associations. Selectivity has generally been shown to improve language-related tasks. Inspired by this, we introduce Selective RoPE, an input-dependent rotary embedding mechanism, that generalizes RoPE, and enables rotation in arbitrary angles for both linear and softmax transformers. We show that softmax attention already performs a hidden form of these rotations on query-key pairs, uncovering an implicit positional structure. We further show that in state-space models and gated linear transformers, the real part manages forgetting while the imaginary part encodes positions through rotations. We validate our method by equipping gated transformers with Selective RoPE, demonstrating that its input-dependent rotations improve performance in language modeling and on difficult sequence tasks like copying, state tracking, and retrieval.

21.
medRxiv (Medicine) 2026-06-24

An Automated, Pathologist-free Gleason Grade Stratifies Disease-free Interval Comparably to Expert Grading from a Single Out-of-distribution Slide

Automated Gleason grading now matches expert pathologists on the cohorts where systems are developed and tuned, but deployment-relevant gaps remain: whether an automated grade, applied without site-specific tuning or pathologist oversight, stratifies outcome comparably to expert grading on slides from unseen institutions and in cross-specimen applications. We tested this for disease-free interval (DFI), a curated recurrence endpoint. A production gland-level prostate diagnostic (PathTools Prostate v11.0) was applied frozen and uncalibrated to 298 diagnostic whole-slide images from 274 TCGA-PRAD radical-prostatectomy patients, a cohort outside its development distribution and needle-core-biopsy training data, contributed by 25 source sites under heterogeneous digitization; tissue was detected automatically with no expert region annotation. From the output we derived an ISUP grade group and continuous high-grade content, and evaluated each grade as a standalone predictor of DFI (24 events) by Harrell's c-index with 95% bootstrap confidence intervals, a paired between-method bootstrap, and Kaplan-Meier curves with the log-rank test. The automated grade reproduced the clinical grade group at quadratic-weighted kappa = 0.62 (95% CI 0.53-0.70; 48% exact, 86% within one group), within the expert inter-observer range. As the sole predictor it stratified recurrence (log-rank p = 0.022; c-index 0.69, 95% CI 0.58-0.79), and the continuous high-grade fraction was robustly prognostic (hazard ratio 1.37 per SD, p = 0.029; c-index 0.71, 0.61-0.81). Standalone discrimination was not statistically separable from the clinical grade (c-index 0.78, 0.69-0.86; paired {triangleup} c-index spanning zero), and in a joint model the automated grade added nothing beyond it, consistent with both measuring a shared morphological axis. From a single out-of-distribution slide with no pathologist oversight, the automated grade provides standalone recurrence stratification not statistically separable from whole-gland expert grading, demonstrating robust generalizability beyond training data; reported as a continuous high-grade fraction, it offers reproducible, expert-free, grade-equivalent risk stratification for harmonizing large archival or genomically-profiled cohorts.

22.
arXiv (CS.LG) 2026-06-15

AcceRL: A Distributed Asynchronous Reinforcement Learning and World Model Framework for Vision-Language-Action Models

arXiv:2603.18464v3 Announce Type: replace Abstract: Reinforcement learning (RL) for large-scale Vision-Language-Action (VLA) models is severely bottlenecked by synchronization barriers and the high cost of environment data acquisition. To overcome these challenges, we propose AcceRL, a distributed asynchronous RL framework that physically isolates environment rollouts, model inference, and gradient updates. By eliminating the cascading long-tail idle bubbles inherent in synchronous systems, AcceRL maximizes hardware utilization and ensures scalable throughput. Furthermore, AcceRL features a modular design that supports the integration of diverse, plug-and-play world models into its distributed pipeline. Extensive experiments demonstrate that the base framework achieves highly competitive performance across all four LIBERO[liu2023libero] task suites. Systematically, the asynchronous architecture delivers a $2.4\times$ throughput speedup over leading synchronous baselines. Algorithmically, by leveraging a world model pre-trained on 1,000 offline trajectories, AcceRL achieves up to a $200\times$ improvement in online sample efficiency on LIBERO-Spatial, establishing a robust framework that is both sample-efficient and time-efficient for embodied AI. Code is included in the supplementary material. Code is available at https://github.com/distanceLu/AcceRL.

23.
arXiv (CS.LG) 2026-06-19

Distributionally Robust Set Representation Learning Under Inference-Time Element Corruption

arXiv:2605.30089v2 Announce Type: replace Abstract: Standard Set Representation Learning methods typically excel on curated data but often overlook the challenge of inference-time element corruption. This refers to scenarios where deployed models encounter element-level degradations, such as outliers or missing components, that may distort set representation and degrade performance. We propose SW-DRSO, a distributionally robust optimization framework tailored for sets. Rather than minimizing loss solely on observed training data, SW-DRSO optimizes a tractable surrogate of the worst-case expected loss over a family of plausible inference-time variations. We introduce a barycentric adversary that approximates the intractable search over corrupted sets by a differentiable training-time optimization over simplex weights. Extensive experiments across four tasks demonstrate that SW-DRSO effectively enhances robustness against corruption while maintaining high overall performance.

24.
medRxiv (Medicine) 2026-06-17

Cross-Device Adaptation of Mirai for Mammography-Based Breast Cancer Risk Prediction

Fine-tuning can adapt pretrained medical imaging models to new clinical datasets, but device-specific domain shifts may limit generalizability. We evaluated Mirai, a mammography-based deep learning model for breast cancer risk prediction, in a large screening cohort containing Hologic and General Electric (GE) full-field digital mammography systems, including GE Premium View (GE PV) and Tissue Equalization (GE TE) post-processing software. Native Mirai showed lower performance on TE images than on Hologic or PV images. Fine-tuning on TE images improved TE performance, particularly for short-term risk prediction, but substantially reduced performance on Hologic images, consistent with catastrophic forgetting. To mitigate this effect, we developed a device-invariant model using interleaved multi-device sampling and conditional adversarial training. This approach largely restored Hologic performance while maintaining improved TE performance, providing better robustness across heterogeneous imaging platforms. Comparison of cumulative and annual risk AUCs over a five-year time horizon further showed that performance gains were driven mainly by short- and intermediate-term predictions. These findings highlight both the value and dangers of device-specific fine-tuning and support balanced domain-adaptation strategies for deploying mammography-based risk models across diverse clinical imaging environments.

25.
arXiv (CS.AI) 2026-06-24

OmniPath: A Multi-Modal Agentic Framework for Auditing Wheelchair Accessibility

arXiv:2606.24129v1 Announce Type: new Abstract: For a wheelchair user, a standard blue line on a map is often a broken promise. While platforms like OpenStreetMap (OSM) successfully capture where a path is, they frequently fail to convey how it physically feels to travel on it. This information barrier is problematic for wheelchair users. To solve this issue, we present OmniPath, a system that moves from passive mapping to proactive environmental auditing. Our framework fuses the network topology of OSM with the submeter precision of high-density aerial LiDAR (USGS 3DEP) to create a high-fidelity 3D model of the pedestrian environment. Rather than simply routing a user, our agent virtually traverses the network, analyzing the surface in 0.5 meter increments. It rigorously quantifies physical friction points specifically running slope, cross slope, and vertical discontinuities against ADA compliance standards, calculating a weighted severity score to categorize hazards from ``Mild'' to ``Critical.'' To ensure real world reliability, we validated the system against 200 physical ground truth field surveys across the National Mall using stratified random sampling. The framework demonstrated strong diagnostic reliability for high-severity hazards, achieving F1-scores of 0.60 for Severe and 0.58 for critical categories. By automating this micro-scale inspection, OmniPath identifies the ``invisible'' barriers that standard maps miss, effectively transforming a static dataset into accessibility data source that anticipates accessibility challenges before the user ever leaves home.