Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CV) 2026-06-16

RQUL-UIE: Revitalizing Quality-Unstable Labels for Underwater Image Enhancement via In-Dataset Self-Supervision

Underwater Image Enhancement (UIE) is essential for mitigating degradations caused by water medium. Although learning-based methods have advanced significantly, most rely on paired datasets with unstable label quality, which bottlenecks model performance. This paper proposes a diffusion-based, in-dataset self-supervised learning strategy designed to exploit the quality distribution of training labels. Specifically, we evaluate label quality via semantic perception embeddings from a pre-trained diffusion model in a training-free manner. These quality scores are subsequently quantized into noise-level indices, guiding a multi-step denoising process for level-wise supervision. This mechanism prevents low-quality labels from degrading the model while maximizing their utility during training. Furthermore, a Fourier-based refinement network is incorporated to explicitly reconstruct high-frequency components. Extensive evaluations demonstrate that our method consistently outperforms SOTA approaches in restoration quality. The code and pre-trained model will be available once accepted in link.

02.
arXiv (CS.AI) 2026-06-15

AgentCyberRange: Benchmarking Frontier AI Systems in Realistic Cyber Ranges

arXiv:2606.14295v1 Announce Type: cross Abstract: Frontier AI systems are increasingly capable of cybersecurity tasks, including codebase inspection, vulnerability detection, and exploitation. However, evaluating their offensive capabilities remains constrained by limited access to open, reproducible, multi-host cyber ranges. Existing public benchmarks capture isolated skills such as CTF solving, vulnerability reproduction, and exploit generation, but often abstract away realistic intrusion workflows: discovering exposed services, gaining a foothold, collecting internal information, and expanding compromise across hosts. This gap makes it difficult to observe emerging risks early, because frontier AI systems are rarely evaluated under realistic attack conditions. We introduce AgentCyberRange, the first open, multi-range infrastructure for measuring autonomous cyber attack capability in realistic cyber ranges. It combines 110 vulnerabilities across 15 real web applications and 8 enterprise-like cyber ranges with 156 internal hosts, plus Cage, a toolchain for execution, orchestration, result collection, and verification. The benchmark covers two core stages: web exploitation, where agents explore exposed applications and validate vulnerabilities, and post exploitation, where agents turn an initial foothold into broader internal compromise. We evaluate six frontier AI systems under matched prompts and budgets. GPT-5.5 with Codex performs best, solving 16.1% of web exploitation tasks and 31.7% of post-exploitation tasks; with more concrete hints, these rates increase to 33.0% and 46.3%. We also observe out-of-benchmark findings, including unknown vulnerabilities in popular projects, and payload mutation that bypasses host defenses. These results show that open cyber-range evaluation is necessary for observing emerging offensive capabilities under realistic and reproducible conditions.

03.
arXiv (CS.AI) 2026-06-24

Zero-Shot Test-Time Canonicalization using Out-of-Distribution Scoring

arXiv:2606.24178v1 Announce Type: cross Abstract: Pretrained vision models often misclassify inputs that are rotated, scaled, or sheared, even though these affine transformations leave the object class unchanged. Robustness is usually restored either by building equivariance into the architecture or by retraining with augmentation, both of which require changing or retraining the model. Test-time canonicalization instead leaves the classifier untouched. It undoes the transformation of each input, mapping it to a canonical form near the training distribution before classification. Existing canonicalizers, however, rely on a narrow set of logit-based energy scores and bespoke search procedures, leaving the design space of scoring functions and optimizers unexplored. We reframe canonicalization as out-of-distribution (OOD) detection, which lets any OOD score serve as the energy minimized over transformations. Across benchmarks ranging from handwritten characters and sketches to natural images and 3D point clouds, we systematically evaluate around twenty OOD scores and nine search algorithms, finding that distance-based scores paired with random search and local refinement perform best overall. Because canonicalizing an already-aligned input can hurt accuracy, we add a gated mechanism that transforms an input only when its OOD score indicates this is needed, preserving most in-distribution accuracy while retaining the robustness gains on transformed inputs. Code is available at github.com/johschm/its.

04.
bioRxiv (Bioinfo) 2026-06-19

SteerAF: Distogram-based Steering of AlphaFold2 toward Alternative Conformations

End-to-end structure predictors, such as AlphaFold2, typically output only the dominant conformational state of a given protein, which is biased by the training data set. Existing strategies for recovering alternative conformations are often computationally expensive and offer limited biological interpretability. Here, we present SteerAF, an inference-time optimization framework based on AlphaFold2 that leverages information encoded in the distogram derived from deep multiple sequence alignments (MSAs) to predict alternative protein conformations. Across four benchmark datasets, SteerAF matches or surpasses existing methods in predicting alternative conformations for the majority of systems. Sparse MSA-feature modifications generated via block gradient ascent exhibit a strong correlation with experimentally characterized functional residues, recovering them with approximately 50% precision in the tested proteins. Furthermore, SteerAF enables effective decoy selection in the absence of experimental structures, and its predictions can serve as seed structures for molecular dynamics simulations to map conformational landscapes. Thus, SteerAF provides an efficient and interpretable approach for predicting alternative conformations, offering a framework that can be extended to other similar predictors and problems.

05.
arXiv (CS.AI) 2026-06-15

Can LLMs Accurately Score Medical Diagnoses and Clinical Reasoning?

arXiv:2604.14892v3 Announce Type: replace-cross Abstract: Evaluating medical AI systems using expert clinician panels is costly and slow, motivating the use of large language models (LLMs) as alternative adjudicators. Here, we evaluate an LLM Jury, composed of three frontier AI models, for scoring 3334 diagnoses on 300 real-world low- and middle-income country (LMIC) hospital cases. Both LLM- and clinician-generated diagnoses are scored against expert panel diagnoses across four dimensions: diagnosis, differential diagnosis, clinical reasoning, and negative treatment risk. The LLM Jury scores are compared with expert and independent re-scoring panel scores to assess error metrics, inter-rater agreement, severe-risk errors, and the effect of post hoc calibration using isotonic regression. In our data, we find that: (i) the uncalibrated LLM Jury scores preserve ordinal agreement with the expert clinician panel scores, but are systematically lower; (ii) the probability of severe-risk errors is lower for the LLM Jury than the human expert re-score panels; (iii) the LLM Jury combined with LLM diagnoses can be used to identify diagnoses at high risk of error, enabling targeted expert review and improved panel efficiency; (iv) the calibrated LLM Jury scores and rankings of diagnosing agents show excellent agreement with those of the primary expert panels; (v) LLM Jury models show no self-preference bias, they did not score diagnoses generated by their own underlying model or models from the same vendor more (or less) favourably than those generated by other models. Together, these results provide evidence that a calibrated LLM Jury is a trustworthy and reliable proxy for expert clinician evaluation in medical AI benchmarking. Confirming these findings in other clinical settings is an important direction for future work.

06.
arXiv (CS.LG) 2026-06-19

Evaluating deep learning models for fault diagnosis of a rotating machinery with epistemic and aleatoric uncertainty

arXiv:2412.18980v2 Announce Type: replace Abstract: Uncertainty-aware deep learning (DL) models recently gained attention in fault diagnosis as a way to promote the reliable detection of faults when out-of-distribution (OOD) data arise from unseen faults (epistemic uncertainty) or the presence of noise (aleatoric uncertainty). In this paper, we present the first comprehensive comparative study of state-of-the-art uncertainty-aware DL architectures for fault diagnosis in rotating machinery, where different scenarios affected by epistemic uncertainty and different types of aleatoric uncertainty are investigated. The selected architectures include sampling by dropout, Bayesian neural networks, and deep ensembles. Moreover, to distinguish between in-distribution and OOD data in the different scenarios two uncertainty thresholds, one of which is introduced in this paper, are alternatively applied. Our empirical findings offer guidance to practitioners and researchers who have to deploy real-world uncertainty-aware fault diagnosis systems. In particular, they reveal that, in the presence of epistemic uncertainty, all DL models are capable of effectively detecting, on average, a substantial portion of OOD data across all the scenarios. However, deep ensemble models show superior performance, independently of the uncertainty threshold used for discrimination. In the presence of aleatoric uncertainty, the noise level plays an important role. Specifically, low noise levels hinder the models' ability to effectively detect OOD data. Even in this case, however, deep ensemble models exhibit a milder degradation in performance, dominating the others. These achievements, combined with their shorter inference time, make deep ensemble architectures the preferred choice.

07.
arXiv (quant-ph) 2026-06-15

Spin mixing induced dynamics of spinor solitons in $F=1$ Bose Einstein condensates

arXiv:2606.14231v1 Announce Type: cross Abstract: We explore soliton interactions in a homogeneous spinor $F=1$ Bose Einstein Condensate (BEC) in the presence of a magnetic field, focusing on dark bright dark and bright dark bright configurations. We investigate how these interactions depend on the phase differences among bright solitons and their influence during the dynamics. Our findings align with prior non spinor results, i.e., repulsion among in phase bright solitons and attraction among out of phase pairs in self repulsive atomic BECs. The potential bright soliton attraction, added to the short range repulsion of dark dark soliton interactions, can lead to bound states. However, we find that these bound states break in the presence of spinor interactions due to the particle exchange dynamics between the hyperfine states of the components. Additonally, we develop an effective classical model to describe the soliton dynamics, using a Lagrangian approach. The accuracy of the model is tested by comparing it against numerical simulations. Our results suggest that the proposed model captures the essential features of soliton behavior in the presence of spin interactions, and provides congruent soliton trajectories and interspecies particle exchange dynamics in most of the cases.

08.
arXiv (quant-ph) 2026-06-11

Unifying Quantum Smoothing Theories with Extended Retrodiction

arXiv:2510.08447v2 Announce Type: replace Abstract: Estimating the state of an open quantum system monitored over time requires incorporating information from past measurements (filtering) and, for improved accuracy, also from future measurements (smoothing). While classical smoothing is well understood within a Bayesian framework, its quantum generalization has been challenging, leading to distinct and seemingly incompatible approaches. In this work, we demonstrate that quantum state smoothing hinges on a uniquely quantum feature: the fundamental dependence of retrodiction on prior correlations. We introduce auxiliary systems into the prior belief to capture correlations formed during preparation and evolution and develop a comprehensive framework for quantum state smoothing based on extended Bayesian retrodiction. This framework identifies all previous approaches as different choices of the extended prior, and naturally extends it to other choices that have not been considered before. We also give an information-theoretic characterization of the choices of prior, in terms of the average entropy of the smoothed states. Our results establish quantum state smoothing as a fundamentally retrodictive process just like classical smoothing, with proper quantum features clearly identified.

09.
arXiv (CS.LG) 2026-06-15

Temporally Consistent Graph Q-Networks for Intelligent Network Control

arXiv:2606.13848v1 Announce Type: cross Abstract: Mobile networks continue to grow in complexity and next generation networks are expected to support both increasing traffic loads and more diverse services. As network complexity rises, optimizing antenna parameters under dynamic or changing objectives becomes increasingly challenging. We propose a novel multi-agent reinforcement learning (MARL) algorithm for high-level control and orchestration of mobile networks. The Temporally Consistent Graph Q-Network (TC-GQN) algorithm learns a self-predicting representation of the whole network that is task-independent and aggregates information from all base-stations. A graph neural network is trained using a global reward function to assign coordinated local actions based on the learned encoding of the global network state. We evaluate the algorithm in a simulated environment to orchestrate an energy-saving feature across multiple sectors and multiple carriers under different quality of service (QoS) constraints. The proposed algorithm outperforms state-of-the-art graph-based baselines and a competitive rule-based controller by improving hardware sleep time while maintaining QoS. Moreover, the learned representation enables rapid adaptation to changing intents.

10.
arXiv (CS.AI) 2026-06-24

HiPath: Hierarchical Vision-Language Alignment for Structured Pathology Report Prediction

arXiv:2603.19957v2 Announce Type: replace-cross Abstract: Pathology reports are structured, multi-granular documents encoding diagnostic conclusions, histological grades, and ancillary test results across one or more anatomical sites; yet existing pathology vision-language models (VLMs) reduce this output to a flat label or free-form text. We present HiPath, a lightweight VLM framework built on frozen UNI2 and Qwen3 backbones that treats structured report prediction as its primary training objective. Three trainable modules totalling 15M parameters address complementary aspects of the problem: a Hierarchical Patch Aggregator (HiPA) for multi-image visual encoding, Hierarchical Contrastive Learning (HiCL) for cross-modal alignment via optimal transport, and Slot-based Masked Diagnosis Prediction (Slot-MDP) for structured diagnosis generation. Trained on 749K real-world Chinese pathology cases from three hospitals, HiPath achieves 68.9% strict and 74.7% clinically acceptable accuracy with a 97.3% safety rate, outperforming all baselines under the same frozen backbone. Cross-hospital evaluation confirms generalisation with only a 3.4pp drop in strict accuracy while maintaining 97.1% safety.

11.
arXiv (quant-ph) 2026-06-16

Minimum measurements quantum protocol for band structure calculation

arXiv:2511.04389v2 Announce Type: replace Abstract: Protocols for quantum measurement are an essential part of quantum computing. Measurements are no longer confined to the final step of computation but are increasingly embedded within quantum circuits as integral components of noise-resilient algorithms. However, each observable typically requires a distinct measurement basis, often demanding a different circuit configuration. As the number of such configurations typically grows with the number of qubits, measurements constitute a major bottleneck. Focusing on electronic structure calculations in crystalline systems, we propose a measurement protocol that restricts the required measurement configurations to an absolute minimum of just three, independent of the number of qubits. This makes it one of the few known protocols that do not scale with qubit number. In particular, we derive the measurement protocol from the symmetries of tight-binding (TB) Hamiltonians and implement it within the Orthogonal-Ansatz Variational Quantum Eigensolver (OA-VQE) algorithm. We demonstrate its performance on three systems, namely a two-dimensional CuO$_2$ square lattice (3 qubits), bilayer graphene with hexagonal (Honeycomb) lattice (4 qubits) and three-dimensional diamond lattice (10 qubits). Beyond tight-binding systems, the protocol can be extended to enable efficient initial state preparation for many-body Hamiltonians, such as multi-orbital Hubbard models in a momentum space.

12.
arXiv (CS.LG) 2026-06-16

Scalar-pathway fidelity improves physical accuracy in short-range equivariant interatomic potentials

arXiv:2606.15892v1 Announce Type: new Abstract: Accurate interatomic potentials enable molecular dynamics of materials, molecules, and interfaces beyond density-functional-theory length and time scales. Equivariant neural network potentials have improved the representation of local geometry. However, their deployable energy surfaces ultimately manifest through invariant scalar channels, whose aggregation and spectral resolution remain comparatively underexamined. Here we use Physics-Aware Neighborhood (PAN) pooling and Physics-Guided Spectral (PGS) mixers as controlled scalar-pathway probes: lightweight, symmetry-preserving modifications that act only on \(\ell=0\) channels while leaving the equivariant tensor backbone unchanged. Using MACE as a high-body-order mechanistic scaffold, PAN adds coordination-sensitive amplitude modulation, whereas PGS augments edge and readout scalar features with radial and tapered spectral bases. Across metallic Ag, covalent Si, a short-range ionic LiF/Li–F subset, and MD17/rMD17 molecules, this scalar-pathway correction reduces MACE force errors by 22–27\% and energy errors by 19–22\%; on systems with stress labels, stress errors decrease by 27–28\%, at approximately 5\% additional inference-FLOPs cost. Directionally consistent gains in Allegro and NequIP further indicate that the correction is portable across distinct short-range equivariant backbones, although effect sizes remain architecture-dependent. These results identify scalar-pathway fidelity as a practical design dimension for short-range equivariant interatomic potentials.

13.
arXiv (quant-ph) 2026-06-19

Near-Optimal Learning of Local Lindbladians

arXiv:2606.20535v1 Announce Type: new Abstract: We study the problem of learning local Lindbladians from black-box access to the physical evolution, and the goal is to estimate all Hamiltonian and dissipative coefficients. We give an algorithm built directly from finite-time channel probes, which runs the unknown evolution for short times, estimates the corresponding Pauli transfer matrices from classical shadows, and converts these estimates into Lindbladian coefficients by stable local Fourier inversions. For fixed locality and bounded dissipative site degree, the uses of the dynamical evolution and total evolution time scale as $\widetilde{O}(\Lambda^2/\varepsilon^2)$ and $\widetilde{O}(\Lambda/\varepsilon^2)$ respectively, in the local dynamical strength bound $\Lambda$ and target accuracy $\varepsilon$, with only logarithmic dependence on the number of qubits. The algorithm is non-adaptive, uses no ancillas, and uses only random product states as inputs followed by random Pauli measurements. The method does not require knowing the support of the Lindbladian in advance. We complement the algorithm with matching lower bounds, showing that the learning algorithm is near-optimal both in physical dynamics accesses and in total evolution time. We construct a single-qubit dephasing Lindbladian family that already requires $\Omega(\Lambda^2/\varepsilon^2)$ channel uses and $\Omega(\Lambda/\varepsilon^2)$ total evolution time, even for adaptive algorithms with arbitrary ancillas and measurements. In particular, the lower bounds imply that the Heisenberg-limited scaling achievable for Hamiltonian learning is information-theoretically impossible once dissipative coefficients must be estimated.

14.
arXiv (CS.CV) 2026-06-19

InfantFace: Detecting infant faces in neonatal clinical environments

Reliable localisation of the neonatal face is the first step for several video-camera based non-contact assessments such as pain and distress related facial expression analysis, pain scoring, cardiorespiratory signal extraction and cessation of breathing alerts. However, major challenges persist in neonatal clinical environments. Cluttered backgrounds, illumination changes and poor lighting conditions can reduce the accuracy of face detection models. Clinical interventions, monitoring equipment and, in some cases, medical devices can obstruct the face, making visual assessment difficult. We propose a one-stage YOLOv11m-based model tailored for face detection of infants in neonatal clinical environments. We combined multiple publicly available datasets (VGGFace2, CelebA, FDDB, WIDER FACE) to train and evaluate our proposed model. We then fine-tuned our model on a neonatal research dataset involving 228 videos from 114 recording sessions of 113 independent infants. Before fine-tuning, our model achieved an AP50 of 0.87, surpassing the performance of three state-of-the-art general face detectors. Performance improved further to an AP50 of 0.96 after clinical-domain adaptation. Evaluating face detection performance across different datasets remains a challenge due to the lack of publicly available neonatal datasets. Prioritising the creation of such datasets, while upholding appropriate privacy safeguards and ethical standards in their creation and use, would greatly support further progress in this field.

15.
arXiv (math.PR) 2026-06-18

The FBSDE approach to sine-Gordon up to $6\pi$

arXiv:2401.13648v3 Announce Type: replace-cross Abstract: We develop a stochastic analysis of the sine-Gordon Euclidean quantum field $(\cos (\beta \varphi))_2$ on the full space up to the second threshold, i.e. for $\beta^2 < 6 \pi$. The basis of our method is a forward-backward stochastic differential equation (FBSDE) for a decomposition $(X_t)_{t \geqslant 0}$ of the interacting Euclidean field $X_{\infty}$ along a scale parameter $t \geqslant 0$. This FBSDE describes the optimiser of the stochastic control representation of the Euclidean QFT introduced by Barashkov and one of the authors. We show that the FBSDE provides a description of the interacting field without cut-offs and that it can be used effectively to study the sine-Gordon measure to obtain results about large deviations, integrability, decay of correlations for local observables, singularity with respect to the free field, Osterwalder-Schrader axioms and other properties.

16.
arXiv (CS.CV) 2026-06-15

Planning with the Views via Scene Self-Exploration

Can VLMs predict how each camera move changes the view, and plan many such moves ahead? We call this capability view planning, requiring (1)understanding how a single action transforms the view, and (2)composing many such transformations across multi-turn plans to identify a target view. We probe both abilities in our proposed ViewSuite, a 3D point-cloud environment on real ScanNet scenes. Across 13 frontier VLMs, a critical planning gap emerges: they possess basic view-action knowledge but fail to compose it across multi-turn plans, with the gap widening as viewpoint distance grows. To close this gap, we propose an iterative framework that alternates self-exploration with view graph distillation. The key insight is that all exploration trajectories, regardless of their outcome, collectively form a view graph that compactly captures how viewpoints connect across a scene. Distilling this graph into diverse supervised tasks reshapes the policy distribution and overcomes the sparse rewards that stall pure RL. This improves Qwen2.5-VL-7B from 2.5% to 47.8% on interactive view planning, surpassing GPT-5.4 Pro (18.5%) and Gemini 3.1 Pro (21.4%). Self-exploration emerges as a promising path toward VLMs that can actively reason and plan in 3D space. Code and Data are at https://viewsuite.github.io.

17.
arXiv (math.PR) 2026-06-11

Martingale Solutions to a Stochastic Keller-Segel System with nonlocal Source and Super-linear Noise

arXiv:2606.11774v1 Announce Type: new Abstract: Global nonnegative martingale solutions are shown to exist for a stochastic Keller-Segel system with a nonlocal Fisher-KPP source and super-linear multiplicative noise. The result is obtained for nonnegative initial data with no smallness assumption, provided that the nonlocal source term is dominant. The main difficulty stems from the absence of a coercive structure and the super-linear nature of the noise. An additional cut-off with finite L^2 norm in the classical Galerkin method is added to establish a well-posed approximation problem. Moreover, due to the nonlocal Fisher-KPP structure, it is necessary to prove the positivity of the approximating solution in order to obtain uniform estimates. In the compactness arguments, the usual tightness argument in the framework of Hilbert spaces cannot be directly applied to the uniform estimates obtained in this paper. As a result, we develop a more general version of the compactness argument and tightness criterion, presented in the appendix, which will be applied throughout the paper. This allows for the global existence of nonnegative martingale solutions to be derived from Jakubowski's version of the Skorokhod Theorem, along with a thorough discussion of the convergence properties.

18.
arXiv (CS.AI) 2026-06-15

Causal Object-Centric Models for Planning with Monte Carlo Tree Search

arXiv:2606.14418v1 Announce Type: new Abstract: We introduce COMET (Causal Object-centric Model for Efficient Tree search), a model-based reinforcement learning algorithm that performs Monte Carlo Tree Search in a slot-structured latent space. COMET pairs a frozen unsupervised object-centric encoder with a transformer-based world model, in which actions are bound to objects through a novel action-slot fusion mechanism that is used in slot transition prediction. Policy and value heads use object-causal attention, modulating token interactions by learned per-slot relevance scores so that decision-making concentrates on task-relevant entities. COMET adds an explicit object-level inductive bias to MuZero-style latent planning. Across eight visually and dynamically diverse tasks from the Object-Centric Visual RL benchmark, ManiSkill, Robosuite, and VizDoom, COMET achieves a higher mean normalized score during the early stages of training compared to object-centric and monolithic baselines.

19.
arXiv (CS.CL) 2026-06-24

CALIBER: Calibrating Confidence Before and After Reasoning in Language Models

Reasoning language models are increasingly asked not only to answer difficult questions, but also to estimate their likelihood of success. Existing methods typically elicit confidence only once: either before thinking or after answering. We argue that confidence in reasoning models is state-dependent: before thinking, confidence should estimate the chance of the model correctly solving the prompt, while after thinking it should predict whether the realized answer is likely to be correct. This distinction determines the appropriate supervision target: prompt-level success should supervise confidence estimates made after seeing the prompt, while individual answer-level correctness should supervise confidence estimates made after answering. We introduce CALIBER (Calibration Before and After Reasoning), which elicits both estimates and supervises each with the target matched to its information state. Under this unified protocol, CALIBER reduces Expected Calibration Error (ECE) by 52.5% over the strongest single-confidence baseline on BigMathDigits for the 7B model, while achieving the best Brier score and AUROC, and remains within 2.1 points of the best accuracy. Further, on a larger 30B model, CALIBER achieves the best ECE on BigMathDigits while remaining competitive in Brier score and AUROC. Out of distribution, it achieves the best ECE and Brier score on GPQA and TriviaQA, and remains competitive on SimpleQA. Ablations further show that this position-target alignment is most beneficial under distribution shift where it consistently reduces calibration error across all out-of-distribution benchmarks.

20.
arXiv (CS.LG) 2026-06-12

Distribution-Agnostic Robust Trajectory Optimization via Chance-Constrained Reinforcement Learning

arXiv:2606.13605v1 Announce Type: cross Abstract: This paper presents a distribution-agnostic robust trajectory-optimization framework based on chance-constrained reinforcement learning. The uncertainty is represented here through initial conditions and process noise, with the only requirement being that it can be sampled. A deterministic nominal trajectory is first computed offline, and reinforcement learning is then used only to robustify that baseline through a structured affine closed-loop correction law comprising a feedforward control adjustment and time-varying feedback gains. Probabilistic feasibility is enforced empirically through rollout-based upper-tail quantiles, while terminal dispersion is regulated through covariance-feasibility penalties. The framework is assessed on two materially different trajectory design problems. The flagship case study is a three-dimensional multi-impulse Earth-Mars transfer, where the learned policy is benchmarked against a recent robust trajectory-optimization reference under Gaussian uncertainty and then evaluated under bounded uniform uncertainty and under process disturbances not seen during training. The second case study is a stochastic atmospheric pinpoint rocket landing problem, used to assess portability to a short-horizon continuous-thrust setting with drag, mass depletion, and glide-slope constraints. The results show that the proposed framework can remain competitive in upper-tail fuel cost while preserving probabilistic feasibility, and that the same robustification scaffold can be carried across heterogeneous spacecraft trajectory planning problems without redesign of its core stochastic-control structure.

21.
arXiv (CS.AI) 2026-06-16

UrbanWell: Benchmarking Multimodal Large Language Models for Spatio-Temporal Urban Wellbeing Analytics

arXiv:2606.15890v1 Announce Type: new Abstract: Understanding urban wellbeing from multimodal data requires integrating heterogeneous spatial and temporal signals, posing significant challenges for current multimodal large language models (MLLMs). We introduce UrbanWell, a large-scale benchmark designed to systematically evaluate the spatio-temporal reasoning capabilities of MLLMs for urban wellbeing analytics through joint modeling of satellite and street view imagery. UrbanWell spans 38 cities across multiple years and includes diverse indicators covering (1) environmental conditions (CO$_2$, NO$_2$, PM${2.5}$, and Normalized Difference Vegetation Index), (2) spatial accessibility (minimum distance to supermarkets and restaurants), (3) urban form (road length, road density, and land use), (4) urban vitality (population, economic activity diversity, and land use diversity), and (5) subjective perception attributes (e.g., safety, beauty, liveliness, wealth, and quietness). All indicators are aligned at grid level to enable standardized evaluation. Beyond static prediction, UrbanWell defines temporal reasoning tasks, including future value forecasting from historical observations and temporal trend classification. We benchmark 15 state-of-the-art representative MLLMs in a zero-shot setting, providing a comprehensive comparative evaluation across spatial and temporal dimensions. Experimental results indicate that while MLLMs capture salient spatial and perceptual cues, their performance varies substantially across heterogeneous urban indicators spanning environment and subjective perception. UrbanWell serves as a unified benchmark for evaluating multimodal spatial and temporal reasoning in urban wellbeing analytics, offering a standardized testbed for systematic assessment and future research on multimodal urban intelligence. Our codes and datasets are accessible via https://github.com/axin1301/UrbanWell-Benchmark.

22.
arXiv (CS.CV) 2026-06-24

P-MTP: Efficient Document Parsing via Multi-Token Prediction with Progressive Depth Scaling

Vision-Language Models (VLMs) have revolutionized document parsing by enabling end-to-end mapping from images to structured text, imposing a significant latency bottleneck, particularly for token-dense documents. While Multi-Token Prediction (MTP) has emerged as a promising approach for accelerating inference, its potential is constrained by optimization instability when scaling to deeper look-ahead depth. In this paper, we propose P-MTP, a framework that leverages Progressive Multi-Token Prediction with a lightweight MTP module to scale the look-ahead depth for high-throughput document parsing. Specifically, we introduce Progressive Curriculum Loss that adaptively re-weights different look-ahead depths using cumulative path reliability and retrospective target consistency. By effectively suppressing gradient noise in long-range predictions, P-MTP, facilitates an automated easy-to-hard optimization transition, enabling the model to master increasingly distant look-ahead depths. Furthermore, we propose Confidence-Gated Dynamic Drafting to maximize the effective look-ahead depth and acceptance rate by adaptively calibrating speculative length during inference, thereby minimizing computational waste and further pushing the boundaries of inference speedup. Experimental results across multiple benchmarks and architectures demonstrate that P-MTP, achieves up to a $5\times$ speedup with negligible loss in accuracy, providing the first successful validation of extensive look-ahead MTP in the document parsing domain.

23.
arXiv (CS.CV) 2026-06-16

HemExp: Clinically-Guided Latent Diffusion for Modeling Hematoma Expansion

Hematoma expansion (HE) after spontaneous intracerebral hemorrhage (ICH) is a major determinant of acute triage and treatment decisions in neurosurgical care. However, most existing methods provide either a binary expansion risk or a single follow-up volume, limiting uncertainty-aware decisions. We introduce HemExp, a clinically-guided latent diffusion model that generates patient-specific follow-up non-contrast CT images, along with segmentations of intraparenchymal and intraventricular hemorrhage. Generation is conditioned on baseline imaging, clinical variables, and an explicit expansion indicator, enabling controllable simulation of realistic clinical scenarios. HemExp uses a hemorrhage-aware multi-head variational autoencoder and models progression as the difference between baseline and follow-up latent representations with a conditional diffusion model. The model is trained on paired scans from 450 patients across multiple centers and evaluated on 107 patients from a held-out institution. HemExp produces spatial HE probability maps by generating multiple synthetic follow-up images per patient to estimate distributions of plausible follow-up hematoma volumes. Perturbing clinical inputs such as symptom-onset-to-imaging time or anticoagulant status shifts the predicted follow-up volume distribution. HemExp extends binary predictors and demonstrates robust estimation of clinically relevant outcomes in the imaging space, such as hematoma volume, intraventricular involvement, and mass effects. Overall, our results support controllable latent diffusion as a promising direction for uncertainty-aware modeling of early ICH progression.

24.
bioRxiv (Bioinfo) 2026-06-15

SMS: Symmetric Mediation Statistics for Powerful High-Dimensional Mediation Analysis

Background: Mediation analysis of high-dimensional features, particularly molecular-level omics features, provides important opportunities to uncover biological mechanisms underlying human health and disease. However, two central statistical challenges remain: testing the composite-null hypothesis and maintaining power when the exposure-mediator and mediator-outcome associations differ substantially in statistical significance. Existing methods typically rely on accurate estimation of the proportions of the three null types or on the maximum of the two association p-values, and may not always control the FDR well and may have limited power under imbalanced significance. Methods: We propose SMS, a new statistical framework based on symmetric mediation statistics. By exploiting symmetry, SMS calibrates the composite null distribution as a whole for FDR control. It also allows flexible combinations of the two association p-values, including the maximum, and then enables construction of an omnibus test. Moreover, it permits direct use of effect-size estimates, bypassing the need to compute p-values. Results: SMS controlled the FDR across a wide range of simulation scenarios while achieving a substantial sensitivity gain, often around 20 percentage points, over existing methods including HDMT, DACT, and DEI-B. Applications to a metabolomics dataset and a DNA methylation dataset further corroborated these findings. Notably, SMS discovered five plausible mediators in the metabolomics dataset that were missed by all existing methods considered.

25.
arXiv (CS.CL) 2026-06-11

Beyond Compaction: Structured Context Eviction for Long-Horizon Agents

We present Context Window Lifecycle (CWL), a context-management scheme that gives long-horizon LLM agents an effectively unbounded working horizon. As a session accumulates history, CWL keeps the context within budget through graduated, semantically-aware eviction: the agent annotates its trajectory as typed, dependency-linked episodes as work proceeds, and a deterministic, LLM-free policy evicts content in priority order within that structure when a token budget is exceeded. CWL preserves user turns and the exploratory context the agent is actively reasoning over, while aggressively shedding action episodes whose effects are already persisted in the environment, keeping active context near a stable ceiling that also avoids the performance degradation associated with very large prompts. Compared to summarization-based compaction, CWL avoids four well-known limitations: unpredictable lossiness, destruction of causal structure, blocking model cost, and compression-induced hallucination. Compared to recency truncation, CWL is semantically aware: it drops the oldest-and-most-recoverable content according to the dependency graph rather than oldest-in-time regardless of relevance. We describe the annotation protocol, the episode graph, the eviction policy, and the token-accounting loop, and evaluate CWL on long-horizon agentic benchmarks: a single agent session completing 89 sequential tasks across 80 million tokens with no measurable degradation in task accuracy relative to per-task isolated sessions