Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CV) 2026-06-16

Differentiable Packing of Irregular 3D Objects with Adaptive Container Estimation

Most existing approaches either fix the container in advance or optimize only a single container dimension through an outer search loop, leaving the remaining dimensions as a manual tuning problem. We present a differentiable packing framework that jointly optimizes all 6N object pose parameters and all three container side lengths inside a single gradient-based loop. The formulation combines six physics-inspired, differentiable loss terms computed directly on triangle meshes through axis-aligned bounding-box proxies. An adaptive squeezing mechanism periodically tightens the container whenever the overlap loss falls below a pair-count-scaled threshold, producing a large initial drop in container volume, followed by small refinements. All pairwise computations are written in tensor-broadcasting form, giving a 3.4 to 54 times speedup over a reference loop-based implementation. The pipeline is implemented in Python and PyTorch, with no physics engine, FFT library, or convex decomposition. On multiple object categories, the method produces containers that are 11 to 32 percent smaller than time-matched DBLF and simulated-annealing baselines at N =100, while running in under 4 minutes per instance on a single consumer GPU.

02.
arXiv (CS.AI) 2026-06-16

Learning Interface Breakup: A Geometry-Conditioned Latent Surrogate for Spray Formation

arXiv:2606.16587v1 Announce Type: cross Abstract: Designing spray nozzles requires predicting how geometry shapes transient two-phase breakup, but high-fidelity volume-of-fluid (VOF) simulations with adaptive mesh refinement (AMR) are too expensive for iterative design exploration. Standard surrogate models are also challenged by this setting because both the liquid–gas interface and the underlying adaptive discretization evolve across time and geometries. We introduce a geometry-conditioned latent surrogate trained on 797 two-phase nozzle simulations that addresses this by encoding the AMR cell-density field, rather than the full multi-channel flow state, as a compact proxy for where the solver concentrates resolution. From this representation, the model reconstructs transient density evolution and nozzle geometry, and a lightweight second stage recovers the remaining flow variables. On held-out simulations, the method accurately captures key interface dynamics while reducing inference time to 0.045 seconds per trajectory, corresponding to a speed-up of more than $6\times10^4$ relative to Basilisk CFD. These results suggest that AMR refinement structure can serve as a compact and learnable representation for geometry-conditioned surrogate modeling of transient two-phase flows.

03.
arXiv (quant-ph) 2026-06-19

Unveiling coherent dynamics in non-Markovian open quantum systems: exact expression and recursive perturbation expansion

arXiv:2506.04097v2 Announce Type: replace Abstract: We introduce a systematic framework to derive the effective Hamiltonian governing the coherent dynamics of non-Markovian open quantum systems. By applying the minimal dissipation principle, we uniquely isolate the coherent contribution to the time-local generator of the reduced dynamics. We derive a general expression for the effective Hamiltonian and develop a recursive perturbative expansion that expresses it in terms of system-bath interaction terms and bath correlation functions. This expansion provides a systematic tool for analyzing energy renormalization effects across different coupling regimes. Applying our framework to paradigmatic spin systems, we reveal how environmental correlations influence energy shifts and eigenbasis rotations, offering new insights into strong-coupling effects and non-Markovian quantum thermodynamics.

04.
arXiv (quant-ph) 2026-06-11

Diffusive Relaxation of Participation Entropy in U(1)-symmetric Dynamics

arXiv:2606.11561v1 Announce Type: new Abstract: Participation entropy (PE) quantifies the spread of a many-body wavefunction across configuration space. While PE relaxes rapidly in generic chaotic systems, we show that $\mathrm{U}(1)$ conservation laws slow it down by imprinting with the slow hydrodynamic modes. Using a cluster expansion around equilibrium, we show that, after local density inhomogeneities decay, the leading PE deficit is dominated by squared connected density correlations. The long time relaxation is therefore controlled by diffusive correlation spreading, giving $\Delta S(t)\sim t^{-1/2}$ in the hydrodynamic regime and crossing over to $\sim \exp[-O(t/L^2)]$ when $t\geq L^2$. We confirm this entropy correlation relation using exact computation and infinite system tensor network simulations in various quantum $\mathrm{U}(1)$ conserving circuits. Our results establish PE as a sensitive probe of hydrodynamic memory and suggest that slow relaxation is a generic consequence of conservation laws.

05.
arXiv (CS.AI) 2026-06-17

S4oP: Operator-level Pruning of Structured State Space Models for Resource-Constrained Devices

arXiv:2606.18096v1 Announce Type: cross Abstract: Structured State Space Models (SSMs), including the S4 and S4D architectures, have recently emerged as powerful alternatives to attention-based models for capturing long-range dependencies in sequential data. Despite their strong empirical performance, deploying these models in time- and resource-constrained settings remains challenging due to their computational and memory demands. In this paper, we propose a novel incremental, operator-level pruning approach for S4- and S4D-based models that significantly reduces inference cost while preserving predictive performance. To the best of our knowledge, this is the first work to systematically investigate structured operator pruning for SSMs. Our method progressively prunes model operators by interleaving structured masking with fine-tuning, while jointly monitoring accuracy and inference latency. We implement this approach within a unified training and evaluation framework that enables systematic exploration of efficiency-accuracy trade-offs. Experiments across multiple benchmark datasets show that pruning up to 70% of the model operators preserves the performance of the original models in most cases, while substantially reducing inference latency. These results demonstrate that structured operator pruning is an effective and previously unexplored strategy for improving the efficiency of SSMs and facilitate their deployment in practical, resource-constrained scenarios.

06.
arXiv (CS.LG) 2026-06-17

Beyond IGO-Flow: Toward Convergence Analysis of IGO in Continuous Spaces

arXiv:2606.17523v1 Announce Type: cross Abstract: Information-Geometric Optimization (IGO) provides a unified framework for black-box optimization by interpreting the adaptation of a search distribution as a natural gradient update. Despite its conceptual importance, the convergence theory of IGO remains limited: most existing results concern continuous-time idealizations such as the IGO flow, rather than discrete-time updates with non-infinitesimal learning rates. In this paper, we study discrete-time IGO in continuous spaces, formulated as natural gradient updates in the expectation-parameter coordinates of an exponential family. In particular, we analyze IGO over the multivariate Gaussian family on strongly convex quadratic objective functions. Our analysis covers a setting that simultaneously incorporates full covariance adaptation, a fixed positive learning rate, and quantile-based weights. In this setting, we prove that the covariance matrix converges to the zero matrix. We further show that the mean vector converges to the global optimum, provided that the condition number of the appropriately scaled covariance matrix is bounded at sufficiently frequent iterations. These results advance the convergence theory of IGO and help bridge the gap between the mathematical theory of IGO and practical covariance-adaptive search methods such as CMA-ES.

07.
arXiv (CS.AI) 2026-06-15

Learning Coordinated Preference for Multi-Objective Multi-Agent Reinforcement Learning

arXiv:2606.14693v1 Announce Type: cross Abstract: Cooperative multi-objective multi-agent reinforcement learning (MOMARL) models team decision making under multiple, potentially conflicting objectives. In this setting, conflicts arise not only across objectives but also across agents with different observations, roles, and contributions. We propose Preference Coordinated Multi-agent Policy Optimization (PCMA), which learns coordinated agent-specific preferences to enable complementary trade-offs among agents. Theoretically, we formulate cooperative MOMARL as a team-optimal game and show that, under suitable conditions, preference diversity can induce team improvement through a first-order improvement decomposition. Experiments on multiple cooperative MOMA environments and a practical traffic-control scenario show that PCMA improves both performance and trade-off coordination.

08.
arXiv (CS.CL) 2026-06-17

Bridging Functional Correctness and Runtime Efficiency Gaps in LLM-Based Code Translation

While large language models (LLMs) have greatly advanced the functional correctness of automated code translation systems, the runtime efficiency of translated programs has received comparatively little attention. With the waning of Moore's law, runtime efficiency has become increasingly important for program quality, alongside functional correctness. Our preliminary study reveals that LLM-translated programs often run slower than human-written ones, and this issue cannot be remedied through prompt engineering alone. Therefore, our work proposes SwiftTrans, a code translation framework comprising two key stages: (1) Multi-Perspective Exploration, where MpTranslator leverages parallel in-context learning (ICL) to generate diverse translation candidates; and (2) Difference-Aware Selection, where DiffSelector identifies the optimal candidate by explicitly comparing differences between translations. We further introduce Hierarchical Guidance for MpTranslator and Ordinal Guidance for DiffSelector, enabling LLMs to better adapt to these two core components. To support the evaluation of runtime efficiency in translated programs, we extend existing benchmarks, CodeNet and F2SBench, and introduce a new benchmark, SwiftBench. Experimental results across all three benchmarks show that SwiftTrans achieves consistent improvements in both correctness and runtime efficiency.

09.
arXiv (CS.LG) 2026-06-18

Provable quantum speedups for computing persistence in topological data analysis

arXiv:2410.21258v2 Announce Type: replace-cross Abstract: Topological data analysis (TDA) aims to extract noise-robust features from a data set by examining the number and persistence of holes in its topology. We provide an efficient quantum algorithm for a computational problem closely related to a core task in TDA – determining whether a given hole persists across different length scales. Further, we prove the problem itself is $\mathsf{BQP}_1$-hard, implying that a classical solution is extremely unlikely; this stands in contrast to all previous quantum approaches to TDA, where the problems were also intractable for quantum computers, or where a rigorous proof of classical hardness still remains open. This result implies an {exponential} quantum speedup for this problem under standard complexity-theoretic assumptions. Our approach relies on encoding the persistence of a hole in a variant of the guided sparse Hamiltonian problem, where the guiding state is constructed from a harmonic representative of the hole.

10.
arXiv (CS.AI) 2026-06-12

Benchmarking Counterfactual Prediction in Epidemic Time Series with Time-Varying Interventions

arXiv:2606.05692v2 Announce Type: replace-cross Abstract: Deep learning has enabled significant advances in time-series causal inference, yet progress remains constrained by the lack of realistic benchmarks with observable counterfactual outcomes. Existing datasets either rely on real-world observations without ground-truth counterfactuals or on simplified simulations that fail to capture complex causal dynamics. To address this gap, we develop a large-scale benchmark for counterfactual prediction in epidemic time series under dynamic interventions. Unlike existing benchmarks, it supports static and time-varying treatments, as well as both single-policy and multi-policy intervention settings, enabling evaluation of causal inference methods across a broad range of causal inference scenarios. Leveraging a calibrated agent-based model grounded in real-world demographic, mobility, epidemiological, and policy data, we generate realistic counterfactual trajectories across more than 150 U.S. counties. Using this benchmark, we evaluate widely used and state-of-the-art causal inference methods, revealing substantial performance differences and highlighting the challenges of realistic time-series causal reasoning.

11.
arXiv (CS.AI) 2026-06-16

Deep Q-Learning on Hölder Spaces

作者:

arXiv:2606.16846v1 Announce Type: cross Abstract: We study the operator-theoretic core of Q-learning in continuous-time stochastic control with continuous states and actions. In value-based reinforcement learning, each Q-learning or DQN update is built from a Bellman optimality target; our analysis isolates this target in a diffusion setting and studies its regularity and approximation complexity. Under uniform ellipticity and Hölder-regular coefficients, we show that a Bellman update maps bounded inputs into an anisotropic regularity class, smoothing the state variable while leaving only Lipschitz dependence on the action variable. This yields a compact family of Bellman iterates and motivates a tensor-product DeepONet architecture adapted to the mixed regularity of the problem. We then derive explicit approximation and resource bounds, together with a stiffness–complexity trade-off as the time step $\delta \to 0$. The resulting theory makes a direct contribution to Q-learning theory at the level of Bellman target regularity and approximation in continuous stochastic control. At the same time, we do not claim a full convergence theorem for practical sampled Q-learning with exploration, replay, and stochastic gradient updates.

12.
medRxiv (Medicine) 2026-06-11

A continental-scale scenario modelling framework for evaluating infant RSV immunisation strategies across Europe

Background. The recent approval of long-acting monoclonal antibodies (la-mAbs) and a maternal vaccine (MV) in the EU enables universal RSV prevention in infants. Modelling studies are widely used to quantify the population-level impact of alternative immunisation strategies. However, existing assessments of new RSV immunisation products focus on national or sub-national settings. Methods. We developed an age-stratified, stochastic compartmental model of RSV transmission for 28 EU/EEA countries. It combines literature-based parameters on RSV natural history and product efficacy with country-specific demographic and contact patterns. After model calibration against age- and country-specific RSV hospitalisation rates, we designed scenarios for both la-mAbs and MV at four coverage levels, with and without catch-up immunisation for infants under six months at season onset. We then evaluated each scenario against a no-immunisation baseline. Results. At 95% coverage, the cross-country median reduction in RSV hospitalisations over one season in infants under 12 months is 29.9% for la-mAbs (country median range: 27.7-33.9%) and 22.4% for MV (20.0-25.6%), scaling linearly with coverage. Out of all averted hospitalisations, 78.3% (90% CI: [67.3, 92.7]%) are concentrated in infants aged 0-2 months for la-mAbs and 72.7% (90% CI: [61.4, 88.6]%) for MV. A catch-up campaign nearly doubles the overall reduction in RSV hospitalisations. Conclusions. Despite country-specific heterogeneities, impact of la-mAbs and MV is comparable across settings and herd-immunity effects are largely negligible. This supports harmonised European guidelines on coverage targets. Seasonal catch-up campaigns emerge as an effective lever to maximise the impact of immunisation programmes.

13.
arXiv (quant-ph) 2026-06-16

Achieving double-logarithmic precision dependence in optimization-based quantum unstructured search

arXiv:2603.26039v3 Announce Type: replace Abstract: Grover's algorithm is a fundamental quantum algorithm that achieves a quadratic speedup for unstructured search problems of size $N$. Recent studies have reformulated this task as a maximization problem on the unitary manifold and solved it via linearly convergent Riemannian gradient ascent (RGA) methods, resulting in a complexity of $O(\sqrt{N/M}\log (1/\varepsilon))$, where $M$ denotes the number of target items and $\varepsilon$ denotes the success probability error. In this work, we adopt the Riemannian modified Newton (RMN) method to solve the quantum search problem, under the assumption that the ratio $ M/N$ is known. We show that, in this setting, the Riemannian Newton direction is collinear with the Riemannian gradient in the sense that the Riemannian gradient is always an eigenvector of the corresponding Riemannian Hessian. This structure removes the overhead of Hessian inversion and allows the proposed RMN method to retain the local quadratic convergence in terms of the error $\varepsilon$. More precisely, we rigorously prove an overall complexity of $O(\sqrt{N/M}+\log\log(1/\varepsilon))$. Furthermore, our approach remains Grover-compatible, namely, it relies exclusively on the standard Grover diffusion and oracle operators to ensure algorithmic implementability, and its parameter update process can be efficiently precomputed on classical computers.

14.
arXiv (CS.CV) 2026-06-17

Theoretical Grounding of Out-Of-Distribution Detection With Reinforcement Learning Optimizer

Out-of-distribution (OOD) detection in dynamic open-world environments requires a model to continually adapt to evolving data distributions while generalizing to covariate-shifted inputs and rejecting semantic-shifted OOD examples. Most existing OOD detection methods optimize only the current-step objective and do not explicitly account for how post-deployment environment changes affect future OOD behavior. In this paper, we establish a theoretical grounding for dynamic OOD detection using a reinforcement learning (RL)-guided optimizer that explicitly favors updates that reduce the semantic OOD false positive rate over time. We develop a novel augmented optimizer that uses an RL-guided correction term on top of standard gradient descent (GD) and show its improvement over both future-domain generalization and semantic-OOD rejection. We analyze temporal error decomposition in terms of model-change and environment-change generalization errors and develop a new theoretical framework for comparing the generalization errors under both GD and RL-guided optimizers.

15.
arXiv (CS.CV) 2026-06-16

Structure-Semantic Co-optimized Latent Diffusion Model for Fast Visual Anagram Synthesis

Visual anagram is an intriguing form of art creation wherein a single image presents different conceptual interpretations under transformations such as flipping or rotation. Recent work has achieved visual anagram synthesis by leveraging pretrained text-to-image (T2I) diffusion models, yet still suffers from several key limitations including computational inefficiency, suboptimal aesthetic quality, and weak semantic fidelity and expressiveness. This work focuses on generating visual anagrams with substantially improved visual quality at minimal computational cost, thereby advancing intelligent creation of illusionary digital art. To increase image resolution while reducing time overhead, we adapt the cutting-edge parallel denoising algorithm from pixel-based T2I model to the adversarially distilled latent-based one, and accordingly propose a structure-semantic co-optimization (S2CO) framework to counteract the consequent visual degradation. As the core of our approach, S2CO framework comprises three key innovations: (\romannumeral1) null-text structure alignment optimization; (\romannumeral2) semantic enhancement optimization; (\romannumeral3) attention-guided noise fusion. Building upon these components, our method dubbed S2CO-Anagram is able to generate higher-resolution anagram images with noticeably superior visual harmony and semantic faithfulness than related SOTA approaches, all while achieving substantially faster inference speed. Code will be publicly available.

16.
medRxiv (Medicine) 2026-06-18

Chest X-Ray as a critical screening tool for Household Contacts of TB: Lessons from Three Years of Programmatic Data in India

Introduction: Household contacts (HHCs) of pulmonary TB patients remain at high risk for TB infection and disease progression, yet many remain asymptomatic and are missed by symptom-screening pathways. While India expanded its TB preventative guidelines to include all HHCs in 2021, chest X-ray (CXR) screening continues to be used selectively, representing a missed opportunity in early case detection. Methods: The analysis uses programmatic data from Project JEET 2.0 (Joint Effort for Elimination of Tuberculosis), implemented by the William J. Clinton Foundation in India, between October 2021 and March 2024. Eligible HHCs (>=5 years) were offered CXR screening as part of TB preventive therapy (TPT) evaluation. Descriptive and multivariable analyses examined predictors of CXR uptake and TB yield. A two-stage logistic regression model estimated potential TB yield under universal CXR coverage. Model performance was evaluated using the area under the curve (AUC), and bootstrap simulations generated counterfactual estimates of missed TB cases. Results: Among 1,034,621 HHCs, 1.02% individuals were found positive for TB, which includes 7,786 HHCs who were on TB treatment already, while an additional 2,812 were identified during pre-TPT evaluation. Among eligible HHCs (n = 1,026,835), 70% were screened with CXR, of which 2.4% had suggestive TB findings. Of these, 79% went for further TB assessment. Symptomatic HHCs were more likely to be CXR screened (84% vs 69%) and assessed for TB, yet two-thirds of all detected TB cases were asymptomatic. It is estimated that universal CXR coverage and TB testing for suggestive cases can increase TB detection by at least 87%. Conclusion: The study provides a scalable approach to expand CXR coverage through public-private partnerships, enabling early TB detection among HHCs, especially among asymptomatic contacts. Future implementations will benefit from integrating AI-enabled reading, along with systematic follow up for those with suggestive findings.

17.
arXiv (CS.CL) 2026-06-16

Beyond Retrieval: Learning Compact User Representations for Scalable LLM Personalization

Personalizing large language models requires adapting model behavior to individual users while preserving robustness and deployment-scale efficiency. Existing approaches typically personalize LLMs either at the input level, by retrieving user histories or constructing profile prompts, or at the parameter level, by maintaining user-specific parameter-efficient modules. The former makes personalization sensitive to retrieval quality and prompt design, whereas the latter incurs storage and maintenance costs that grow with the user population. To address these limitations, we propose TAP-PER (Temporal Attentive Prefix for PERsonalization), a prefix-based framework that encodes user preferences as learnable representations, eliminating explicit prompt construction and replacing heavy per-user adapters with lightweight user-state prefix embeddings. Inspired by personalized recommendation systems, TAP-PER decomposes user modeling into user-state and query-conditioned components, and incorporates temporal signals to capture the evolving nature of user interests. Experiments on six LaMP tasks show that TAP-PER consistently outperforms prompt-based and model-based baselines across classification, rating, and generation settings. Moreover, TAP-PER uses 130x fewer per-user parameters than OPPU and roughly half the total parameter footprint of PER-PCS at the 1,000-user scale, demonstrating that scalable LLM personalization can be achieved without explicit prompt construction or heavy per-user adapters.

18.
arXiv (CS.CL) 2026-06-16

Virtual Speech Therapist: A Clinician-in-the-Loop AI Speech Therapy Agent for Personalized and Supervised Therapy

This paper develops Virtual Speech Therapist (VST), an intelligent agent-based platform that streamlines stuttering assessment and delivers customized therapy planning through automated and adaptive AI-driven workflows. VST integrates state-of-the-art deep learning-based stuttering classification, and multi-agent large language model (LLM) reasoning to support evidence-based clinical decision-making. The VST begins with the acquisition and feature extraction of patient speech samples, followed by robust classification of stuttering types. Building on these outputs, VST initiates an agentic reasoning process in which specialized LLM agents autonomously generate, critique, and iteratively refine individualized therapy plans. A dedicated critic agent evaluates all generated therapy plans to ensure clinical safety, methodological soundness, and alignment with peer-reviewed evidence and established professional guidelines. The resulting output is a comprehensive, patient-specific therapy draft intended for clinician review. Incorporating clinician feedback, the system then produces a finalized therapy plan suitable for patient delivery, thereby maintaining a clinician-in-the-loop paradigm. Experimental evaluation by expert speech therapists confirms that VST consistently generates high-quality, evidence-based therapy recommendations. These findings demonstrate the system's potential to augment clinical workflows, reduce clinician burden, and improve therapeutic outcomes for individuals with speech impairments. An interactive user interface for the proposed system is available online at: https://vocametrix.com/ai/stuttering-therapy-planning-agent , facilitating real-time stuttering assessment and personalized therapy planning.

19.
bioRxiv (Bioinfo) 2026-06-11

ANCHOR: haplotype-aware allelic and isoform inference from single-cell long-read RNA sequencing with de novo variant calling

Long-read RNA sequencing enables haplotype- and isoform-resolved allelic analysis of transcriptomes, yet extending this capability to single cells and distinct cell types remains computationally challenging due to sparse coverage, sequencing errors, incomplete variant information, and reference-biased transcript assignment. Here we present ANCHOR, a haplotype-aware framework for single-cell long-read RNA sequencing that performs de novo expressed-variant discovery, molecule-level haplotype assignment and isoform-resolved allelic quantification. ANCHOR combines a signed-graph variant caller, pair hidden Markov modelling and beta-binomial UMI aggregation to infer parental allele counts for genes and splice-resolved isoforms, without requiring a pre-existing phased genotype or deep learning. In human single-cell long-read RNA benchmarks, ANCHOR improved variant-calling performance over tested long-read RNA callers at single-cell and low-to-moderate coverage, and its beta-binomial model reduced depth-driven false positives in allele-specific expression testing. Applied to newly generated single-cell long-read RNA-seq data from reciprocal mouse crosses during gastrulation, ANCHOR resolved cell-type- and isoform-specific parent-of-origin imprinting and identified an antagonistic maternally biased Sgce isoform. ANCHOR provides a general framework for allele- and isoform-resolved analysis of diploid single-cell long-read transcriptomes.

20.
arXiv (quant-ph) 2026-06-12

Coarse-grained quantum thermodynamics: Observation-dependent quantities, observation-independent laws

arXiv:2507.15918v2 Announce Type: replace Abstract: In both classical and quantum thermodynamics, physical quantities are typically assigned objective values defined independently of our observations. We then refer to the 'work performed by a gas', or the 'entropy of the gas', regardless of how they are evaluated. Here, we question this conception in the context of quantum thermodynamics, estimating how the definition of pivotal thermodynamic quantities is affected by experimental instruments of limited precision. We find that the coarse-grained thermodynamic quantities frequently lead to different conclusions from those drawn in fine-grained scenarios. For instance, the irreversibility of a process, or its work payoff, can significantly vary with the instrument precision. We show nonetheless that coarse-grained thermodynamic quantities satisfy the same relations (i.e., the second law inequality, the relation between dissipation and distinguishability of a process from its time-reverse, and the quantum work fluctuation theorems) as their fine-grained counterparts. These results highlight the observation-independence of relations linking thermodynamic quantities which are themselves observation-dependent.

21.
arXiv (CS.CV) 2026-06-12

Acquisition state behaves as a structured, measurable variable governing lung-nodule AI: kernel-driven measurement instability and noise-driven detection fragility, invisible to DICOM metadata

AI governance for medical imaging is formalizing: the 2026 ACR-SIIM Practice Parameter recommends local acceptance testing and ongoing drift monitoring, and the ACR Assess-AI registry monitors AI outputs using DICOM metadata for context. We argue that a necessary, currently unmonitored layer sits beneath output metrics: whether incoming studies remain within the acquisition envelope a model was validated on. Using a LUNA16-trained MONAI RetinaNet lung-nodule detector, we test whether acquisition state behaves as a structured, measurable variable. On real paired CT differing only in reconstruction kernel (NLST B30f vs B80f), kernel alone shifted AI-measured diameter and flipped a Fleischner size category in 5.2% (8 of 155) of nodules at fixed patient and acquisition, while detection confidence was unchanged (Wilcoxon p=0.22). Under controlled LIDC-IDRI perturbations the effects dissociated by axis: the noise axis degraded detection confidence (p=5.9e-32, concentrated in nodules under 6 mm) but not measurement, while the frequency/kernel axis corrupted measurement (p=8.6e-13) but not detection. A 4-feature pixel fingerprint recovered reconstruction identity (patient-level AUC about 0.95 on real CT, 0.995 on a QIBA phantom) where the ConvolutionKernel DICOM tag was uninformative (identical labels across reconstructions). The kernel axis transported across four manufacturers (leave-one-vendor-out AUC 0.94-0.98, matching the within-vendor ceiling). Acquisition state thus maps to distinct AI failure modes, frequency content to measurement reliability and noise to detection sensitivity, and is not recoverable from metadata. Acquisition-aware, input-side validation is the missing layer for the acceptance-testing and drift-monitoring requirements now entering imaging-AI accreditation.

22.
arXiv (quant-ph) 2026-06-19

Quantum Batteries as Work Sources for Phase-Locked Parametric Amplification

arXiv:2606.20306v1 Announce Type: new Abstract: Quantum batteries have been proposed as locally precharged work sources for superconducting quantum technologies, suggesting a route to reduce continuously supplied microwave drives. Here we ask whether the pump tone of a quantum-limited parametric amplifier can be replaced, or strongly duty-cycled, by a finite bosonic quantum battery. Quantizing the pump of a nondegenerate parametric amplifier exposes a resource distinction hidden in the classical description: stored pump energy can generate signal-idler photons, but pump phase coherence is required to generate a phase-locked amplifier field. In a closed trilinear model, coherent and phase-randomized coherent pumps with the same photon-number distribution produce comparable pair numbers, yet only the coherent pump produces anomalous two-mode coherence and an EPR-squeezed interference dip. Including leakage, we collect the emitted fields into cascaded temporal modes. At matched collector bandwidth, the coherent pump gives \(I_{\min}^{(f)}=0.553\), whereas the phase-randomized pump gives \(I_{\min}^{(f)}=1.94\) at nearly identical collected energy. Weak amplitude squeezing slightly improves the dip by reducing finite-pump number fluctuations while preserving the coherent displacement. Thus battery-powered parametric amplification requires phase-coherent stored energy, possibly assisted by number-noise reduction, rather than stored energy alone.

23.
arXiv (CS.AI) 2026-06-19

Evaluating and Enhancing Negation Comprehension in Remote Sensing MLLMs

arXiv:2606.20177v1 Announce Type: cross Abstract: Multimodal Large Language Models (MLLMs) have demonstrated remarkable success in various Remote Sensing (RS) tasks. However, their ability to comprehend negation remains underexplored, limiting deployment in real-world applications where models must explicitly identify what is false or absent, e.g., emergency responders need to locate non-flooded routes for evacuation. To comprehensively study this limitation, we introduce RS-Neg, the first benchmark to evaluate negation understanding across region-level to scene-level tasks. Specifically, we design an automated data generation pipeline for RS imagery, using LLMs to synthesize diverse negation queries, and introduce a dynamic visual focus module for verification. Our evaluation reveals that advanced RS MLLMs struggle with negation, exhibiting hallucinations and substantial performance degradation. To close this gap, we propose NeFo, a novel test-time learning method that explicitly incorporates the logical role of negation into the model optimization. Remarkably, using about 5\% unlabeled test samples, NeFo significantly improves the negation understanding of models and shows strong generalization to unseen tasks. Code and data will be released upon acceptance.

24.
arXiv (CS.AI) 2026-06-18

MagpieTTS-LF: Inference-Time Long-Form Speech Generation Without Training on Long-Form data

arXiv:2606.18485v1 Announce Type: cross Abstract: Neural Text-to-Speech (TTS) systems achieve remarkable quality on short utterances but long-form speech generation shows prosodic drift, speaker inconsistencies and sentence boundary artifacts. Existing approaches either compress sequences, increase context length or naively concatenate independently synthesized chunks. We present an inference-time approach called MagpieTTS-LF that enables MagpieTTS to produce coherent long-form speech without model retraining. Our method introduces three key innovations: (1) soft attention priors to guide monotonic alignment while preserving past and future context; (2) a stateful inference algorithm that maintains context across sentence chunks, ensuring prosodic continuity; (3) history-aware text encoding that uses past text for discourse-level prosodic planning. Experiments on long texts show significant improvements in long-range intelligibility, prosodic coherence, speaker consistency, and boundary naturalness compared to other baselines.

25.
arXiv (CS.AI) 2026-06-17

DPRM: A Plug-in Doob h transform-induced Token-Ordering Module for Diffusion Language Models

arXiv:2604.24357v2 Announce Type: replace-cross Abstract: Diffusion language models generate without a fixed left-to-right order, leaving token ordering as a central algorithmic choice. Existing systems mainly use random masking or confidence-driven ordering, which respectively suffer from train–test mismatch and myopic exploration. We introduce DPRM (Doob -transform Process Reward Model), a plug-in token-ordering module that keeps the host architecture, denoising objective and supervision unchanged, and modifies only the ordering policy. DPRM starts from confidence-driven ordering and gradually shifts to process-reward-guided ordering through online estimates. We characterize the exact DPRM policy as a reward-tilted Gibbs reveal law, prove convergence of its stagewise Soft-BoN approximation, show that the online bucketized controller tracks the exact DPRM score at empirical-Bernstein rates, and establish a sample-complexity advantage under tractable optimization assumptions. Across nine hosts covering language reasoning, test-time scaling, protein, single-cell, molecular, DNA, text-to-image generation, and VQA, DPRM order variants improve several language, DNA, and multimodal settings while also identifying boundary cases where confidence-only ordering or task-specific utilities are preferable. Code is available at: https://github.com/DakeBU/DPRM-DLLM