Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CL) 2026-06-12

X-MADAM-RAG: Diagnosing and Handling Chinese-English Evidence Conflict in Retrieval-Augmented Generation

Retrieval-augmented generation (RAG) systems may receive evidence that is not merely noisy but mutually contradictory. This issue becomes particularly salient in multilingual settings, where retrieved Chinese and English evidence may support incompatible answer candidates. We study this problem through X-RAMDocs-ZHEN, a controlled Chinese-English benchmark derived from RAMDocs for diagnosing evidence conflict in RAG. The benchmark contains 300 examples across six balanced conditions, including monolingual support, bilingual agreement, reversed conflict directions, and conflict with optional noise. We further examine X-MADAM-RAG, an interpretable pipeline that decomposes evidence handling into per-document candidate extraction, visible-evidence repair, deterministic candidate grouping, and conflict-aware aggregation. On the original controlled benchmark with Qwen2.5-7B-Instruct, X-MADAM-RAG achieves 0.9667 strict accuracy and 0.9767 conflict-aware success, outperforming an evidence-normalized single-call baseline. However, a zero-call rule-only extractor reaches 1.0000 on the same benchmark, revealing strong template regularity. To probe this limitation, we construct a deterministic naturalized stress test that removes explicit answer templates while preserving candidate strings. On its 100-sample subset, rule-only extraction falls to 0.0000, but X-MADAM-RAG also drops to 0.3000 strict accuracy, below both naive and evidence-normalized baselines. A privileged oracle remains perfect, indicating that document-level extraction is the main bottleneck. These findings position X-RAMDocs-ZHEN and X-MADAM-RAG as diagnostic tools for controlled evidence conflict rather than as evidence of general hallucination detection or robustness to natural retrieval.

02.
arXiv (quant-ph) 2026-06-11

Strong-field control of the $Z$-boson resonance in $e^+e^-$ collisions

arXiv:2606.09394v2 Announce Type: replace-cross Abstract: Resonant $Z$-boson production is a cornerstone of precision electroweak physics, with its vacuum line shape set by the $Z$ mass, width, and collision kinematics. We show that a strong laser field can significantly alter this picture. By treating the field nonperturbatively, we find that laser dressing of the incoming fermions alters the effective collision kinematics and opens laser-photon exchange channels, including multiphoton processes, in $e^{+}e^{-}$ collisions. As a result, the $Z$-resonance profile develops distinct intensity-dependent regimes, evolving from the vacuum limit to saturation at intermediate field strengths and to an approximately quadratic enhancement at higher intensities. Additionally, the polarization composition of the produced $Z$ bosons is redistributed. In particular, at high intensities the laser-induced contribution can compensate the intrinsic chiral asymmetry of the electroweak interaction, leading to nearly parity-balanced $Z$-boson production. Our results identify that strong classical fields can dynamically control electroweak resonance phenomena, opening a bridge between strong-field QED and high-energy collider physics.

03.
arXiv (CS.LG) 2026-06-11

PAWS: Preference Learning with Advantage-Weighted Segments

arXiv:2606.11982v1 Announce Type: new Abstract: Preference-based reinforcement learning (PbRL) learns policies from human trajectory-level comparisons, avoiding explicit reward design and expert demonstrations. Existing methods typically train utility functions on trajectory or segment-level preferences while relying on per-step utility estimates during policy optimization. This training and inference mismatch induces a distribution shift that severely degrades temporal credit assignment and limits policy learning. We analyze this issue and propose PAWS, a segment-based preference learning method that performs policy updates directly using segment-level advantage functions. By aligning utility training with policy optimization, PAWS preserves trajectory-level preference information and avoids unreliable per-step learning signals. Experiments on simulated robotic manipulation and locomotion tasks demonstrate that PAWS consistently outperforms existing PbRL approaches, highlighting the importance of distribution-consistent preference learning.

04.
arXiv (CS.CV) 2026-06-16

GraphWorld: Long-Horizon Planning with World Models for End-to-End Autonomous Driving

End-to-end autonomous driving has made significant progress by unifying perception, prediction, and planning within a single learning framework, achieving strong performance in short-horizon decision making. However, most existing E2E-AD methods remain confined to short-horizon planning and lack the ability to model long-term temporal dependencies, which severely limits their generalization and security in complex and highly interactive driving scenarios. In this work, we propose GraphWorld, an E2E-AD framework that explicitly enhances long-horizon planning through latent world modeling. We introduce an Ego-Centric Interaction Graph, which adaptively models critical neighboring agents based on spatial proximity, and propagates relational context to planning queries via cross-node cross-attention. We present a World-State-Conditioned Planning that learns ego-centric latent world representations by modeling interactions between an ego vehicle and surrounding agents. This latent world state captures key interaction dynamics and safety-relevant semantics, and serves as a conditioning signal to guide long-horizon, safety-aware trajectory planning. Extensive experiments on Bench2Drive, NAVSIMv1/2, and nuScenes demonstrate that GraphWorld significantly reduces collision rates and improves long-horizon planning performance, validating its effectiveness in complex driving environments.

05.
arXiv (CS.CV) 2026-06-18

Cosmos 3: Omnimodal World Models for Physical AI

We introduce Cosmos 3, a family of omnimodal world models designed to jointly process and generate language, image, video, audio, and action sequences within a unified mixture-of-transformers architecture. By supporting highly flexible input-output configurations, Cosmos 3 seamlessly unifies critical modalities for Physical AI – effectively subsuming vision-language models, video generators, world simulators, and world-action models into a single framework. Our evaluation demonstrates that Cosmos 3 establishes a new state-of-the-art across a diverse suite of understanding and generation tasks, demonstrating omnimodal world models as scalable, general-purpose backbones for embodied agents. Our post-trained Cosmos 3 models were ranked as the best open-source Text-to-Image and Image-to-Video models by Artificial Analysis, and the best policy model by RoboArena at the time the technical report was written. To accelerate open research and deployment in Physical AI, we make our code, model checkpoints, curated synthetic datasets, and evaluation benchmark available under the Linux Foundation's OpenMDW-1.1 License at https://github.com/nvidia/cosmos and https://huggingface.co/collections/nvidia/cosmos3. The project website is available at https://research.nvidia.com/labs/cosmos-lab/cosmos3.

06.
arXiv (CS.AI) 2026-06-16

A Causal Model of Theory of Mind in Conflict for Artificial Intelligence

arXiv:2606.16944v1 Announce Type: new Abstract: Theory of mind (ToM), the capacity to ascribe mental states to others and use those ascriptions for prediction and inference, is widely assumed to be essential for effective human-machine integration. Existing AI-ToM models address how to mentalize, but leave the question of when largely unaddressed. The central question is: under what situational and agent-level conditions is ToM engagement causally warranted in conflict? This paper presents a structural causal model formalized as a directed acyclic graph (DAG), treating ToM as a mechanism activated by situational and agent-level conditions rather than as an always-on capacity. The model specifies four exogenous variables capturing situational and agent-level conditions, five endogenous mediators, and a mechanistic ToM node producing engagement states through three distinct causal pathways: a tractability pathway, a reasoning-depth pathway, and an enabling-cause pathway. The primary outcome is epistemic accuracy, which decouples social reasoning from behavioral policy and generalizes across social phenomena beyond conflict. The framework gives AI systems a principled, resource-rational decision procedure for mentalizing, with implications for efficiency, trust, and the development of robust artificial social intelligence. Simulation validation, empirical human-machine teaming studies, and ethical considerations arising from conflict-optimized mentalizing are discussed.

07.
arXiv (math.PR) 2026-06-18

Milstein-type Schemes for Hyperbolic SPDEs

arXiv:2512.19647v4 Announce Type: replace-cross Abstract: This article studies the temporal approximation of hyperbolic semilinear stochastic evolution equations with multiplicative Gaussian noise by Milstein-type schemes. We take the term hyperbolic to mean that the leading operator generates a contractive, not necessarily analytic $C_0$-semigroup. Optimal convergence rates are derived for the pathwise uniform strong error \[ E_h^\infty := \Big(\mathbb{E}\Big[\max_{1\le j \le M}\|U_{t_j}-u_j\|_X^p\Big]\Big)^{1/p} \] on a Hilbert space $X$ for $p\in [2,\infty)$. Here, $U$ is the mild solution and $u_j$ its Milstein approximation at time $t_j=jh$ with step size $h>0$ and final time $T=Mh>0$. For sufficiently regular nonlinearity and noise, we establish strong convergence of order one, with the error satisfying $E_h^\infty\lesssim h\sqrt{\log(T/h)}$ for rational Milstein schemes and $E_h^\infty \lesssim h$ for exponential Milstein schemes. This extends previous results from parabolic to hyperbolic SPDEs and from exponential to rational Milstein schemes. Moreover, root-mean-square error estimates are strengthened to pathwise uniform estimates. Numerical experiments validate the convergence rates for the stochastic Schrödinger equation. Further applications to Maxwell's and transport equations are included.

08.
arXiv (quant-ph) 2026-06-11

Energy-Modulated Time-Asymmetric Spontaneous Collapse: Forward-Backward Dynamics from Stochastic Ito Reversal and Bright Solitons

arXiv:2606.06452v3 Announce Type: replace Abstract: We present a rigorous theoretical framework for symmetry breaking and quantum irreversibility arising from stochastic Ito field reversal within a cubic-quintic nonlinear Schrodinger equation (CQ-NLSE) formalism. Starting from three physically motivated considerations, forward and backward nonlinear stochastic differential equations are derived via the Ito calculus. Kinematic time-reversal is shown to be fundamentally incompatible with the Ito stochastic structure, yielding the universal asymmetry-coupling parameter of 2/3. An energy-driven collapse operator proportional to the product of noise strength, local probability density, and excitation energy squared is introduced, amplifying the collapse in high-density, high-excitation regions. Exactly bright soliton solutions are obtained for a quasi-one-dimensional BEC of attractive Li-7 atoms, with forward and backward amplitude ratio of 1.870. Heat map analysis of the parameter planes reveals that the forward collapse operator grows monotonically in time while the backward counterpart decays, achieving a ratio approximately 1030, sharply distinguishing this framework from conventional symmetric collapse models.

09.
arXiv (quant-ph) 2026-06-19

Faking entanglement with imperceptible measurement deviations

arXiv:2606.20396v1 Announce Type: new Abstract: Quantum entanglement is a central resource underpinning emerging quantum technologies, enabling capabilities beyond those of classical systems. Accurate verification of entanglement is therefore crucial. However, experimental schemes usually rely on the assumption that quantum measurements can be realized exactly. As the complexity of a quantum system grows, this assumption typically becomes increasingly unrealistic, therefore leading to a widening mismatch between theoretical models and experimental implementations. Here we demonstrate that arbitrarily small measurement errors, when adversarially encoded in the measurement apparatus, can lead to the false certification of high-dimensional entanglement in systems that are, in fact, separable. This is achieved by introducing explicit hacking attacks to measurement devices in well-established entanglement verification tests. We further experimentally demonstrate this effect using classical photonic states encoded in the spatial degree of freedom, spanning up to 61 dimensions with measurement fidelity errors as low as 0.23%. Our results uncover a fundamental vulnerability in current methods for high-dimensional entanglement detection, highlighting the susceptibility of complex quantum devices to small adversarial perturbations. The findings underscore the need for developing secure verification of quantum information that is robust to bounded discrepancies between theory and experiment.

10.
arXiv (CS.AI) 2026-06-16

PolyKV: Heterogeneous Retention and Allocation for KV Cache Compression

arXiv:2606.15157v1 Announce Type: cross Abstract: KV cache compression is essential for reducing the memory cost of long-context large language model inference. Existing approaches, however, typically apply a single compression policy and a uniform cache budget across all transformer layers. This uniform design ignores the fact that different layers can play different roles during prefill and decoding, and may therefore require different eviction strategies and cache capacities. We present PolyKV, a layer-wise KV cache optimization framework that considers design space with method selection and budget allocation. PolyKV routes each layer to a suitable KV compression policy based on layer-level signals, while assigning non-uniform budgets under a fixed total budget. This formulation enables heterogeneous compositions of existing KV cache methods. Experiments on LLaMA-3.1-8B and Qwen3-8B show that, under the same 512-token average KV budget, PolyKV recovers 54.5% and 25.7% of the LongBench performance gap between the strongest single-policy baseline and FullKV, respectively. Across 128-1024 budget sweep, PolyKV consistently improves over the strongest baseline by 1.7%-6.4%, corresponding to 40.0%-54.5% recovery of the FullKV gap.

11.
arXiv (CS.CL) 2026-06-11

Breaking Entropy Bounds: Accelerating RL Training via MTP with Rejection Sampling

Reinforcement learning (RL) has become a key component in modern large language models, yet the rollout stage remains the key bottleneck in RL training pipelines. Although Multi-Token Prediction (MTP) offers a natural solution to accelerate rollouts through speculative decoding, many studies have observed that MTP acceptance rates degrade significantly during RL training, leading to limited speedup performance. To address this bottleneck, we present Bebop, a systematic study of MTP in LLM post-training, and offer practical recipes to integrate MTP into large-scale RL pipelines. First, we reveal that the MTP acceptance rate is fundamentally bounded by the fluctuation of model entropy, which demonstrates a clear negative linear relationship with the rise of entropy in the RL stage. Second, we show that probabilistic rejection sampling largely alleviates the disturbance introduced by entropy in RL compared to greedy draft sampling. We further identify that the conventional MTP training objectives (cross-entropy or KL) are suboptimal in such settings, and therefore we propose a novel end-to-end TV loss that directly optimizes multi-step rejection sampling acceptance rate, yielding ~10% acceptance rate improvements, achieving up to 95% acceptance rates and up to 25% extra inference throughput gains across mathematical reasoning, code generation, and agentic tasks. Third, we test various online MTP training strategies during RL and show that pre-RL MTP training with e2e TV loss and rejection sampling achieves a consistent acceptance rate and speedup throughout the entire RL, eliminating the need for costly online MTP updating. We provide extensive experiments and analysis that validate our findings. Experimental results show our method achieves up to 1.8x end-to-end acceleration in async RL training of Qwen3.5, Qwen3.6, and Qwen3.7 models.

12.
arXiv (CS.CV) 2026-06-16

Mind the Gap: Diagnosing Constraint Discovery Failures in Text-in-Image Editing

作者:

A key challenge in multimodal reasoning is determining which visual dependencies become relevant under a specific task, rather than merely recognizing visible content. We study this through edit-induced constraint discovery in text-in-image editing, a controlled diagnostic setting where a local text change can activate secondary consistency constraints: given a valid editing instruction and an image, can a model identify the secondary regions that must also change? Across 461 diagnostic cases, four MLLMs, and 19 constraint subtypes, models recover only 46% case-level macro recall under unguided prompting versus 94% when constraints are explicitly provided, suggesting that a substantial portion of the failure arises when models must decide which unstated dependencies to surface. Oracle-field decomposition shows that case-specific causal explanations are the most effective partial guidance (0.782 recall), above region names (0.610) or type labels (0.646), suggesting that edit-specific causal cues account for much of the oracle gain. A downstream experiment further shows that higher self-discovery recall does not necessarily improve task performance: unverified self-discovery introduces false positives that offset recall gains, motivating precision-aware constraint elicitation.

13.
arXiv (CS.CV) 2026-06-16

LUCID: Learned Undersampling-Adaptive Consistency-Guided Inference with Deterministic Flow Matching for Sparse-View CT Reconstruction

Sparse-view CT reduces radiation dose and scanning time by acquiring fewer projection views, but angular undersampling makes reconstruction severely ill-posed, causing streak artifacts, structural blurring, and loss of fine details. Existing supervised methods are often tied to specific sampling settings, whereas generative methods may introduce anatomically inconsistent hallucination-like structures under severe undersampling. We propose Lucid, a sparsity-adaptive, consistency-guided reconstruction framework based on a Flow Matching generative prior for sparse-view CT. Lucid is trained only on high-quality CT images to learn a continuous transport between a Gaussian distribution and the high-quality CT image distribution, independent of view sampling. During inference, the sampling sparsity level is explicitly incorporated to adapt the generative trajectory of a single pretrained model. Specifically, Lucid constructs a degradation-matched initial state by sparsity-weighted fusion of the sparse-view FBP image and Gaussian noise, performs sparsity-modulated Flow Matching updates, and applies projection-domain data-consistency correction after each prior update. Experiments under multiple sparse-view settings show that Lucid achieves stable reconstruction performance across different sampling densities, improves image quality and structural fidelity, and reduces the risk of hallucination-like structures in generative sparse-view CT reconstruction.

14.
arXiv (quant-ph) 2026-06-11

Measurement-Free Toric-Code Memory in Array Globally Controlled Rydberg Array

arXiv:2606.12030v1 Announce Type: new Abstract: The central prerequisite of any fault-tolerant quantum architecture is a quantum memory: a block of encoded physical qubits whose logical state is actively preserved against noise across many rounds of error correction. In neutral-atom Rydberg arrays, realizing such a memory is obstructed not by the entangling gates themselves, which are already fast and high-fidelity, but by the auxiliary operations that a conventional error-correction cycle requires: mid-circuit fluorescence measurement, inter-zone atom transport, and locally focused single-qubit addressing. Each of these introduces latency, atom loss, or optical crosstalk that exceeds the cost of the underlying gates by orders of magnitude. These costs accumulate cycle after cycle, progressively degrading the very logical information the code is meant to protect. Here we propose a protocol that stabilizes a toric-code quantum memory without moving, measuring or local addressing atoms. The key is to use a three-species Rydberg atom array for the complete stabilizer cycle, including syndrome extraction, coherent correction, and ancilla reset, under global, species-selective laser pulses. Numerical simulation of a $4 \times 4$ rotated toric code shows a longer qubit lifetime when the physical error rate is below a pseudo-threshold $p^\star \approx 0.034$. The scheme offers a concrete, hardware-efficient route to topological quantum memory in neutral-atom platforms.

15.
arXiv (CS.AI) 2026-06-17

Geometry-Aware Post-Hoc Uncertainty Quantification in Operator Learning

arXiv:2606.17513v1 Announce Type: cross Abstract: Neural operators provide fast surrogates for PDEs but their deterministic predictions limit their use in tasks requiring uncertainty quantification (UQ), especially under geometric variability. Existing approaches primarily model uncertainty in network parameters, largely overlooking the geometry-aware representations learned by the operator itself. We propose REEF-GP (Residual on Embedded Features Gaussian Process), a post-hoc UQ framework that fits a GP to the residuals of a frozen neural operator whose internal embeddings define the kernel feature space. Rather than learning a separate feature map, REEF-GP adapts the operator's intrinsic coordinate-feature representations to construct geometry-aware uncertainties. To ensure stability and scalability on unstructured domains, REEF-GP incorporates spectral-normalized projections, heteroscedastic geometry-aware noise, and efficient subset-based training that avoids restrictive low-rank approximations. Across five PDE benchmarks with varying geometries, REEF-GP preserves predictive accuracy while achieving calibrated uncertainty estimates competitive with deep ensembles but at a fraction of their cost. Our approach remains robust under geometric distribution shift, with uncertainty concentrating in physically meaningful regions (e.g., shock fronts). Our results demonstrate that accurate and scalable post-hoc UQ for neural operators can be achieved directly in their learned feature space, offering a practical alternative to parameter-centric approaches.

16.
arXiv (CS.CL) 2026-06-19

Investigating Human-Model Discrepancies in Speech Quality Assessment via Acoustic and Prosodic Perturbations

Mean opinion score (MOS) prediction models are widely used as proxy metrics in text-to-speech (TTS) research, yet their ability to capture quality differences beyond acoustic fidelity remains unclear. We investigate this via controlled perturbations on speech: acoustic degradation, prosodic errors, and manipulation of speaker-specific characteristics such as pitch and speaking rate. We obtained MOS predictions for these speech samples from both human listeners and the model, and analyzed the differences in their perceptual characteristics. Results show that most models track acoustic degradation well, while all are insensitive to prosodic errors despite large subjective score drops. For speaker characteristics, models exhibit a double dissociation: strong mean fundamental frequency (F0) biases absent in human ratings, yet insensitivity to speaking rate and F0 variability that humans notice. These findings highlight limitations of scalar MOS prediction beyond acoustic fidelity.

17.
arXiv (CS.LG) 2026-06-15

Federated Learning for Feature Generalization with Convex Constraints

arXiv:2606.14416v1 Announce Type: new Abstract: Federated learning (FL) often struggles with generalization due to heterogeneous client data. Local models are prone to overfitting their local data distributions, and even transferable features can be distorted during aggregation. To address these challenges, we propose FedCONST, an approach that adaptively modulates update magnitudes based on the parameter strength of the global model. This prevents over-emphasizing well-learned parameters while reinforcing underdeveloped ones. Specifically, FedCONST employs linear convex constraints to ensure training stability and preserve locally learned generalization capabilities during aggregation. A Gradient Signal to Noise Ratio (GSNR) analysis further validates the effectiveness of FedCONST in enhancing feature transferability and robustness. As a result, FedCONST effectively aligns local and global objectives, mitigating overfitting and promoting stronger generalization across diverse FL environments, achieving state-of-the-art performance.

18.
arXiv (quant-ph) 2026-06-15

Note on the local calculation of decoherence of quantum superposition in the static black holes

arXiv:2606.14178v1 Announce Type: cross Abstract: We investigate the decoherence of a quantum spatial superposition of a static particle in Schwarzschild and Reissner-Nordstr\"{o}m black holes. By treating the particle as a localized classical source coupled to a quantum scalar field, we reformulate the decoherence process in the Danielson-Satishchandran-Wald (DSW) gedankenexperiment through coherent state generation and derive the local expression for the decoherence functional in terms of the Wightman function. In the long-time limit, the decoherence rate is shown to be characterized by the low-frequency behavior of the Wightman function. We then employ the asymptotic matching method to calculate the analytical expressions of the Wightman functions in the Boulware, Unruh, and Hartle-Hawking vacua. We show that the decoherence behavior depends on the quantum state of the environmental field. While the Boulware vacuum gives vanishing decoherence for a static superposition, the thermal effects associated with Hawking radiation in the Unruh and Hartle-Hawking vacua can induce nonvanishing decoherence.

19.
arXiv (CS.AI) 2026-06-16

Decision-Aware Memory Cards: Counterfactual-Inspired Context Selection and Compression for Tool-Using LLM Agents

arXiv:2606.08151v2 Announce Type: replace Abstract: Modern large language model (LLM) agents do not simply need longer contexts; they need decision-relevant evidence at the moment of action. We study decision-aware context selection: ranking retrieved files, tests, traces, rules, and memories by their expected effect on an agent's next action rather than by semantic similarity alone. We present the Counterfactual-Inspired Context Layer (CICL), which builds an instance context graph, estimates decision-oriented utility for candidate units, and compresses selected evidence into typed memory cards. The same schema can be instantiated with hosted LLM judges, local surrogates, or lightweight rankers, making the selection protocol auditable across model choices. On 50 SWE-bench Verified file-retrieval instances, Qwen3.6-Plus reranking of BM25 top-50 candidates improves hit@1 from 0.58 to 0.78 and MRR@10 from 0.634 to 0.790, with all 2,500 judgments parseable. Controlled diagnostics show that CICL identifies action-critical evidence: removing the top-utility semantic unit reduces F1 from 0.245 to 0.000. In selected-then-compressed mode, memory cards save 44.93 tokens per query while preserving selected evidence. CICL provides a practical layer for measuring, ranking, and compressing decision-critical context for tool-using agents. Code is available at https://github.com/stephen-guan-researcher/CICL.

20.
medRxiv (Medicine) 2026-06-22

Brain-gut axis imaging, motion correction with 11C-carfentanil total-body PET

Background: Mu-opioid receptors (MORs) are expressed throughout the body including in the brain and gastrointestinal (GI) tract. Total-body PET imaging of the brain and GI tract offers a promising approach for cross-sectional in vivo evaluation of the MOR brain-GI axis. However, intestinal motility and bladder filling introduce motion throughout the GI tract over the scan window. Here we establish analysis methodology to account for motion for dynamic imaging of the brain-GI axis, to further characterize peripheral MORs throughout the body and provide a framework for semi-automatic total-body PET modeling. Methods: 4 subjects underwent 90-min dynamic [11C]-carfentanil (cfn) total-body PET acquisitions at baseline, after intravenous naloxone (central antagonist) administration, and after orally administered loperamide (peripheral agonist and P-glycoprotein substrate). Thalamic MOR availability was measured using the Logan reference tissue model. Using CT-based segmentation, the GI tract was subdivided into anatomical segments, in addition to other peripheral organs (e.g., liver, psoas muscle). Frame-by-frame semi-automatic motion correction was performed with three distinct reference frames (11-14 min post-injection, p.i., 35-40 min p.i., and 85-90 min p.i.). The performance of these three were compared to manual correction. Compartment modeling and Logan graphical analysis were performed to estimate relevant kinetic parameters (K1, VT, VTLogan). Results: Across the 4 subjects and regions, kinetic parameter estimates were highly correlated (r>0.7) for K1, VT and VT Logan when comparing semi-automatic (reference frame at 35-40 min p.i.) and manual correction. With semi-automatic motion correction, graphical-based estimation of VTLogan in the gastrointestinal tract was significantly decreased with loperamide relative to baseline (p

21.
arXiv (CS.AI) 2026-06-18

X+Slides: Benchmarking Audience-Conditioned Slide Generation

arXiv:2606.19256v1 Announce Type: new Abstract: Automatically generating slide decks from source documents is an important application of large language models (LLMs). Existing benchmarks primarily assess slide completeness and technical depth, while overlooking the target audience as a critical real-world factor. For instance, specialists demand rigorous proofs, whereas decision-makers prioritize actionable conclusions. To bridge this gap, we introduce X+Slides, a benchmark specifically designed for audience-conditioned slide generation. Built on a diverse corpus spanning 113 topics and seven presentation scenes, X+Slides employs a dynamic evaluation framework constructed from 8,133 deduplicated, source-grounded probes. By assigning audience-specific utility weights to the same source-grounded probes, X+Slides reports four complementary metrics: Audience Coverage measures how much audience-essential information is conveyed, Domain-wise Coverage shows which information types are covered, Efficiency measures delivered utility per unit of attention cost, and Correctness verifies whether slide claims are supported by the source. Experiments on DeepPresenter, SlideTailor, and NotebookLM show that current systems can recover a substantial but still incomplete part of audience-essential information: at $\tau_A=0.7$, DeepPresenter reaches a best Audience Coverage of 0.714, SlideTailor reaches 0.594, and the NotebookLM ablation reaches 0.853 while showing clear grounding differences. These results indicate that visual quality and broad topic coverage should not be treated as evidence support without source-grounded evaluation.

22.
Nature (Science) 2026-06-17

Optical fibre gripper for high-performance 3D micromanipulation

作者:

Optical tweezers offer precise, non-contact control, but operate in a limited force regime and impose strict requirements on the characteristics of the targets as well as the environmental conditions1–4. Millimetre-scale mechanical tweezers can offer higher gripping force but are not suitable for precise manipulations5–11. Integrating microgrippers directly at the optical fibres provides a new approach for precise micromanipulation. However, existing fibre-integrated tweezers still face challenges in achieving high-performance manipulation of micro-objects (for example, single cells) within narrow spaces, mainly due to simplified architectures, constrained designs and millimetre-scale footprints12–14. Here we report a three-dimensional (3D) optical fibre gripper (OFG), which is fabricated by two-step, two-photon polymerization. The OFG consists of rigid photoresist microclaws and soft thermoresponsive hydrogel muscle doped with silver nanoparticles, and its size is only 38 × 38 × 61 μm3. The OFG exhibits a force-to-mass ratio of about 340 μN mg−1, outperforming previously reported fibre-integrated tweezers by one to two orders of magnitude. The OFG can manipulate opaque particles, irregular micromechanical components and diverse single-cell types. We further demonstrated its potential in 3D microassembly of complex microdevices (bearings, shafts and gearboxes) and biomimetic sampling in the narrow environment (<300 μm). These results position the OFG as a compact fibre-tip manipulator for 3D micromanipulation, offering reversible and tunable gripping in an intermediate force regime between optical field trapping and millimetre-scale mechanical tweezers. A miniature three-dimensional optical fibre gripper enables powerful, precise micromanipulation of particles and single cells in confined spaces, bridging the gap between optical and mechanical tweezers.

23.
arXiv (CS.AI) 2026-06-17

Comprehensive pKa Data Augmentation from Limited Real Data through an Engineered Models-Quantum Framework

arXiv:2606.17077v1 Announce Type: cross Abstract: Proton dissociation constants (pKa) are critical for functional molecule discovery and molecular modeling. Building on iBonD, the largest experimental pKa database established, we and other researchers have developed several methods including machine-learning-based empirical prediction and high-accuracy energy calculations. Despite this foundation, the rapid augmentation of high-quality pKa data remains fundamentally constrained. As part of this work, we performed large-scale regression-based pKa prediction on unlabeled molecular datasets using a collection of extensively optimized machine-learning models. The results indicate that, since the feature distributions of unlabeled molecular datasets, the pKa data distribution approximates normality, with extreme scarcity of tail-region samples. Although such augmentation is highly valuable for improving overall data availability and predictive modeling, it remains insufficient for efficiently discovering molecules with broad-spectrum pKa properties. To address this, we explore the targeted generation of molecules with sparse pKa properties from the vast chemical space. Given that traditional continuous latent space VAE-RNN methods for molecular generation suffer from insufficient stability and fail to demonstrate clear advantages in complementing sparse data, we design and implement a quantum-assisted sparse-pKa molecular generation. Feasibility is validated on a simulated quantum annealer, and superior extreme-value sampling is further achieved on physical coherent Ising machines (CIMs). (to be continued)

24.
arXiv (CS.CV) 2026-06-18

Cross-Lingual Learning within Arabic Script for Low-Resource HTR

Handwritten Text Recognition (HTR) with limited labeled data remains a challenging problem, particularly for Arabic-script languages. Although modern sequence-based recognizers perform well in high-resource settings, their accuracy degrades sharply as training data becomes scarce. Arabic-script languages share a common writing system with substantial character overlap, motivating cross-lingual learning as a strategy to mitigate data scarcity. We conduct a controlled line-level study of cross-lingual joint training for Arabic-script HTR under low-resource regimes (number of samples K = 100, 500, 1000 labeled lines) on Arabic (KHATT), Urdu (NUST-UHWR) and Persian (PHTD). CRNN and Vision Transformer-based HTR-VT models are trained on the union of multiple related Arabic-script datasets to mitigate the data scarcity and are evaluated on individual target languages. Both architectures benefit from cross-language training under low-resource conditions. CRNN remains more effective under extremely limited target-language data, whereas the benefits of cross-language training for HTR-VT become less consistent as larger amounts of target-language data become available. On Persian (PHTD), joint training achieves a Character Error Rate (CER) of 9.99 , surpassing previously reported results despite not using the full available training data. On an additional Urdu dataset (UNHD), joint training reduces CER from 17.20 to 14.45.

25.
arXiv (math.PR) 2026-06-18

Ergodic Properties of Non-Linear Density-Dependent Perturbations of the Ornstein-Uhlenbeck Process

arXiv:2606.18877v1 Announce Type: new Abstract: The present paper considers McKean-Vlasov SDEs with density-dependent spatially unbounded drift, which may be viewed as a non-linear density-dependent perturbation of the Ornstein-Uhlenbeck process. We develop a comprehensive theoretical framework for this class of equations. First, we establish strong well-posedness and derive optimal Gaussian pointwise bounds for both the solution density and its gradient. Then we derive an explicit expression for the stationary density and show that it satisfies logarithmic Sobolev and Poincaré inequalities. Finally, we prove exponential convergence to equilibrium in the \(\chi^2\)-metric.