Paper Square - AcademicHub

01.

arXiv (quant-ph) 2026-06-16 DOI: arXiv:2606.16823

Physically Motivated Ansatz for Open Fermionic Systems on Quantum Computer

Authors:

Yi Liu, Xiaopeng Li, Zhen Liu, Zhenyu Li

arXiv:2606.16823v1 Announce Type: new Abstract: Determining non-equilibrium steady states (NESS) of open fermionic systems is a fundamental problem akin to finding ground states of closed systems. To address this, variational quantum algorithms can be used to solve the Lindblad master equation, much like the Schrödinger equation, yet ansatz design for NESS remains challenging. Existing approaches rely mostly on hardware-efficient ansätze (HEA), which suffer from the barren plateau problem. Here, we introduce a physically motivated ansatz named NE-UCC. Numerical simulations demonstrate that NE-UCC reliably converges to the steady state even in strongly correlated regimes far from equilibrium, reducing the infidelity by up to ten orders of magnitude compared to HEA. Furthermore, NE-UCC facilitates the exploration of excited eigenmodes with specific symmetries.
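Scoring an ansatz by its infidelity requires an exact reference steady state. For small systems this can be obtained by vectorizing the Lindblad equation and taking the null space of the Liouvillian superoperator. The sketch below does this with NumPy for a toy model (one fermionic mode with particle injection and loss), not for NE-UCC itself; the Hamiltonian and the rates are assumed values, and for this model the steady-state occupation should equal γ_in / (γ_in + γ_out).

```python
import numpy as np


def liouvillian(H, jump_ops):
    """Superoperator of d(rho)/dt = -i[H, rho] + sum_k (L rho L^dag - {L^dag L, rho}/2).

    Uses column-stacking, for which vec(A X B) = (B^T kron A) vec(X).
    """
    d = H.shape[0]
    eye = np.eye(d)
    sup = -1j * (np.kron(eye, H) - np.kron(H.T, eye))
    for L in jump_ops:
        LdL = L.conj().T @ L
        sup = sup + np.kron(L.conj(), L) - 0.5 * np.kron(eye, LdL) - 0.5 * np.kron(LdL.T, eye)
    return sup


def steady_state(H, jump_ops):
    """Return the (assumed unique) NESS as the null vector of the Liouvillian."""
    d = H.shape[0]
    _, _, vh = np.linalg.svd(liouvillian(H, jump_ops))
    # Right-singular vector of the smallest singular value, un-vectorized column-wise
    rho = vh[-1].conj().reshape((d, d), order="F")
    rho = rho / np.trace(rho)
    return 0.5 * (rho + rho.conj().T)  # drop numerical anti-Hermitian noise


# Toy example: one fermionic mode (equivalently one qubit) coupled to a source and a drain
a = np.array([[0, 1], [0, 0]], dtype=complex)  # annihilation operator, basis (|0>, |1>)
n_op = a.conj().T @ a
gamma_in, gamma_out = 1.0, 3.0  # assumed injection and loss rates
rho_ss = steady_state(0.5 * n_op, [np.sqrt(gamma_out) * a, np.sqrt(gamma_in) * a.conj().T])
occupation = np.trace(n_op @ rho_ss).real
```

The dense Liouvillian of N modes is a 4^N × 4^N matrix, so this brute-force reference is only practical for a handful of modes, which is the regime where a variational circuit can be checked against it.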

02.

arXiv (CS.CV) 2026-06-15 DOI: arXiv:2603.04976

3D-RFT: Reinforcement Fine-Tuning for Video-based 3D Scene Understanding

Authors:

Xiongkun Linghu, Jiangyong Huang, Baoxiong Jia, Siyuan Huang

Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a transformative paradigm for enhancing the reasoning capabilities of Large Language Models (LLMs), yet its potential in 3D scene understanding remains under-explored. Existing approaches largely rely on Supervised Fine-Tuning (SFT), where the token-level cross-entropy loss acts as an indirect proxy for optimization, leading to a misalignment between training objectives and task performance. To bridge this gap, we present Reinforcement Fine-Tuning for Video-based 3D Scene Understanding (3D-RFT), the first framework to extend RLVR to video-based 3D perception and reasoning. 3D-RFT shifts the paradigm by directly optimizing the model towards evaluation metrics. 3D-RFT first activates 3D-aware Multi-modal Large Language Models (MLLMs) via SFT, followed by reinforcement fine-tuning using Group Relative Policy Optimization (GRPO) with strictly verifiable reward functions. We design task-specific reward functions directly from metrics like 3D IoU and F1-Score to provide more effective signals to guide model training. Extensive experiments demonstrate that 3D-RFT-4B achieves state-of-the-art performance on various video-based 3D scene understanding tasks. Notably, 3D-RFT-4B significantly outperforms larger models (e.g., VG LLM-8B) on 3D video detection, 3D visual grounding, and spatial reasoning benchmarks. We further report desirable properties of 3D-RFT, such as robust efficacy, and offer insights into training strategies and the impact of data. We hope 3D-RFT can serve as a robust and promising paradigm for future development of 3D scene understanding.
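A metric-derived reward combined with GRPO can be illustrated with a 3D IoU reward whose values are standardized within each group of sampled answers. The sketch below assumes axis-aligned boxes and population standard deviation for the group normalization; the paper's exact reward shaping, box parameterization (e.g., oriented boxes), and normalization details are not given in the abstract.

```python
from statistics import mean, pstdev


def iou_3d(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, z_min, x_max, y_max, z_max)."""
    inter = 1.0
    for k in range(3):
        overlap = min(a[k + 3], b[k + 3]) - max(a[k], b[k])
        if overlap <= 0:
            return 0.0
        inter *= overlap

    def volume(box):
        return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])

    return inter / (volume(a) + volume(b) - inter)


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: each reward standardized against its own sampling group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four boxes sampled for one query, scored against one ground-truth box
gt = (0, 0, 0, 2, 1, 1)
group = [(0, 0, 0, 2, 1, 1), (1, 0, 0, 3, 1, 1), (5, 5, 5, 6, 6, 6), (0, 0, 0, 1, 1, 1)]
rewards = [iou_3d(box, gt) for box in group]
advantages = group_relative_advantages(rewards)
```

Because the advantages are centered within the group, a box is only reinforced for beating its siblings, which removes the need for a separate learned value model.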

03.

arXiv (CS.CV) 2026-06-15 DOI: arXiv:2606.14697

ClinHallu: A Benchmark for Diagnosing Stage-Wise Hallucinations in Medical MLLM Reasoning

Authors:

Sicheng Yang, Hangjie Yuan, Wenjun Zhang, Jinwang Wang, Yichen Qian, Weihua Chen, Fan Wang, Lei Zhu

Building trustworthy medical multimodal large language models (MLLMs) is critical for reliable clinical decision support. Existing medical hallucination benchmarks mainly focus on data collection, but often ignore where hallucinations originate within the reasoning process. We find that hallucination sources vary across samples: errors may arise from visual misrecognition, incorrect medical knowledge recall, or flawed reasoning integration. To enable source-level hallucination diagnosis, we introduce ClinHallu, a benchmark for stage-wise hallucination diagnosis in medical MLLM reasoning. ClinHallu contains 7,031 validated instances, where each instance is augmented with a structured reasoning trace decomposed into Visual Recognition, Knowledge Recall, and Reasoning Integration. We also use stage-replacement interventions to measure how correcting specific stages affects the final answer. Beyond evaluation, we show that trace-supervised fine-tuning reduces stage-wise hallucinations. ClinHallu provides a fine-grained hallucination testbed for diagnosing and mitigating reasoning failures in medical MLLMs. The benchmark is publicly available at https://github.com/alibaba-damo-academy/ClinHallu.

04.

medRxiv (Medicine) 2026-06-17 DOI: HASH:027c819f7f2e9515d65b444cc67c8ff0

Performance of five risk stratification tools for paediatric pneumonia against WHO scores using data from the PediCAP trial in sub-Saharan Africa

Authors:

Nalwanga, Clements, Musiime, Mulenga, Mujuru H. A., Sidat, Buck W. C., Madhi, Bielicki J. A., …

Background: Risk stratification tools for childhood pneumonia have been proposed to improve identification of children at highest risk of death, particularly in low-resource settings. However, their added value over the WHO Integrated Management of Childhood Illness (IMCI) criteria and danger signs remains uncertain.

Methods: We conducted a secondary analysis of a multi-country randomised controlled trial of children without HIV hospitalised with pneumonia in Mozambique, South Africa, Uganda, Zambia, and Zimbabwe. We evaluated the performance of five published risk scores alongside WHO IMCI severity classification and danger signs. Discrimination for (1) in-hospital mortality, (2) 28-day mortality, and (3) 28-day readmission or death was assessed using area under the receiver operating characteristic curve (AUC). Comparative performance and clinical utility were examined.

Results: Of the 1010 participants, 18 (1.8%) died in hospital, 22 (2.2%) died in hospital or in the 7 days post-discharge, and 63 (6.2%) died or were readmitted by day 28. Univariate case-fatality rates were highest for variables associated with malnutrition, convulsions, and hypoxaemia. All risk scores demonstrated moderate discrimination for in-hospital and in-hospital+7-day mortality (AUC range approximately 0.75-0.84), with no meaningful differences between models, and performed similarly to the WHO danger signs and IMCI severity classification. In contrast, all approaches performed poorly in predicting 28-day readmission or death (AUC approximately 0.54-0.58). No risk score consistently outperformed simple clinical criteria.

Conclusions: In this multi-country dataset, we found no evidence that published paediatric pneumonia risk scores meaningfully outperform WHO IMCI-based clinical assessment for predicting mortality. The relatively small number of mortality events limits precision, and modest differences cannot be excluded. These findings suggest that, in low-resource settings, strengthening implementation of existing WHO clinical criteria may be more effective than adopting more complex prediction tools.
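The reported percentages follow directly from the event counts, and the AUC used for discrimination has a simple rank interpretation: the probability that a randomly chosen child who had the outcome received a higher risk score than one who did not. The sketch below checks both; the integer risk scores are made up for illustration and are not PediCAP data.

```python
def pct(events, total):
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100 * events / total, 1)


def auc_mann_whitney(event_scores, non_event_scores):
    """AUC as P(random event case outscores random non-event case), ties counting 1/2."""
    wins = 0.0
    for p in event_scores:
        for q in non_event_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(event_scores) * len(non_event_scores))


in_hospital, within_7d, death_or_readmission = pct(18, 1010), pct(22, 1010), pct(63, 1010)
# Hypothetical integer risk scores for three children who died and four who survived
auc = auc_mann_whitney([4, 3, 2], [2, 1, 1, 0])
```

With only 18 in-hospital deaths, each event case contributes a large share of the pairwise comparisons, which is one way to see why the abstract cautions that precision is limited.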

05.

arXiv (CS.CV) 2026-06-15 DOI: arXiv:2606.13769

$\mu_0$: A Scalable 3D Interaction-Trace World Model

作者:

Seungjae Lee ↗Yoonkyo Jung ↗Jusuk Lee ↗Jonghun Shin ↗Amir Hossein Shahidzadeh ↗Yao-Chih Lee ↗H. Jin Kim ↗Jia-Bin Huang ↗Furong Huang ↗

World models that capture how actions induce physical change enable scalable robot learning without reliance on embodiment-specific action labels. Pixel-space video models provide broad visual priors but expend model capacity on dense appearance reconstruction, while direct action models require embodiment-specific labels that hinder scalability. We present $\mu_0$, a scalable world model based on 3D traces. Rather than predicting dense pixels or directly modeling actions, $\mu_0$ forecasts smooth 3D trajectories for salient interaction points such as objects, tools, hands, and contact regions, yielding a compact, embodiment-agnostic motion interface. To enable training from diverse video sources, our TraceExtract system automatically extracts 3D supervision by selecting keypoints, constructing globally aligned traces, and associating motion segments with hierarchical language captions. $\mu_0$ is pretrained on this TraceExtract supervision and combines a pretrained vision-language backbone with a modular trace expert, which represents each query via B-spline control points and predicts future traces. Experiments show that $\mu_0$ outperforms baselines in both 2D and 3D trace prediction, including trace prediction models and tokenized VLM methods. Because $\mu_0$ is frozen and reusable, it can be paired with action experts for downstream robot embodiments. Despite action-free pretraining, the resulting trace-conditioned policies achieve performance competitive with VLA models pretrained with action supervision, such as $\pi_0$. These results establish 3D traces as a scalable and transferable representation for cross-embodiment manipulation.
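Representing a trace by B-spline control points means the model only has to predict a handful of points, and a smooth 3D curve is recovered by evaluating the spline. The sketch below assumes a uniform cubic B-spline; the abstract does not say which degree, knot vector, or number of control points $\mu_0$ uses.

```python
import numpy as np


def uniform_cubic_bspline(ctrl, samples_per_segment=10):
    """Evaluate a uniform cubic B-spline from control points of shape (K, D), K >= 4."""
    ctrl = np.asarray(ctrl, dtype=float)
    t = np.linspace(0.0, 1.0, samples_per_segment, endpoint=False)
    # Uniform cubic B-spline basis on one segment; the four weights sum to 1 for every t
    basis = np.stack([
        (1 - t) ** 3 / 6,
        (3 * t**3 - 6 * t**2 + 4) / 6,
        (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6,
        t**3 / 6,
    ], axis=1)  # shape (S, 4)
    # Each segment blends four consecutive control points
    segments = [basis @ ctrl[i:i + 4] for i in range(len(ctrl) - 3)]
    return np.concatenate(segments, axis=0)  # shape (S * (K - 3), D)


# Hypothetical predicted control points for one interaction point moving along x
ctrl = [[0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0], [4, 0, 0]]
trace = uniform_cubic_bspline(ctrl)
```

Since the basis weights are non-negative and sum to one, the curve stays inside the convex hull of its control points, which keeps predicted motion smooth and bounded by construction.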

06.

Nature (Science) 2026-06-09 DOI: HASH:e965c6e96db8e206429b797bc244040f

Seven steps for critically analysing research papers

Authors:

Jacques Cornwell

By sticking to a clear strategy when reading, I get much more out of the literature, says Jacques Cornwell.

07.

arXiv (CS.CL) 2026-06-24 DOI: arXiv:2606.23724

EvidenceLens: A Claim-Evidence Matrix for Auditing Financial Question Answering

Authors:

Fengchen Gu, Xiaotian Ren, Zhengyong Jiang, Zhilu Zhang, Ángel F. García-Fernández, Angelos Stefanidis, Mian Zhou, Huakang Li, Jionglong Su

Large language models are increasingly used to answer questions over annual reports, earnings decks, and analyst notes, yet their outputs remain difficult to verify in high-stakes financial workflows. A fluent answer can blend directly grounded statements, weak synthesis, and unsupported claims across narrative text, tables, and charts. We present EvidenceLens, a visual analytics prototype that treats financial question answering as a claim-evidence alignment problem. The system decomposes an answer into atomic claims, summarizes support composition, confidence, and support gaps, and coordinates claim-level inspection with source passages, table cells, and chart regions. Its core visual representation is a multimodal claim-evidence matrix that makes coverage, contradiction, and modality imbalance immediately visible. To support reproducibility, we also specify a JSON-based artifact schema, a lightweight multimodal alignment pipeline, and a deterministic review-priority ranking that maps backend signals into an auditable visual structure. Through representative report-auditing scenarios, we show how EvidenceLens helps analysts distinguish grounded claims from overconfident synthesis that conventional chat interfaces flatten.
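A deterministic review-priority ranking can be as simple as a lexicographic sort key over claim-level signals, so that the same inputs always produce the same review order. The sketch below is hypothetical: the `Claim` fields, the signal order, and the tie-break are assumptions for illustration and are not EvidenceLens's actual schema or ranking.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    claim_id: str
    confidence: float                              # model confidence in [0, 1] (assumed field)
    support: dict = field(default_factory=dict)    # modality -> number of supporting items
    contradicted: bool = False


def review_priority(claim):
    """Sort key: smaller tuples are reviewed first. False sorts before True, so
    contradicted and then unsupported claims lead, followed by low confidence,
    single-modality support, and finally a stable tie-break on the claim id."""
    n_evidence = sum(claim.support.values())
    n_modalities = sum(1 for count in claim.support.values() if count > 0)
    return (not claim.contradicted, n_evidence > 0, claim.confidence, n_modalities, claim.claim_id)


def rank_claims(claims):
    return sorted(claims, key=review_priority)


claims = [
    Claim("c5", 0.95, {"text": 1}),
    Claim("c3", 0.40, {"text": 2}),
    Claim("c4", 0.40, {"text": 1, "table": 1}),
    Claim("c2", 0.80, {}),
    Claim("c1", 0.90, {"text": 1}, contradicted=True),
]
order = [c.claim_id for c in rank_claims(claims)]
```

Ending the key with the claim id makes ties impossible, which is what makes such a ranking reproducible and auditable.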

08.

arXiv (CS.LG) 2026-06-19 DOI: arXiv:2606.10686

An adaptive framework for the axisymmetric pulsar magnetosphere using physics-informed Kolmogorov-Arnold networks

Authors:

Spyros Rigas, Ioannis Contopoulos, Georgios Alexandridis, Antonios Nathanail

arXiv:2606.10686v2 Announce Type: replace-cross Abstract: The pulsar magnetosphere has only recently been addressed using Physics-Informed Neural Networks (PINNs), by deploying a domain-decomposition approach and treating the separatrix and equatorial current sheet as infinitesimally thin discontinuities. However, this baseline requires extensive manual hyperparameter tuning, achieves limited final accuracy and demands several hours of training. We refine this framework by introducing domain-specific neural architectures based on Kolmogorov-Arnold networks, an automated adaptive training pipeline and a physics-based convergence criterion that eliminate the need for manual calibration. The proposed methodology delivers self-consistent axisymmetric magnetosphere solutions with mean squared errors of the PDE residuals at O(1e-6) in double precision - an improvement of two orders of magnitude over the baseline - while achieving convergence in under 20 minutes in single precision. Importantly, the method reliably resolves stellar radii reduced by up to 80% compared to the baseline, overcoming the severe spatial scale disparities that also challenge traditional solvers. Furthermore, by varying the flux that opens to infinity, we provide a correction to the equation that connects it to the equatorial T-point's position. The complete framework is released as the open-source library PulsarX.

09.

arXiv (CS.AI) 2026-06-24 DOI: arXiv:2405.19062

Invariant Graph Representations for Continuous-Time Dynamic Graphs Under Distribution Shifts

Authors:

Lanting Fang, Yulian Yang, Yawei Zhang, Shanshan Feng, Kaiyu Feng, Hanning Yuan

arXiv:2405.19062v2 Announce Type: replace-cross Abstract: Continuous-Time Dynamic Graphs (CTDGs) enable fine-grained modeling of evolving relational systems. However, most existing CTDG representation learning methods are tailored to in-distribution settings and exhibit limited robustness under out-of-distribution (OOD) shifts. Although recent causal approaches learn invariant representations via interventions, they are primarily designed for static or discrete-time graphs and become computationally prohibitive for CTDGs due to the combinatorial explosion of structural and temporal variations. To address these challenges, we propose CIR, a framework grounded in a novel structural causal model termed the ICCM. To avoid exhaustive interventions, we leverage the Normalized Weighted Geometric Mean (NWGM) to efficiently approximate interventional predictions. We further instantiate ICCM within a practical deep learning architecture that jointly captures invariant structural and temporal patterns through dedicated subgraph extractors, and maintains an environment memory bank to model distributional shifts across evolving contexts. Extensive experiments demonstrate that CIR consistently outperforms existing methods under diverse OOD scenarios.
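The reason NWGM makes interventional predictions cheap is a standard identity: for sigmoid outputs, the normalized weighted geometric mean of σ(z_i) with weights w_i summing to 1 equals σ(Σ_i w_i z_i) exactly. The expectation over environments, Σ_i w_i σ(z_i), therefore can be approximated by a single forward pass on the averaged logit. The sketch below verifies the identity with hypothetical logits and environment weights; how CIR forms the logits and weights inside ICCM is not reproduced.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def nwgm(probs, weights):
    """Normalized weighted geometric mean of Bernoulli probabilities (weights sum to 1)."""
    g_pos = math.prod(p ** w for p, w in zip(probs, weights))
    g_neg = math.prod((1.0 - p) ** w for p, w in zip(probs, weights))
    return g_pos / (g_pos + g_neg)


# Hypothetical logits of one prediction under three sampled environments, with weights P(e)
logits = [-1.0, 0.5, 2.0]
weights = [0.2, 0.3, 0.5]
probs = [sigmoid(z) for z in logits]

exact_expectation = sum(w * p for w, p in zip(weights, probs))  # one pass per environment
nwgm_value = nwgm(probs, weights)
one_pass = sigmoid(sum(w * z for w, z in zip(weights, logits)))  # single pass on the mean logit
```

The identity follows from σ(z)/(1 − σ(z)) = e^z: the ratio of the two weighted geometric means is e^{Σ w_i z_i}, so normalizing gives σ(Σ w_i z_i). The gap between `exact_expectation` and `one_pass` is the approximation error that NWGM accepts in exchange for avoiding exhaustive interventions.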

10.

arXiv (CS.CL) 2026-06-16 DOI: arXiv:2606.16843

Data-Driven Decoding of Russell's Circumplex Model of Affect

Authors:

Amdjed Belaref, Samir Sadok, Zineb Noumir, Renaud Seguier

Affective computing increasingly relies on deep learning to represent emotions, yet latent spaces often remain opaque, high-dimensional black boxes. This paper investigates whether Transformer embeddings recover the geometric regularities of Russell's circumplex model. We unify two complementary experiments testing the hypothesis that, after training models on text and speech, their resulting latent spaces encode a topology consistent with the valence-arousal plane and reproduce human-like neighborhood relations. Specifically, we evaluate deep representations extracted from Transformer-based text (RoBERTa) and speech (wav2vec 2.0) encoders, along with a multimodal Transformer fusion architecture, across naturalistic datasets such as MSP-Podcast and controlled LLM-generated stimuli. Our analysis reveals that multimodal fusion of text and audio yields perfect topological alignment with Russell's primary emotion ordering. Furthermore, in a zero-shot setting using generic text embeddings, projected fine-grained emotion terms fall close to their established human-mapped coordinates. Our contribution is a novel, data-driven framework for validating emotion models, demonstrating that Russell's circumplex structure is intrinsically encoded in the embeddings of these modalities rather than being solely an artifact of human labeling, thereby bridging the gap between psychological theory and representation learning.
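One way to read "perfect topological alignment with Russell's primary emotion ordering" is that the angular order of emotion labels in a 2D projection matches the circumplex's circular order up to rotation and reflection. The sketch below checks that property for already-projected coordinates; the labels, coordinates, and reference order are hypothetical, and the paper's projection from embeddings to 2D is not reproduced.

```python
import math


def angular_order(coords):
    """Labels sorted counter-clockwise by the polar angle of their 2D (valence, arousal) point."""
    def angle(item):
        valence, arousal = item[1]
        return math.atan2(arousal, valence) % (2 * math.pi)
    return [label for label, _ in sorted(coords.items(), key=angle)]


def same_cyclic_order(order, reference):
    """True if `order` equals `reference` up to a rotation or a reflection."""
    if sorted(order) != sorted(reference):
        return False
    n = len(reference)
    for seq in (list(reference), list(reference)[::-1]):
        if any(order == seq[s:] + seq[:s] for s in range(n)):
            return True
    return False


# Hypothetical 2D projections of emotion-word embeddings
coords = {
    "happy": (0.8, 0.3), "excited": (0.4, 0.8), "tense": (-0.4, 0.8), "angry": (-0.8, 0.4),
    "sad": (-0.7, -0.4), "bored": (-0.3, -0.8), "calm": (0.6, -0.6),
}
order = angular_order(coords)
reference = ["calm", "bored", "sad", "angry", "tense", "excited", "happy"]  # clockwise listing
```

Allowing rotation and reflection matters because an unsupervised projection has no canonical starting angle or orientation.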

11.

arXiv (CS.AI) 2026-06-17 DOI: arXiv:2606.17199

PowerOPD: Stabilizing On-Policy Distillation with Bounded Power Transformation

Authors:

Anhao Zhao, Junlong Tong, Yingqi Fan, Ping Nie, Wenjie Li, Xiaoyu Shen

arXiv:2606.17199v1 Announce Type: cross Abstract: Standard on-policy distillation (OPD) for large language models estimates the reverse-KL objective using student-sampled tokens, yielding an unbiased single-sample Monte Carlo estimator that avoids vocabulary-wide computation. However, we show that this estimator suffers from severe training pathologies in practice: sample inefficiency, unstable generation dynamics, and a substantial performance gap compared to exact full-vocabulary OPD. Reward-level diagnosis traces these pathologies to the log-ratio reward, which is unbounded by construction, producing extremely high-variance gradients concentrated at early positions and persisting throughout training; standard post-hoc scaling methods fail because they operate only after this distortion occurs. To solve this problem, we propose PowerOPD: a family of natively bounded, sign-consistent rewards from the Box-Cox power transformation, parameterized by α > 0, of which the log-ratio is the degenerate α → 0 limit. Across six mathematical reasoning benchmarks and four Qwen3 teacher-student pairs, PowerOPD achieves benchmark-averaged Avg@8/Pass@8 gains of up to +6.37/+5.71 over vanilla OPD, +3.01/+3.54 over post-hoc stabilization, and +2.59/+8.90 over full-vocabulary OPD, while reducing wall-clock time by 59.2% and peak GPU memory by 23.1%. Larger α generally improves accuracy, consistently shortens responses, and keeps gradient norms more than 3,000× smaller than vanilla OPD.
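The abstract does not spell out the reward formula. One natural form consistent with all three stated properties (bounded for α > 0, same sign as the log-ratio, log-ratio as the α → 0 limit) applies the Box-Cox transform to each token probability and takes the difference. The sketch below is built on that assumption and should not be read as PowerOPD's exact definition.

```python
import math


def box_cox(x, alpha):
    """Box-Cox transform (x**alpha - 1) / alpha, with log(x) as the alpha -> 0 limit."""
    return math.log(x) if alpha == 0 else (x ** alpha - 1.0) / alpha


def power_reward(p_teacher, p_student, alpha):
    """Assumed per-token reward Box-Cox(p_teacher) - Box-Cox(p_student).

    For alpha > 0 and probabilities in (0, 1] it lies in [-1/alpha, 1/alpha], has the
    same sign as log(p_teacher / p_student), and tends to that log-ratio as alpha -> 0.
    """
    return box_cox(p_teacher, alpha) - box_cox(p_student, alpha)


# A student token the teacher finds nearly impossible: the log-ratio explodes, the power reward does not
log_ratio = math.log(1e-12 / 1.0)          # about -27.6
bounded = power_reward(1e-12, 1.0, 0.5)    # stays above -1/alpha = -2
```

Under this reading, a larger α tightens the bound 1/α, which matches the abstract's observation that larger α keeps gradient norms small.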

12.

arXiv (quant-ph) 2026-06-11 DOI: arXiv:2606.11311

Exact Entanglement Dynamics Beyond Nearest-Neighbor Dual-Unitary Floquet Systems

Authors:

Tanay Pathak

arXiv:2606.11311v1 Announce Type: new Abstract: Exact results using dual-unitarity largely rely on nearest-neighbor structures, while finite-range interactions typically lead to complications. Going beyond the usual nearest-neighbor setting, we introduce an analytically tractable family of finite-range kicked Ising models that admit exact closed-form entanglement dynamics. The construction is based on a staggered structure in which dual-unitarity is present on sublattices that are then coupled to each other. The central observation is that these inter-sublattice couplings do not obstruct the dual-unitarity of the resulting model. For the minimal interaction range of $r = 2$, we derive exact expressions for all the $n$-Rényi entanglement entropies at all times and show that the result is the sum of the two coupled sublattice contributions. Our framework extends naturally to larger finite interaction ranges and to systems with heterogeneous local Hilbert spaces, without additional assumptions. It thus provides a controlled setting for studying exact entanglement growth beyond strictly nearest-neighbor dual-unitary models.

13.

arXiv (CS.CV) 2026-06-12 DOI: arXiv:2606.12858

JSCGC: Joint Source-Channel-Generation Coding for Wireless Generative Communications

Authors:

Tong Wu, Zhiyong Chen, Guo Lu, Li Song, Feng Yang, Meixia Tao, Wenjun Zhang

Conventional communication systems, including both separation-based coding and learning-based joint source-channel coding (JSCC), are typically designed under Shannon's rate-distortion theory. However, relying on generic distortion metrics fails to capture complex human visual perception, often resulting in blurred or unrealistic reconstructions. In this paper, we propose Joint Source-Channel-Generation Coding (JSCGC), a generative communication paradigm that replaces the conventional decoder with a generative model at the receiver. The received signal is treated as a condition that steers sampling from the learned conditional distribution, reformulating communication from deterministic reconstruction for distortion minimization to controlled generation for mutual information maximization under perceptual constraints. Based on this formulation, we develop a unified joint training and efficient stochastic sampling framework, and provide theoretical analysis of its effectiveness in both learning and inference stages. Extensive experiments on latent-space image transmission demonstrate that JSCGC consistently improves feature-based, semantic-level, and distributional quality across diverse channel conditions, while exhibiting a distinct error behavior characterized by semantic inconsistency rather than distortion.

14.

arXiv (CS.AI) 2026-06-17 DOI: arXiv:2606.17090

ANEForge: Python for direct computation on the Apple Neural Engine

Authors:

Spencer H. Bryngelson

arXiv:2606.17090v1 Announce Type: cross Abstract: ANEForge is a Python package that programs the Apple Neural Engine (ANE), the fixed-function neural accelerator on every recent Apple device, directly and without CoreML. In production the engine is reachable only through CoreML, which treats it as a scheduling option: no configuration requires the ANE, and a model can silently run on the CPU or GPU instead. ANEForge compiles a lazy tensor graph, built from 58 fused operators and 19 native bridge operators, into a single ANE program. The program is dispatched through the same ANE daemon and kernel-driver stack as Apple's internal framework. Beyond inference, the package reaches the engine's native fused attention, streams int8, int4, and sparse weights, keeps decoder and optimizer state resident across steps, and runs the forward pass, backward pass, and optimizer update of training on the engine. A small fused program completes a call in about 90 µs, near the engine's 70 µs per-program dispatch floor, and a pretrained ResNet-18 forward runs end-to-end in 0.33 ms. ResNet-18, a sentence encoder, and a Vision Transformer run end-to-end against framework references, and a Stable Diffusion U-Net validates its forward pass. ANEForge targets Apple Silicon under macOS 14 and later. Each release is verified against a recorded macOS and ANE-compiler version.

15.

arXiv (math.PR) 2026-06-12 DOI: arXiv:2606.13442

Scaling limit of additive functionals for reversible non-gradient exclusion process: critical cases

Authors:

Chenlin Gu, Linzhi Yang, Linjie Zhao

arXiv:2606.13442v1 Announce Type: new Abstract: For the reversible speed-change exclusion process $(\eta_t)_{t \geq 0}$ in $\mathbb{Z}^d$, we study the scaling limit of additive functionals ${\Gamma_t(f) = \int_0^t f(\eta_s)\, \mathrm{d} s}$. For a local centered function $f$, previous work by Kipnis and Varadhan [Commun. Math. Phys. 104, 1-19, 1986] and by Gonçalves and Jara [Comm. Pure Appl. Math., 66: 649-677, 2013] covered the cases $d \geq 3$ and $d = 1$, respectively. The present paper completes the missing case $d = 2$ and also extends the theory to functions of higher degree. The novelty is a quantitative homogenization of the resolvent, which allows one to overcome the obstacle posed by the correlation function in non-gradient models.
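To make the object $\Gamma_t(f)$ concrete: along one trajectory the configuration is piecewise constant between jumps, so the integral is a finite sum of $f(\eta)$ times holding times. The sketch below simulates the plain symmetric simple exclusion process on a small ring with the Gillespie algorithm; this is a deliberately simpler, gradient model chosen for illustration, not the speed-change non-gradient process studied in the paper, and the ring size and test function are assumptions.

```python
import random


def additive_functional_ssep(eta0, f, t_max, seed=0):
    """Exact Gamma_t(f) = int_0^t f(eta_s) ds along one path of the symmetric simple
    exclusion process on a ring (each particle jumps to each empty neighbour at rate 1).

    The path is simulated with the Gillespie algorithm; since eta_s is piecewise
    constant between jumps, the integral is a finite sum of f(eta) * holding time.
    Assumes a ring of at least 3 sites.
    """
    rng = random.Random(seed)
    eta = list(eta0)
    n = len(eta)
    t, gamma = 0.0, 0.0
    while True:
        moves = [(i, (i + d) % n) for i in range(n) if eta[i]
                 for d in (-1, 1) if not eta[(i + d) % n]]
        if not moves:  # frozen configuration (empty or full ring)
            return gamma + f(eta) * (t_max - t)
        dt = rng.expovariate(len(moves))  # total jump rate = number of allowed moves
        if t + dt >= t_max:
            return gamma + f(eta) * (t_max - t)
        gamma += f(eta) * dt
        t += dt
        i, j = rng.choice(moves)
        eta[i], eta[j] = 0, 1


# Centered local function for 4 particles on 8 sites: P(eta(0) = eta(1) = 1) = (4/8) * (3/7) = 3/14
def f_local(eta):
    return eta[0] * eta[1] - 3 / 14


gamma_t = additive_functional_ssep([1, 0, 1, 0, 1, 0, 1, 0], f_local, t_max=50.0, seed=1)
```

The scaling-limit question is then how $\Gamma_t(f)$ must be normalized in $t$ (and, in the critical case $d = 2$, with which logarithmic corrections) for its fluctuations to converge; the simulation only produces single samples of $\Gamma_t(f)$.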

16.

arXiv (CS.AI) 2026-06-19 DOI: arXiv:2606.19735

GLARE: A Natural Language Interface for Querying Global Explanations

Authors:

Bhavan Vasu, Rajesh Mangannavar

arXiv:2606.19735v1 Announce Type: new Abstract: While global explanations are crucial for understanding vision models across datasets, classes, and decision contexts, their complex and monolithic nature often hinders practical exploration. Because users typically seek targeted answers to specific questions rather than static artifacts, we present an LLM-based interactive interface that provides natural language access to global explanations for black-box image classifiers. The system's core LLM acts as a mediator, translating natural language questions into structured SQL queries over local explanation data. This enables flexible aggregation without exposing users to low-level representations. For each query, the interface outputs statistics-augmented natural language responses, supporting local explanations, and intent-aligned visualizations. We evaluate the system on intent interpretation, query mapping accuracy, generalization to novel queries and datasets, and robustness to linguistic errors. Our results demonstrate that LLM-mediated querying substantially improves the accessibility and usability of global explanations for human-centered XAI.
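Once an LLM has translated a question into SQL, the global explanation is simply an aggregate over stored local explanations. The sketch below uses Python's built-in `sqlite3` with a hypothetical `local_explanations` table (one attribution score per image and concept) and a hand-written query of the kind the mediator might emit; the system's real schema, prompts, and the natural-language-to-SQL step are not shown.

```python
import sqlite3

SCHEMA = """
CREATE TABLE local_explanations (
    image_id INTEGER,
    predicted_class TEXT,
    concept TEXT,
    attribution REAL
)
"""

# A query an LLM mediator might emit for
# "Which concepts matter most when the model predicts <class>?"
TOP_CONCEPTS_SQL = """
SELECT concept, AVG(attribution) AS mean_attribution, COUNT(DISTINCT image_id) AS n_images
FROM local_explanations
WHERE predicted_class = ?
GROUP BY concept
ORDER BY mean_attribution DESC, concept ASC
LIMIT ?
"""


def top_concepts(conn, predicted_class, k=3):
    """Aggregate local explanations into a class-level (global) ranking of concepts."""
    return conn.execute(TOP_CONCEPTS_SQL, (predicted_class, k)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO local_explanations VALUES (?, ?, ?, ?)",
    [(1, "zebra", "stripes", 0.9), (2, "zebra", "stripes", 0.7),
     (1, "zebra", "grass", 0.2), (2, "zebra", "grass", 0.4),
     (3, "horse", "grass", 0.8)],
)
rows = top_concepts(conn, "zebra", k=2)
```

Binding user-derived values as `?` parameters rather than splicing them into the SQL string keeps generated queries safe to execute; `n_images` is the kind of statistic a response can cite to qualify its answer.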

17.

bioRxiv (Bioinfo) 2026-06-10 DOI: HASH:1c6f1eabca5a961459741e0859541f7b

APOSM: Pairwise preference learning improves generative small-molecule design

Authors:

Dreisler M. W., Michael, Hatzakis N. S., Boomsma

Small-molecule lead refinement is constrained by the cost of synthesizing and assaying candidates, making the surrogate models that prioritize compounds for experimental testing central to the design process. The reliability of such surrogates is limited by the noise and sparsity of screening measurements. We show that training the surrogate on pairwise comparisons between candidate molecules, rather than on absolute predicted scores, yields a substantially more reliable signal for active candidate selection in this regime. We develop APOSM, an active-learning algorithm that combines a fragment-based generator, a pairwise message-passing graph neural network surrogate, and probabilistic ranking inside a batched acquisition loop. On the Practical Molecular Optimization benchmark and a GPCR ligand rediscovery task, APOSM improves target attainment and sampling efficiency over unguided fragment-based optimization, the Graph-GA genetic algorithm, and a pointwise-regression ablation, with the largest gains on tasks where absolute scores are hardest to calibrate.
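Pairwise comparisons become a ranking through a probabilistic model such as Bradley-Terry, in which item i is preferred over item j with probability p_i / (p_i + p_j). The sketch below fits Bradley-Terry strengths with Hunter's (2004) MM update from counted comparisons. It is a generic illustration of probabilistic ranking from pairwise data, using hypothetical comparison counts; the abstract does not state that APOSM uses this model or this fitting procedure.

```python
def bradley_terry(n_items, wins, iters=500):
    """Fit Bradley-Terry strengths with the MM update of Hunter (2004).

    wins[(i, j)] = number of times item i was preferred over item j.
    Assumes every item wins at least once and the comparison graph is connected.
    """
    p = [1.0] * n_items
    for _ in range(iters):
        new_p = []
        for i in range(n_items):
            w_i = sum(c for (a, _), c in wins.items() if a == i)
            denom = 0.0
            for j in range(n_items):
                n_ij = wins.get((i, j), 0) + wins.get((j, i), 0)
                if j != i and n_ij:
                    denom += n_ij / (p[i] + p[j])
            new_p.append(w_i / denom if denom else p[i])
        scale = n_items / sum(new_p)  # strengths are only defined up to a common factor
        p = [x * scale for x in new_p]
    return p


def prob_preferred(p, i, j):
    return p[i] / (p[i] + p[j])


# Hypothetical comparisons among three candidate molecules
wins = {(0, 1): 3, (1, 0): 1, (1, 2): 3, (2, 1): 1, (0, 2): 3, (2, 0): 1}
strengths = bradley_terry(3, wins)
ranking = sorted(range(3), key=lambda i: -strengths[i])
```

Only relative judgments enter the fit, so a constant offset in noisy absolute measurements cancels out, which is consistent with the abstract's point that gains are largest where absolute scores are hardest to calibrate.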

18.

arXiv (CS.CL) 2026-06-19 DOI: arXiv:2606.20198

Pitch Spelling Jazz Lead Sheets, Solo Transcriptions, Classical Piano and Monophonic Scores

Authors:

Augustin Bouquillard, Florent Jacquemard

We present an algorithm for pitch spelling and key estimation. Given an input in MIDI-like format, containing information on note pitches (expressed in semitones relative to the lowest reference note) and bar boundaries, it estimates the appropriate note names, a global Key Signature, and a local scale for each bar. These related elements are estimated jointly in two optimisation stages. During an initial 'modal' stage, a probable scale is proposed for each bar, using a shortest-path search that minimises the number of accidentals in the printed score. Then, during a second stage called 'tonal', these local scales are used to estimate the Key Signature and note names that would result in the best musical notation for the entire piece. We present evaluations conducted on datasets comprising a variety of digital musical scores: jazz lead sheets taken from the Real Book, transcriptions of recordings of jazz soli and bass lines, traditional tunes, as well as classical scores for piano and monophonic instruments. Our procedure was originally designed for use in music transcription, specifically for building digital collections of jazz solos transcribed from audio recordings, for the purposes of music analysis, teaching and the preservation of cultural heritage. This method should also prove useful for other tasks related to the processing of musical notation. In addition, we define new distances between common jazz scales, which may be of interest for musicological studies.
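The 'modal' stage can be pictured as a shortest path through a layered graph: one layer per bar, one node per candidate scale, node costs counting notes that need accidentals, and edge costs penalizing scale changes between bars. The sketch below solves that with dynamic programming over major scales only; the candidate scales, the cost terms, and the change penalty are simplifying assumptions, not the paper's actual costs or its jazz-scale distances.

```python
def major_scale(root):
    """Pitch classes of the major scale on `root` (0 = C)."""
    return frozenset((root + s) % 12 for s in (0, 2, 4, 5, 7, 9, 11))


def local_scales(bars, roots, change_cost=1):
    """Shortest path over bars: cost = pitch classes outside the bar's scale + change penalties."""
    scales = [major_scale(r) for r in roots]

    def bar_cost(bar, k):
        return sum(1 for pc in set(bar) if pc % 12 not in scales[k])

    def step(j, k):
        return 0 if j == k else change_cost

    cost = [bar_cost(bars[0], k) for k in range(len(roots))]
    back = []  # back[b][k]: best predecessor scale for scale k at bar b + 1
    for bar in bars[1:]:
        new_cost, ptr = [], []
        for k in range(len(roots)):
            best = min(range(len(roots)), key=lambda j: cost[j] + step(j, k))
            new_cost.append(cost[best] + step(best, k) + bar_cost(bar, k))
            ptr.append(best)
        cost = new_cost
        back.append(ptr)
    k = min(range(len(roots)), key=lambda j: cost[j])
    total = cost[k]
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    return [roots[k] for k in reversed(path)], total


# Two bars using F natural, then two bars using F sharp; candidates C major (0) and G major (7)
bars = [[0, 5, 9], [0, 5, 9], [7, 11, 6], [7, 11, 6]]
scales_per_bar, total_cost = local_scales(bars, [0, 7])
```

A single modulation (cost 1) beats forcing either scale on the whole excerpt (cost 2 each), which is the trade-off the change penalty encodes.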

19.

arXiv (CS.CV) 2026-06-19 DOI: arXiv:2606.20491

Fast Human Attention Prediction for Fixation-guided Active Perception in Autonomous Navigation

Authors:

Fatma Youssef Mohammed, Grzegorz Malczyk, Kostas Alexis

Human visual attention relies on structured scanpaths to efficiently process scenes, yet instilling this behavior into robot autonomy is in its infancy and hindered by the high computational costs of existing predictive models. To address this, we introduce GazeLNN, a computationally lightweight scanpath prediction model that leverages Liquid Neural Networks as its recurrent engine and employs MobileNetV3 for feature extraction. Operating auto-regressively, the architecture predicts sequential fixation heatmaps conditioned on the current visual stimulus and fixation history. Despite requiring only 0.61 GFLOPs, GazeLNN achieves state-of-the-art performance on the MIT Low Resolution dataset, with a ScanMatch score of 0.47. It outperforms existing recurrent baselines across diverse evaluation metrics, while reducing computational costs by 99.40% and accelerating inference by up to six times. To investigate the role of human attention modeling in robot autonomy and demonstrate the practical utility of this highly efficient architecture, we integrate GazeLNN into an active camera-robot control policy trained via Reinforcement Learning. This integration enables human-fixation-guided perception during autonomous navigation, validated through successful real-world deployments on an aerial robot.
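A Liquid Neural Network cell is typically a liquid time-constant (LTC) unit whose state follows dx/dt = -(1/τ + f) x + f A, where the input-dependent gate f modulates the effective time constant. The sketch below implements the fused semi-implicit Euler step from the original LTC formulation with NumPy; the layer sizes, random weights, and the use of random vectors in place of MobileNetV3 features are assumptions, and GazeLNN's actual cell configuration and heatmap decoder are not given in the abstract.

```python
import numpy as np


def ltc_step(x, inp, W_in, W_rec, b, tau, A, dt=0.1):
    """One fused (semi-implicit Euler) step of a liquid time-constant cell:
    dx/dt = -(1/tau + f) * x + f * A, with f = sigmoid(W_in @ inp + W_rec @ x + b).
    """
    f = 1.0 / (1.0 + np.exp(-(W_in @ inp + W_rec @ x + b)))
    return (x + dt * f * A) / (1.0 + dt * (1.0 / tau + f))


rng = np.random.default_rng(0)
hidden, feat = 8, 16  # hypothetical sizes; in GazeLNN the input would come from MobileNetV3
W_in = 0.1 * rng.standard_normal((hidden, feat))
W_rec = 0.1 * rng.standard_normal((hidden, hidden))
b, tau, A = np.zeros(hidden), np.ones(hidden), np.ones(hidden)
x = np.zeros(hidden)
for _ in range(20):  # auto-regressive steps with stand-in feature vectors
    x = ltc_step(x, rng.standard_normal(feat), W_in, W_rec, b, tau, A)
```

The fused update is cheap (a few element-wise operations per step) and keeps the state between 0 and A when it starts there, a stability property that suits small recurrent models on embedded hardware.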

20.

arXiv (CS.CL) 2026-06-15 DOI: arXiv:2604.18419

Knowing When to Quit: A Principled Framework for Dynamic Abstention in LLM Reasoning

Authors:

Hen Davidov, Nachshon Cohen, Oren Kalinsky, Yaron Fairstein, Guy Kushilevitz, Ram Yazdi, Patrick Rebeschini

LLMs utilizing chain-of-thought reasoning often waste substantial compute by producing long, incorrect responses. Abstention can mitigate this by withholding outputs unlikely to be correct. While most abstention methods decide to withhold outputs before or after generation, dynamic mid-generation abstention considers early termination of unpromising reasoning traces at each token position. Prior work has explored empirical variants of this idea, but principled guidance for the abstention rule remains lacking. We present a formal analysis of dynamic abstention for LLMs, modeling abstention as an explicit action within a regularized reinforcement learning framework. An abstention reward parameter controls the trade-off between compute and information. We show that abstaining when the value function falls below this reward strictly outperforms natural baselines under general conditions. We further derive a principled and efficient method to approximate the value function. Empirical results on mathematical reasoning and toxicity avoidance tasks support our theory and demonstrate improved selective accuracy over existing methods.
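The decision rule itself is simple once a value estimate is available at each token position: stop the first time the estimated value of continuing drops below the abstention reward. The sketch below shows only that rule, on hypothetical value sequences; how the paper approximates the value function is not reproduced.

```python
from typing import Optional, Sequence


def abstention_step(values: Sequence[float], abstain_reward: float) -> Optional[int]:
    """Index of the first token position whose estimated value falls below the
    abstention reward, or None if generation should run to completion."""
    for t, v in enumerate(values):
        if v < abstain_reward:
            return t
    return None


# Hypothetical value estimates along two reasoning traces
promising = [0.70, 0.72, 0.75, 0.81]
unpromising = [0.55, 0.40, 0.22, 0.10, 0.05]
r_abstain = 0.3  # larger values abstain earlier, saving compute at the cost of answering less often
```

Raising `r_abstain` moves the stopping point earlier on every trace, which is how a single parameter trades compute against the information gained from finishing a response.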

21.

arXiv (CS.AI) 2026-06-11 DOI: arXiv:2511.02414

A New Perspective on Precision and Recall for Generative Models

Authors:

Benjamin Sykes, Loïc Simon, Julien Rabin, Jalal Fadili

arXiv:2511.02414v3 Announce Type: replace Abstract: With the recent success of generative models in image and text, their evaluation has received considerable attention. While most state-of-the-art methods rely on scalar metrics, the introduction of Precision and Recall (PR) for generative models has opened up a new avenue of research. The associated PR curve allows for a richer analysis, but its estimation poses several challenges. In this paper, we present a new framework for estimating entire PR curves based on a binary classification standpoint. We conduct a thorough statistical analysis of the proposed estimates. As a byproduct, we obtain a minimax upper bound on the PR estimation risk. We also show that our framework extends several landmark PR metrics from the literature, which by design are restricted to the extreme values of the curve. Finally, we study the different behaviors of the curves obtained experimentally in various settings.
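For reference, the PR curve of a generative model is commonly defined (following Sajjadi et al., 2018) for a reference distribution P and a model distribution Q as the pairs α(λ) = Σ_ω min(λ P(ω), Q(ω)) (precision) and β(λ) = Σ_ω min(P(ω), Q(ω)/λ) (recall) for λ > 0. The sketch below computes this population curve for known discrete histograms, which can serve as ground truth when testing an estimator; the paper's classifier-based estimator is not reproduced, and the histograms are hypothetical.

```python
import numpy as np


def pr_curve(p, q, lambdas):
    """PR curve of Sajjadi et al. (2018) for discrete distributions p (reference) and q (model)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    precision = np.array([np.minimum(lam * p, q).sum() for lam in lambdas])
    recall = np.array([np.minimum(p, q / lam).sum() for lam in lambdas])
    return precision, recall


# Mode dropping: the model puts all its mass on one of two equally likely real modes
lambdas = np.logspace(-2, 2, 9)
precision, recall = pr_curve([0.5, 0.5], [1.0, 0.0], lambdas)
```

Here the curve reaches precision 1 (every generated sample is realistic) while recall never exceeds 0.5 (half the real distribution is missed), a distinction a single scalar score would blur. Scalar PR metrics correspond to the extremes of λ, which is why a full-curve estimator subsumes them.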

22.

Nature (Science) 2026-06-10 DOI: HASH:4ab810cb4d084ce8b349b65e2c633eaa

Newly discovered whale graveyard dates back millions of years

Authors:

Benjamin Thompson

Submarine dives reveal complex deep-sea ecosystems living on whale remains — plus, a way to turn plant material into nylon. Hear the biggest stories from the world of science | 10 June 2026

23.

arXiv (quant-ph) 2026-06-16 DOI: arXiv:2606.15113

Optimal Toffoli-Depth Multi-Controlled Toffoli Decomposition in 2D Qubit Layout

Authors:

Anik Basu Bhaumik, Suman Dutta, Anupam Chattopadhyay

arXiv:2606.15113v1 Announce Type: new Abstract: The multi-controlled Toffoli (MCT) gate is a key primitive in quantum arithmetic, oracle construction, and quantum cryptanalysis. Although recent work has established optimal Toffoli-depth MCT decompositions under all-to-all qubit connectivity, their realization on near-term quantum hardware with restricted qubit connectivity remains largely unexplored. While general-purpose quantum mappers can route arbitrary circuits, they do not explicitly exploit the repeated interaction patterns inherent in MCT decompositions. In this paper, we study architecture-aware mappings of optimal Toffoli-depth MCT decompositions onto restricted two-dimensional qubit layouts. We begin with structured geometric placements that preserve the parallelism of state-of-the-art Toffoli and MCT decompositions with no additional depth overhead. We further introduce a motif-based packing framework in which decomposition layers are represented by interaction motifs derived from basic Toffoli gates. By embedding these motifs vertex-disjointly into hardware graphs, we characterize the minimum-size topologies supporting the required qubit resources and derive explicit bounds on the resulting depth overhead under tight qubit budgets. Finally, we compare these bounds with routing-aware placement heuristics and empirically evaluate the effectiveness of embedding different motifs across a range of hardware topologies.


24.

arXiv (CS.CL) 2026-06-12 DOI: arXiv:2606.12903

X-MADAM-RAG: Diagnosing and Handling Chinese-English Evidence Conflict in Retrieval-Augmented Generation

Authors:

Yongqi Kang, Yu Fu, Yong Zhao

Retrieval-augmented generation (RAG) systems may receive evidence that is not merely noisy but mutually contradictory. This issue becomes particularly salient in multilingual settings, where retrieved Chinese and English evidence may support incompatible answer candidates. We study this problem through X-RAMDocs-ZHEN, a controlled Chinese-English benchmark derived from RAMDocs for diagnosing evidence conflict in RAG. The benchmark contains 300 examples across six balanced conditions, including monolingual support, bilingual agreement, reversed conflict directions, and conflict with optional noise. We further examine X-MADAM-RAG, an interpretable pipeline that decomposes evidence handling into per-document candidate extraction, visible-evidence repair, deterministic candidate grouping, and conflict-aware aggregation. On the original controlled benchmark with Qwen2.5-7B-Instruct, X-MADAM-RAG achieves 0.9667 strict accuracy and 0.9767 conflict-aware success, outperforming an evidence-normalized single-call baseline. However, a zero-call rule-only extractor reaches 1.0000 on the same benchmark, revealing strong template regularity. To probe this limitation, we construct a deterministic naturalized stress test that removes explicit answer templates while preserving candidate strings. On its 100-sample subset, rule-only extraction falls to 0.0000, but X-MADAM-RAG also drops to 0.3000 strict accuracy, below both naive and evidence-normalized baselines. A privileged oracle remains perfect, indicating that document-level extraction is the main bottleneck. These findings position X-RAMDocs-ZHEN and X-MADAM-RAG as diagnostic tools for controlled evidence conflict rather than as evidence of general hallucination detection or robustness to natural retrieval.


25.

arXiv (CS.AI) 2026-06-11 DOI: arXiv:2606.12260

Market Design for AI: Beyond the Copyright Binary

Authors:

Yan Dai, Maryam Farboodi, Negin Golrezaei, Sepehr Shahshahani

arXiv:2606.12260v1 Announce Type: cross Abstract: How can we design a market of human-generated content for use in training AI models that both enables technological progress and preserves individual incentives for high-quality content creation? Existing approaches take polar positions: a "free-for-all" model based on fair use and a "strong intellectual property rights" model. We show that both fail. Free-for-all does not compensate creators, and, as we show by modeling the market as a static Stackelberg game, strong intellectual property rights also weaken creative incentives. We find this especially true for more innovative creators, a phenomenon we term the "originality penalty." Extending this insight to a dynamic model, we find another market failure that undermines AI model performance, even for an initially good model: such a model induces greater human reliance on AI-assisted creation, so homogenized content feeds back into training and in turn degrades model performance (a "curse of precision"). Finally, we propose a market design in which a data intermediary internalizes cross-creator externalities and subsidizes innovative contributions, thereby restoring efficiency.

