论文广场 - AcademicHub

01.

arXiv (math.PR) 2026-06-18 DOI: arXiv:2508.17116

A scaling limit theorem for controlled branching processes with a size-divisible term

作者:

Miguel Gonz\'alez ↗Pedro Mart\'in-Ch\'avez ↗In\'es del Puerto ↗

arXiv:2508.17116v2 Announce Type: replace Abstract: This paper establishes general sufficient conditions for a sequence of controlled branching processes to converge weakly on the Skorokhod space. We focus on a class of control mechanisms that extend previous results by decomposing those random variables into the sum of two independent components: an immigration term, which depends on the current population size, and a size-divisible term, which can be expressed as the sum of random contributions from each individual. This extension allows us to capture a broad range of control functions including Poisson, binomial, and negative binomial distributions, commonly used in the literature. The assumptions are formulated in terms of probability generating functions of the offspring and control laws, distinguishing in this latter between the immigration and the size-divisible parts. The limit process is shown to be a continuous-state branching process with dependent immigration. The proof essentially relies on tightness arguments and the identification of a martingale problem. We also identify the special case in which the limit reduces to a classical Feller branching diffusion with immigration.

阅读与讨论 → 访问原文 →

02.

arXiv (CS.AI) 2026-06-18 DOI: arXiv:2606.19047

RODS: Reward-Driven Online Data Synthesis for Multi-Turn Tool-Use Agents

作者:

Ruishan Fang ↗Siyuan Lu ↗Chenyi Zhuang ↗Tao Lin ↗

arXiv:2606.19047v1 Announce Type: new Abstract: Multi-turn tool-use RL is bottlenecked by the rapid depletion of informative samples in static datasets. We observe that the gradient signal in GRPO concentrates on tasks with the highest rollout reward variance, a consequence of the Popoviciu upper bound. Consequently, samples near the agent's capability boundary – where successes and failures are roughly balanced – contribute disproportionately large policy gradients. As training progresses, this boundary continuously shifts, which gradually depletes the pool of informative samples in a static dataset. We propose RODS (Reward-driven Online Data Synthesis) to resolve this depletion. RODS closes the loop between RL training and data generation by repurposing the progress reward variance as a practical, zero-cost boundary detector that requires no extra inference beyond the rollouts already computed for training. It continuously identifies such boundary samples, synthesizes new multi-turn variants matching their structural complexity (e.g., API topology and dependency depth) via a skill-aligned resampling pipeline, and manages a dynamic replay buffer that co-evolves with the policy. Starting from 400 human seeds and maintaining an active training pool of ~800 samples, RODS achieves comparable performance to a 17K-sample offline pipeline while requiring roughly 20x fewer trajectories, and improves over fixed-data RL and environment augmentation in our controlled setting.

阅读与讨论 → 访问原文 →

03.

arXiv (quant-ph) 2026-06-16 DOI: arXiv:2507.07092

Non-Gaussian Phase Transition and Cascade of Instabilities in the Dissipative Quantum Rabi Model

作者:

Mingyu Kang ↗Yikang Zhang ↗Kenneth R. Brown ↗Thomas Barthel ↗

arXiv:2507.07092v3 Announce Type: replace Abstract: The open quantum Rabi model describes a two-level system coupled to a harmonic oscillator. A Gaussian phase transition for the nonequilibrium steady states has been predicted when the bosonic mode is soft and subject to damping. We show that oscillator dephasing is a relevant perturbation, which leads to a non-Gaussian phase transition and an intriguing cascade of instabilities for $k$-th order bosonic operators, as well as a jump in the steady-state qubit polarization. For the soft-mode limit, the equations of motion form a closed hierarchy and spectral properties can be efficiently studied. To this purpose, we establish a fruitful connection to non-Hermitian Hamiltonians. The results for the phase diagram, stability boundaries, and relevant observables are based on mean-field analysis, exact diagonalization, perturbation theory, and Keldysh field theory.

阅读与讨论 → 访问原文 →

04.

arXiv (CS.AI) 2026-06-17 DOI: arXiv:2606.18135

Descriptor: Certus Caliber Classification Gunshot Dataset (C3GD)

作者:

Sinclair Gurny ↗Ryan Quinn ↗

arXiv:2606.18135v1 Announce Type: cross Abstract: In this work, we introduce the Certus Caliber Classification Gunshot Dataset (C3GD), a publicly accessible data set developed for the analysis of firearm muzzle blast sounds. The dataset aims to provide a wide variety of firearms, calibers, cartridges, microphones, and microphone locations with metadata detailed beyond what is currently otherwise available. It comprises more than 8000 field-collected data points from 28 firearms across 16 calibers. Because data collection in the field is costly, much of the existing research has been done using gunshot audio collected from the internet, which increases the risk of low-quality data and label noise. This dataset is primarily focused on caliber classification, but can also be used for gunshot detection, audio separation, and audio signal processing, providing a diversified and real-world reference. The dataset aims to provide enough diversity to be able to generalize to more real-world applications while also providing enough metadata for detailed academic analysis.

阅读与讨论 → 访问原文 →

05.

arXiv (CS.CL) 2026-06-11 DOI: arXiv:2606.11542

Pretrained self-supervised speech models can recognize unseen consonants

作者:

Chihiro Taguchi ↗\'Eric Le Ferrand ↗Hirosi Nakagawa ↗Hitomi Ono ↗Kanji Kato ↗Emily Prud'hommeaux ↗David Chiang ↗

Modern pretrained self-supervised automatic speech recognition models are trained on large-scale audio data to encode speech into contextualized representations. However, their training data are heavily skewed toward high-resource languages with little data from low-resource languages, raising concerns about the potential underrepresentation of typologically uncommon speech sounds such as click consonants primarily found in Khoisan languages. This leads to our central research question: Can these models recognize click consonants as accurately as other speech sounds? To address this question, we fine-tune and compare pretrained self-supervised speech models (Wav2Vec2 and HuBERT) on data from two click-rich Khoisan languages (G|ui and West !Xoon). Our results reveal that the fine-tuned models consistently recognize clicks more accurately than non-clicks, suggesting that self-supervision enables generalization across human speech sounds including rare phonemes.

阅读与讨论 → 访问原文 →

06.

arXiv (CS.AI) 2026-06-12 DOI: arXiv:2606.12479

ReCal: Reward Calibration for RL-based LLM Routing

作者:

Qihang Yu ↗Hanwen Tong ↗Zhengqi Zhang ↗Bo Zheng ↗Feng Wei ↗Shengyu Zhang ↗Zemin Liu ↗Fei Wu ↗

arXiv:2606.12479v1 Announce Type: cross Abstract: Large language model (LLM) routing has emerged as an effective paradigm for leveraging the complementary strengths of multiple LLMs through dynamic model and reasoning-strategy selection. Recent reinforcement learning (RL)-based routing methods further improve routing quality by optimizing routing policies from interaction feedback. However, they still struggle to provide informative and comparable learning signals under heterogeneous tasks with varying difficulty. In practice, multiple objectives (e.g., correctness, format behavior) are aggregated into a single scalar reward, leading to ambiguous credit assignment and conflicting optimization signals. Moreover, reward signals exhibit significant variability across instances, where some instances produce higher or more variable rewards, introducing optimization bias that favors trivial samples over informative ones. To address these issues, we propose ReCal, a \underline{Re}ward \underline{Cal}ibration framework for RL-based LLM routing. We first introduce a hierarchical reward decomposition mechanism with component-wise advantage estimation. We further propose a distribution-aware optimization strategy that calibrates optimization variability through variance-aware reweighting and per-dataset normalization. Experiments on seven datasets demonstrate that ReCal consistently improves routing performance, and training stability over baselines. Code is available at https://anonymous.4open.science/r/ReCal.

阅读与讨论 → 访问原文 →

07.

arXiv (CS.CV) 2026-06-16 DOI: arXiv:2606.16868

Federated Medical Image Segmentation under Real-World Label Noise: A Benchmark Suite for Noisy Label Learning Method Selection

作者:

Markus Bujotzek ↗Dimitrios Bounias ↗Stefan Denner ↗Ralf Floca ↗Maximilian Fischer ↗Peter Neher ↗Klaus Maier-Hein ↗

While federated learning (FL) enables collaborative medical image segmentation without centralizing sensitive data, real-world deployment is frequently complicated by cross-site label imperfections such as contour disagreement, missing or additional structures, and confused labels. Federated noisy label learning (FNLL) aims to mitigate these effects, yet remains underused in practice as existing evidence is largely based on synthetic noise, simplified settings, and limited real-world noisy evaluation. We address this gap by introducing a benchmark suite that combines diverse real-world noisy datasets, deployment-relevant client-noise scenarios, and label-noise-targeted evaluation to support systematic FNLL assessment and informed method selection. The suite combines curated real-world noisy medical image segmentation datasets from diverse sources with a comprehensive federated segmentation framework including various client-noise scenarios and noise-targeted evaluation. The presented suite provides a realistic and discriminative basis for FNLL evaluation in medical image segmentation and establishes a reusable foundation for fair benchmarking, dataset-specific label-noise characterization, and future method development under realistic federated settings. Code is available at https://github.com/MIC-DKFZ/FedSegNoiseBench.

阅读与讨论 → 访问原文 →

08.

arXiv (CS.CL) 2026-06-11 DOI: arXiv:2510.23508

M4FC: a Multimodal, Multilingual, Multicultural, Multitask Real-World Fact-Checking Dataset

作者:

Jiahui Geng ↗Jonathan Tonglet ↗Iryna Gurevych ↗

Existing real-world datasets for multimodal fact-checking have multiple limitations: they contain few instances, cover on only one or two languages, focus only on one task, or rely on external news article sets for sourcing true claims. To address these shortcomings, we introduce M4FC, a new real-world dataset comprising 4,982 images paired with 6,980 claims. The images, verified by professional fact-checkers from 22 organizations, represent a diverse range of cultural and geographic contexts. Each claim is available in one or two out of ten languages. M4FC spans six multimodal fact-checking tasks: visual claim extraction, claimant intent prediction, fake image detection, image contextualization, location verification, and verdict prediction. We provide baseline results for all tasks and analyze how combining intermediate tasks affects verdict prediction performance. We make our dataset and code publicly available.

阅读与讨论 → 访问原文 →

09.

arXiv (CS.CL) 2026-06-19 DOI: arXiv:2606.19727

NRITYAM: Language Models Meet Art and Heritage of Dance

作者:

Punit Kumar Singh ↗Niladri Ghosh ↗Advait Joshi{\i}nst ↗Shailee Choudhary ↗Michael F\"arber ↗Haiqin Yang ↗

Language models have become essential tools in shaping modern workflows. However, their global effectiveness hinges on a nuanced understanding of local socio-cultural contexts. To address this gap, we present NRITYAM, a comprehensive benchmark for evaluating the cultural comprehension capabilities of language models in the context of global dance traditions. NRITYAM comprises 9,260 carefully curated question-answer pairs spanning 12 languages, making it the largest dataset dedicated to evaluating cultural knowledge in dance. The dataset has been developed from the ground up through close collaboration with native dance artists and native speakers of the languages, who authored and validated culturally relevant questions specific to their regions. We evaluate a broad set of models, including large language models, small language models, multimodal large language models, and small multimodal language models. As a multilingual and multicultural benchmark, NRITYAM sets a new standard for evaluating the ability of AI systems to understand and reason about traditional performing arts. Detailed dataset samples are available at~\url{https://github.com/niladrighosh03/NRITYAM}.

阅读与讨论 → 访问原文 →

10.

arXiv (CS.AI) 2026-06-16 DOI: arXiv:2605.21629

Faster Completion, Less Learning: Generative AI Reduced Study Time on Math Problems and the Knowledge They Build

作者:

Sina Rismanchian ↗Hasan Uzun ↗Jeffrey Matayoshi ↗Eric Cosyn ↗Eyad Kurd-Misto ↗

arXiv:2605.21629v2 Announce Type: replace-cross Abstract: How much have students' ordinary learning processes shifted in response to generative AI, and how does that affect their durable learning outcomes? Self-report surveys show little change, while small-scale behavioral studies report widespread AI use without the scale or duration to measure learning consequences. We address both questions using a ten-year panel of $3.2$ million ALEKS learning interactions for investigating time-on-task, complemented by ALEKS PPL placement-assessment data for examining proctoring and learning outcomes, with a quasi-experimental design exploiting variation in tasks that are more susceptible to AI (text-based word problems) and less susceptible to AI (interactive graph-based problems). Learning time on AI-susceptible problems declines $2.8\%$ per quarter among college students after ChatGPT's release, cumulating to $26.9\%$ over eleven quarters; high-schoolers show $31.3\%$, middle-schoolers $9.0\%$, and Grade 5 students no detectable change. Among college students, the post-ChatGPT divergence vanishes entirely under proctoring, ruling out broad efficiency gains as the likely explanation. Logistic fixed-effects models on randomly assigned proctored retention items yield a $25\%$ cumulative decline in odds of correct response; the same estimator on non-proctored assessment produces a large opposite-signed increase – inconsistent with any platform, cohort, or curriculum explanation. These results are among the first large-scale behavioral and outcome evidence that generative AI has altered how students study and the knowledge they build – the population-level indicator of cognitive surrender, with direct implications for educational research, assessment governance, and AI policy.

阅读与讨论 → 访问原文 →

11.

arXiv (quant-ph) 2026-06-19 DOI: arXiv:2606.20535

Near-Optimal Learning of Local Lindbladians

作者:

Itai Arad ↗Zhili Chen ↗Naixu Guo ↗Patrick Rebentrost ↗Zhan Yu ↗

arXiv:2606.20535v1 Announce Type: new Abstract: We study the problem of learning local Lindbladians from black-box access to the physical evolution, and the goal is to estimate all Hamiltonian and dissipative coefficients. We give an algorithm built directly from finite-time channel probes, which runs the unknown evolution for short times, estimates the corresponding Pauli transfer matrices from classical shadows, and converts these estimates into Lindbladian coefficients by stable local Fourier inversions. For fixed locality and bounded dissipative site degree, the uses of the dynamical evolution and total evolution time scale as $\widetilde{O}(\Lambda^2/\varepsilon^2)$ and $\widetilde{O}(\Lambda/\varepsilon^2)$ respectively, in the local dynamical strength bound $\Lambda$ and target accuracy $\varepsilon$, with only logarithmic dependence on the number of qubits. The algorithm is non-adaptive, uses no ancillas, and uses only random product states as inputs followed by random Pauli measurements. The method does not require knowing the support of the Lindbladian in advance. We complement the algorithm with matching lower bounds, showing that the learning algorithm is near-optimal both in physical dynamics accesses and in total evolution time. We construct a single-qubit dephasing Lindbladian family that already requires $\Omega(\Lambda^2/\varepsilon^2)$ channel uses and $\Omega(\Lambda/\varepsilon^2)$ total evolution time, even for adaptive algorithms with arbitrary ancillas and measurements. In particular, the lower bounds imply that the Heisenberg-limited scaling achievable for Hamiltonian learning is information-theoretically impossible once dissipative coefficients must be estimated.

阅读与讨论 → 访问原文 →

12.

arXiv (CS.CL) 2026-06-16 DOI: arXiv:2606.16523

SkillWiki: A Living Knowledge Infrastructure for Agent Skills

作者:

Dingcheng Huang ↗Yuda Ding ↗Bingshuo Liu ↗Qingbin Liu ↗Xi Chen ↗Jiang Bian ↗Hongliang Sun ↗Zhiying Tu ↗Dianhui Chu ↗Xiaoyan Yu ↗Dianbo Sui ↗

While knowledge is managed through Wikipedia and software through GitHub, agent skills still lack an infrastructure for large-scale production, governance, and evolution. SkillWiki is a living knowledge infrastructure that supports the organization, grounding, and continuous evolution of agent skills by transforming heterogeneous knowledge into reusable skill assets linked to their originating evidence. Our demonstration presents the complete skill lifecycle, from knowledge ingestion and skill production to provenance-aware exploration, governance, and execution-driven evolution. SkillWiki highlights a future in which knowledge, skills, and execution experience co-evolve within a shared infrastructure. The live demonstration and source code are publicly available at https://github.com/Huangdingcheng/SkillWiki.

阅读与讨论 → 访问原文 →

13.

arXiv (CS.CL) 2026-06-19 DOI: arXiv:2606.19468

Characterizing Narrative Content in Web-scale LLM Pretraining Data

作者:

Teagan Johnson ↗Elliott Ash ↗Andrew Piper ↗Maria Antoniak ↗

The narrative composition of web-scale LLM pretraining corpora remains largely unexplored even though narrative is a fundamental mode of human communication. We present the first fine-grained study of narrative features in Dolma, a 3-trillion-token open pretraining corpus. Drawing on narrative theory, we design a framework spanning three core narrative elements (agency, setting, and events) operationalized as 11 interpretable dimensions. After sampling and annotating a diverse set of 400 passages, we finetune and validate NarraBERT, a RoBERTa-based model for fine-grained narrative prediction. We apply NarraBERT to 3M passages, resulting in a new dataset, NarraDolma. We find (i) narrative structure is measurable at scale across extremely heterogeneous data, (ii) we uncover a continuous, multidimensional narrative structure underlying web text, and (iii) narrative qualities are unequally distributed across pretraining sources and topics in ways that current curation practices neither measure nor account for. Our framework, dataset, and analyses provide a foundation for understanding how narrative qualities are distributed in LLM pretraining data and for studying how data composition affects narrative reasoning tasks. We publicly release NarraDolma and NarraBERT.

阅读与讨论 → 访问原文 →

14.

arXiv (CS.CL) 2026-06-19 DOI: arXiv:2606.20477

Scalable Training of Spatially Grounded 2D Vision-Language Models for Radiology

作者:

Yusuf Salcan ↗Simon Ging ↗Robin Schirrmeister ↗Philipp Arnold ↗Elmar Kotter ↗Behzad Bozorgtabar ↗Thomas Brox ↗

We study how to train visually grounded vision-language models (VLMs) for radiology without manual spatial annotations. We introduce RefRad2D, a large-scale bilingual (German/English) dataset of 1.2M CT and MR image-text pairs derived from clinical practice, with task-specific VQA and spatial grounding subsets generated automatically via LLM-based curation and automated segmentation. Trained on this data, our model RadGrounder jointly performs report generation, visual question answering, and spatial grounding via bounding-box detection or segmentation. On external VQA benchmarks (Slake, VQA-RAD), RadGrounder achieves competitive results with specialized medical VLMs. Adding our clinical data to the training mixture improves open-ended VQA over fine-tuning on the downstream datasets alone, showing the transferability of our dataset. Crucially, adding grounding supervision does not degrade language quality, enabling spatially verifiable outputs at no cost to VQA performance.

阅读与讨论 → 访问原文 →

15.

arXiv (quant-ph) 2026-06-11 DOI: arXiv:2605.23380

Lowest order Carleman linearization for low Reynolds long-term behaviour of fluid flow simulations

作者:

Luca Cappelli ↗Sauro Succi ↗

arXiv:2605.23380v2 Announce Type: replace Abstract: It is shown that the lowest (second) order truncation of the Carleman linearization of the fluid equations (C2) recovers the late stage of the evolution, namely the steady-state solution, although to a decreasing degree of accuracy at increasing Reynolds number. This asymptotic property is first proved analytically for the decaying logistic with external forcing and then shown to hold to a significant degree of accuracy also for the more complex case of two-dimensional Kolmogorov-like fluid flow at low Reynolds numbers, below $Re \sim 10$. This time-asymptotic property may open interesting prospects for the quantum simulation of low-Reynolds steady-state fluid flows.

阅读与讨论 → 访问原文 →

16.

arXiv (CS.AI) 2026-06-19 DOI: arXiv:2606.19387

Interpretable and Verifiable Hardware Generation with LLM-Driven Stepwise Refinement

作者:

You Li ↗Samuel Mandell ↗David Z. Pan ↗

arXiv:2606.19387v1 Announce Type: cross Abstract: Large language models (LLMs) have achieved remarkable success in software development. However, they are susceptible to hallucinations, meaning that they can introduce subtle semantic and logical errors. Due to the high stakes in chip design and manufacturing, hardware engineers are still reluctant to rely on LLMs for register-transfer level (RTL) generation. In this paper, we propose a hardware generation framework that combines the creativity and broad knowledge of LLMs with the explainability and mathematical rigor of formal methods. Specifically, we devise a set of transformation rules that cover various design decisions and hardware features. By iteratively applying these rules, an LLM agent can convert a design specification into an RTL program with guaranteed correctness. Experimental results demonstrate the effectiveness and efficiency of the framework.

阅读与讨论 → 访问原文 →

17.

arXiv (CS.CV) 2026-06-17 DOI: arXiv:2606.17437

Spatio-Temporal Fusion Model for Standard View Classification of Echocardiographic Videos

作者:

Bo Gou ↗Jicheng Zhang ↗Jianlong Xiong ↗Tao He ↗Bentian Liu ↗Hai Wu ↗Yijiao Wang ↗Yu Zhang ↗Yujia Yang ↗Yun Dai ↗Jian Liu ↗Jie Wang ↗…

Automated classification of standard echocardiographic views is crucial for efficient clinical workflow but faces three main challenges. First, publicly available datasets are scarce and limited in scale and view coverage. Second, the performance of some modern video-level architectures for echocardiographic view classification remains underexplored. Third, some view categories exhibit highly similar spatial appearances, making single-frame features insufficient for discrimination, while heterogeneous frame quality complicates robust temporal information fusion. To address these challenges, we release the Echocardiographic Videos of Nine Views (EV9V) dataset, comprising 5,138 videos, 910,579 frames, and 9 standard views, which is, to the best of our knowledge, the largest publicly available echocardiography video dataset. Using EV9V, we systematically benchmark representative video classification architectures, including Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Transformers. Furthermore, we propose a Spatio-Temporal Fusion Model (STFM), an efficient dual-stream CNN-LSTM (Long Short-Term Memory) framework that jointly captures spatial anatomical structures and temporal cardiac dynamics. The proposed framework leverages uncertainty-aware learning to preferentially sample representative video segments during training and evidence-based fusion during inference, improving robustness to variations in frame quality across echocardiographic videos. Extensive experiments demonstrate that our method achieves competitive performance across diverse video classification models, validating the effectiveness of uncertainty-aware spatio-temporal learning for echocardiographic view classification. The code is available at https://github.com/bgx666/stfm.

阅读与讨论 → 访问原文 →

18.

arXiv (CS.AI) 2026-06-19 DOI: arXiv:2606.20523

SARLO-80: Worldwide Slant SAR Language Optic Dataset 80cm

作者:

Sol\`ene Debuys\`ere ↗Nicolas Trouv\'e ↗Nathan Letheule ↗Elise Colin ↗Georgia Channing ↗

arXiv:2606.20523v1 Announce Type: cross Abstract: Multimodal foundation models have advanced rapidly thanks to large optical benchmarks, but comparable resources for synthetic aperture radar (SAR) remain limited. Existing SAR–optical datasets largely rely on low-resolution, intensity-only Ground Range Detected~(GRD) products and do not preserve complex-valued SAR measurements or native acquisition geometry, which restricts physically grounded multimodal learning. In particular, large-scale public datasets combining very-high-resolution (VHR) SAR SLC, aligned optical imagery, and natural-language descriptions are still lacking. We present a VHR SAR–optical–text dataset built from open-access Umbra spotlight acquisitions distributed as Sensor Independent Complex Data (SICD). From around 2,500 worldwide scenes (VV/HH, 20cm–2m native resolution), we standardize all SAR data to an 80cm slant-range grid via band-limited FFT resampling and tile the imagery into 1024 by 1024 patches. For each SAR patch, we retrieve a high-resolution optical tile and warp it into the SAR grid using local coordinate correspondences for local pixel-level alignment. We further generate three caption variants (SHORT/MID/LONG) per sample to support vision–language training and evaluation. Our dataset contains 119,566 triplets (complex and amplitude slant-range SAR patch, aligned optical patch, natural-language description) covering 257 locations across 72 countries and a broad range of land types and infrastructures. We release fixed train/validation/test splits and the full preprocessing and baseline code to enable reproducible benchmarks for multimodal alignment on cross-modal retrieval and conditional generation in native SAR geometry. The dataset is publicly available on the Hugging Face Hub at https://huggingface.co/datasets/ONERA/SARLO-80.

阅读与讨论 → 访问原文 →

19.

arXiv (CS.LG) 2026-06-16 DOI: arXiv:2606.15359

DiRecT: Safe Diffusion-Based Planning via Receding-Horizon Denoising

作者:

Paolo Giaretta ↗Zeyang Li ↗Navid Azizan ↗

arXiv:2606.15359v1 Announce Type: new Abstract: Diffusion models have emerged as powerful tools for planning and control by learning multimodal distributions over actions and trajectories. Yet reliable inference-time safety enforcement remains a key barrier to their deployment in safety-critical tasks. Existing approaches typically project each denoising iterate onto the feasible set, even though constraints are defined only on the final clean trajectory. Enforcing feasibility on noisy intermediate samples can therefore overconstrain the sampling dynamics, substantially degrading sample quality. To address this limitation, we introduce DiRecT (Diffusion-based planning via Receding-horizon denoising with Terminal constraints), a training-free algorithm for constrained sampling from diffusion models via stochastic optimal control (SOC). DiRecT enforces constraints only on the final clean sample, avoiding unnecessary restrictions on the intermediate denoising dynamics. Inspired by model predictive control, we derive a principled receding-horizon surrogate for the otherwise intractable constrained SOC formulation, yielding an efficient algorithm that cleanly separates stochastic denoising from constraint satisfaction, progressively steering samples toward feasible final trajectories without distorting the learned diffusion dynamics. Furthermore, DiRecT is highly flexible: it can leverage off-the-shelf or domain-specific optimizers, incorporate priors over environment dynamics, and optimize additional soft rewards. Extensive experiments on safe planning benchmarks demonstrate that DiRecT substantially improves deployment safety and task performance over existing diffusion-based planning baselines.

阅读与讨论 → 访问原文 →

20.

arXiv (CS.CV) 2026-06-16 DOI: arXiv:2606.15370

MNet++: Extended 2D/3D Networks for Anisotropic Medical Image Segmentation

作者:

Kirsten Odendaal ↗Rade Bajic ↗

This work demonstrates a full reproduction and extension of MNet, a hybrid 2D/3D convolutional network designed for anisotropic medical image segmentation. The original architecture was re-implemented within the nnU-Net framework to verify its reported performance and robustness to variable voxel spacing, known as anisotropy. Experiments were conducted on PROMISE prostate MRI and a controlled subset of LiTS liver CT under matched preprocessing and compute constraints. The reproduced MNet achieved a Dice similarity coefficient (DSC) of 89.0 +/- 0.9% on PROMISE, within 0.8% of the published result, and 94.3 +/- 1.9% / 54.6 +/- 3.1% for liver and tumor segmentation on LiTS, respectively. Two lightweight extensions were further introduced: (1) a learned Fusion Gating mechanism enabling adaptive 2D-3D feature blending, and (2) a VMamba state-space module for efficient long-range depth modelling. The Spatial Gating variant improved DSC by +0.8% with less than 3% inference overhead, while VMamba improved performance consistency, reducing PROMISE Dice variation to +/- 0.7% and achieving the strongest LiTS liver performance at 95.8% Dice. Both extensions preserved MNet robustness to anisotropy, with delta Dice = 1.5% across 1-4 mm voxel spacing. Overall, the study confirms MNet reproducibility and demonstrates that adaptive fusion and state-space modelling have the potential to further strengthen segmentation reliability under anisotropic conditions. However, further tests are required to provide definitive conclusions.

阅读与讨论 → 访问原文 →

21.

arXiv (CS.CV) 2026-06-11 DOI: arXiv:2606.11573

Understanding Cross-Sensor Feature Variations for Generalizable 3D Perception

作者:

Xin Qiu ↗Wenjie Liu ↗Fuyuan Ai ↗YuChen Tan ↗Zhiwei Xu ↗Chunyi Song ↗

Radar-camera BEV perception often suffers from degraded performance when evaluated across datasets, as changes in driving scenes, sensor configurations, and environmental conditions can alter both the input observations and the internal fused representations. This work studies this issue from the perspective of source-domain variation modeling, aiming to improve the robustness of BEV-based 3D detectors without relying on target-domain samples. We introduce a framework that characterizes visual scene variations in the frequency domain and uses them to synthesize diverse source-domain views. By comparing the resulting fused BEV representations, the framework further captures how image-level variations influence multi-modal BEV features. These variation patterns are then used to regularize the detector, encouraging the learned fusion space to remain stable under latent scene changes. The proposed method is applied only during training and leaves the inference pipeline unchanged. Experiments on cross-dataset radar-camera 3D detection between View-of-Delft and TJ4DRadSet demonstrate consistent improvements over multiple BEV fusion backbones, and the gains remain effective when a small amount of target-domain data is available.

阅读与讨论 → 访问原文 →

22.

arXiv (CS.LG) 2026-06-12 DOI: arXiv:2606.12740

Deep Unfolded Latent Optimally Partitioned-l2/l1 Networks for Data-driven Block-Sparse Recovery

作者:

Takanobu Furuhashi ↗Hidekata Hontani ↗Qibin Zhao ↗Tatsuya Yokota ↗

arXiv:2606.12740v1 Announce Type: new Abstract: The convex Latent Optimal Partition (LOP)-l2/l1 approach enables block-sparse signal recovery with unknown partitions but relies on manual hyperparameter tuning. Additionally, numerical instability in differentiating its proximal operator prevents its automatic parameter tuning via Deep Unfolding (DU). To address these limitations, we propose two architectures: a stable framework utilizing implicit differentiation and a flexible variant leveraging Deep Weight Factorization (DWF). The DWF-based approach also supports nonconvex smooth data fidelity terms. Numerical experiments demonstrate that DU-LOP-l2/l1 yields competitive performance and high resilience against impulsive noise.

阅读与讨论 → 访问原文 →

23.

arXiv (CS.AI) 2026-06-16 DOI: arXiv:2405.15768

Canonical Variates in Wasserstein Metric Space

作者:

Jia Li ↗Lin Lin ↗

arXiv:2405.15768v2 Announce Type: replace-cross Abstract: In this paper, we address the classification of instances represented by distributions on a vector space rather than single points. We consider classification algorithms based on pairwise distances, specifically, the Wasserstein metric between distributions. Central to our investigation is dimension reduction within the Wasserstein metric space to enhance classification accuracy. We introduce a novel approach grounded in the principle of maximizing Fisher's ratio, defined as the quotient of between-class variation to within-class variation. The directions in which this ratio is maximized are termed discriminant coordinates or canonical variates axes. In practice, both between-class and within-class variations are defined as the average squared Wasserstein distances between pairs of distributions, with the pairs either belonging to the same class or to different classes. This ratio optimization is achieved through an iterative algorithm, which alternates between optimal transport and maximization steps within the vector space. Empirical studies are conducted to assess the algorithm's convergence; and experimental results demonstrate that the dimension reduction technique substantially enhances classification performance. Moreover, the new method outperforms well-established algorithms that operate on vector representations derived from distributional data. It also exhibits robustness to variations in how instances are summarized by distributions, such as the number of components in a Gaussian mixture model (GMM) representation.

阅读与讨论 → 访问原文 →

24.

medRxiv (Medicine) 2026-06-22 DOI: HASH:4a421ce7950d3b5b8848335d31f575b8

Characteristics and Outcomes of Gene-Elusive Dilated Cardiomyopathy

作者:

Cannie ↗Bakalakos ↗Syrris ↗Protonotarios ↗Lorenzini ↗Guttmann ↗O'Mahoney ↗Savvatis ↗Sekhri ↗Mohiddin ↗S. A ↗Kaski ↗…

Background and Aims Genetic testing in dilated cardiomyopathy (DCM) guides risk stratification and family screening. Likely pathogenic or pathogenic (LP/P) variants are identified in approximately one-third of patients, leaving many without a genetic diagnosis. Cohort studies suggest that "gene-elusive" patients have a lower risk of adverse events. This study aims to better characterise this group and identify factors associated with adverse outcomes. Methods Consecutive and unrelated DCM patients undergoing genetic testing and returning no LP/P variants were retrospectively recruited and compared to two control cohorts of DCM patients carrying LP/P variants in LMNA and TTN for a primary composite endpoint of end-stage heart failure (ESHF) or malignant ventricular arrhythmia (MVA). Results Among patients without prior MVA, the composite endpoint occurred in 36/423 (8.5%) gene-elusive, 14/39 (35.9%) LMNA and 11/100 (11%) TTN cardiomyopathy patients (log-rank p

阅读与讨论 → 访问原文 →

25.

arXiv (CS.LG) 2026-06-11 DOI: arXiv:2606.11490

OmniLoc: A Geometry-Aware Foundation Model for Anchor-Free UE Localization Across Diverse Indoor Environments

作者:

Lei Chu ↗Yuning Zhang ↗Omer Gokalp Serbetci ↗Anushka Katiyar ↗Bassel Abou Ali Modad ↗Andreas F. Molisch ↗

arXiv:2606.11490v1 Announce Type: new Abstract: Indoor localization from wireless measurements remains challenging in large-scale deployments due to substantial variation in building geometry, the set of detectable access points (APs), and the heterogeneity of received signals. Existing learning-based methods often perform well only in limited settings and degrade under environmental shifts, making robust anchor-free localization across diverse indoor environments notoriously difficult. In this paper, we present OmniLoc, an environment-interactive foundation model for anchor-free user equipment localization across diverse indoor environments. To the best of our knowledge, OmniLoc is the first foundation-model-based approach built directly on wireless measurements for this task. OmniLoc is built on three key designs. First, a unified input tokenization module converts heterogeneous wireless measurements into a common representation that is more amenable to learning. Second, a geometry-aware Transformer performs AP-aware feature extraction by emphasizing dominant APs while aggregating complementary evidence from supporting APs. Third, a geometry-aware location estimation module conditions regression on geometric embeddings to produce geometrically consistent location predictions. We evaluate OmniLoc on both a large-scale in-house dataset and a public benchmark dataset. Results show that OmniLoc significantly outperforms existing methods, consistently improves existing backbones when its design components are integrated, and demonstrates strong generalization in cross-environment evaluations.

阅读与讨论 → 访问原文 →

探索全球前沿学术脉络