Paper Plaza - AcademicHub

01.

arXiv (CS.AI) 2026-06-19 DOI: arXiv:2606.19629

RIVET: Robust Idempotent Voice Attribute Editing

Authors:

Dareen Alharthi, Bhuvan Koduru, Rita Singh, Bhiksha Raj

Voice attribute editing models modify characteristics such as age and gender while preserving speaker identity. In large-scale speech datasets, however, attribute annotations are often noisy or inconsistent, which can cause conditional generative models to produce unstable edits. In this work, we show that idempotency provides an effective mechanism for improving robustness to noisy labels. An idempotent operator is one for which repeated application does not change the result, i.e., f(f(x)) = f(x). Enforcing this property acts as an implicit regularizer that reduces sensitivity to mislabeled examples. We introduce RIVET, a training framework that incorporates an idempotency objective to improve robustness to label noise. We evaluate RIVET under controlled label noise and on the GLOBE dataset with naturally noisy annotations. RIVET improves editing success and better preserves speaker identity than standard training, showing that idempotency improves robustness in voice editing models.
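The idempotency property f(f(x)) = f(x) mentioned in this abstract can be made concrete with a small sketch. The abstract does not say how RIVET turns the property into a training objective, so the mean-squared "gap" below and the toy operators `clamp_unit` and `halve` are illustrative assumptions, not the authors' method.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def idempotency_gap(f: Callable[[Vector], Vector], x: Vector) -> float:
    """Mean squared difference between f(f(x)) and f(x); zero when f is idempotent at x."""
    once = f(x)
    twice = f(once)
    return sum((a - b) ** 2 for a, b in zip(twice, once)) / len(once)


def clamp_unit(x: Vector) -> list[float]:
    """Toy idempotent operator (hypothetical): clip every coordinate to [-1, 1]."""
    return [max(-1.0, min(1.0, v)) for v in x]


def halve(x: Vector) -> list[float]:
    """Toy non-idempotent operator (hypothetical): each application shrinks the input again."""
    return [0.5 * v for v in x]


sample = [2.0, -0.5, 0.25, -3.0]
gap_clamp = idempotency_gap(clamp_unit, sample)  # clipping twice equals clipping once
gap_halve = idempotency_gap(halve, sample)       # halving twice differs from halving once
```

A penalty of this shape, added to a training loss, would push an editing model toward outputs that a second edit leaves unchanged; whether RIVET uses this exact form is not stated in the source.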

02.

medRxiv (Medicine) 2026-06-22 DOI: HASH:8931b12e773d08691f084a97cc26baff

Early-life nutritional environment is associated with late-life cognition in the Health and Retirement Study, a pellagra epidemic natural experiment

Authors:

Vasiljevic; Schmitz, L. L.; Engelman, C. D.

Early-life exposures are important to several late-life health outcomes. We sought to study the effect of an in utero nutritional environment and its interaction with Alzheimer's disease (AD) genetic risk on late-life cognitive function. We used a natural experiment created by the pellagra epidemic, a nutritional disease caused by a vitamin B3 deficiency, to evaluate the association between in utero pellagra epidemic exposure and late-life cognitive function in the Health and Retirement Study (N = 18,285). We also evaluated whether the in utero exposure could modify the AD polygenic score's (PGS) effect on cognition. In utero pellagra epidemic exposure was significantly associated with cognition (β = -0.025). However, these effects were not isolated to the prenatal period as exposure during childhood periods also had an effect. The interaction between the in utero exposure and the AD PGS was significant, where the genetic effect on cognition was amplified with increasing (progressively worse) in utero exposure levels. These associations imply that the early-life nutritional environment affects late-life cognitive function and that these effects can modify genetic risk.

03.

medRxiv (Medicine) 2026-06-10 DOI: HASH:0047cf0db5016f9dddd52fce35e562e9

Trajectories of brain structure and function in young adult carriers of genetic frontotemporal dementia variants

Authors:

So; Lombardi; Staffaroni, A. M.; Coleman; Bouzigues; Ferry-Bolder; Cullen; Russell; Foster; Farley; Convery; …

Background and Objectives: Converging evidence hints at neurodevelopmental effects in genetic frontotemporal degeneration (FTD). In cross-sectional studies, for some genes, young adult FTD variant carriers show differences in brain volumes and cognition compared to familial non-carriers. However, longitudinal trajectories may more sensitively capture FTD-related neurodevelopmental vs. neurodegenerative changes than cross-sectional approaches. This study examined longitudinal trajectories of brain volumes, executive function, and plasma biomarkers in young adult carriers compared to familial non-carriers, as measures of neurodevelopmental and neurodegenerative outcomes of FTD-causing variants. Methods: This longitudinal cohort study comprised participants, aged 18-30 years, from the FTD Prevention Initiative across Europe, Canada, and the USA. Genetic groups included C9orf72 (47%), MAPT (30%), and GRN (23%). Linear mixed-effects models were computed to assess longitudinal outcomes across age between groups, controlling for sex, scanner (for brain volumes), and education (for executive function); random effects accounted for between-subject variability nested within family membership. Results: Variant carriers (n=147) and familial non-carriers (n=113) did not differ in age (mean ± SD, 25.9 ± 3.2 years), sex (53% female), or number of visits (2.1 ± 1.7). Young adult C9orf72 repeat expansion carriers exhibited smaller thalamic volumes than non-carriers at the reference age of 26 years (b=-982.8 mm³, SE=317.0, p=0.0046, f²=0.32), with relatively stable trajectories across ages 18-30 (i.e., no change over time). Trajectories of rostral anterior cingulate volumes differed in C9orf72 carriers and non-carriers across age, where carriers showed relatively stable trajectories and non-carriers showed age-appropriate declines (b=64.4 mm³, SE=29.9, p=0.035, f²=0.07). For MAPT and GRN, there were little to no differences in total brain, cortical, or subcortical volumes between groups and over time. No longitudinal differences were observed between carriers and non-carriers in executive function, or plasma NfL or GFAP for any genetic group. Discussion: C9orf72 repeat expansions were linked to smaller average thalamic volumes and stable trajectories between ages 18 to 30, supporting potential neurodevelopmental origins. The modest evidence supporting an absence of difference in neurodegenerative biomarkers and executive function suggests minimal early neurodegeneration and functional preservation in young adulthood.

04.

arXiv (CS.CL) 2026-06-16 DOI: arXiv:2606.16897

Contrastive-Difference CKA Reveals Concept-Specific Structural Alignment Across Language Model Architectures

Authors:

Xueping Gao

Do different LLM architectures encode high-level concepts in structurally compatible ways? We systematically characterize a geometric-functional universality dissociation: across multiple concept domains and architectural families, moderate geometric convergence coexists with near-perfect functional transfer. Using contrastive-difference CKA (CKA_Delta), a training-free diagnostic that computes kernel alignment on per-sample contrastive differences, we isolate concept-specific convergence from generic similarity – achieving significant discrimination where standard CKA cannot. The dissociation replicates across all six concept domains we test (five with p =70B models. We position CKA_Delta as a practical regime classifier and architectural outlier detector (Gemma: d = 1.08, AUC = 0.79) rather than an absolute transfer-accuracy predictor, providing a training-free diagnostic for cross-architecture concept monitoring.
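As a rough illustration of "kernel alignment on per-sample contrastive differences", the sketch below computes linear CKA between two models' difference vectors. The abstract does not specify the kernel, how contrastive pairs are built, or any normalization, so the linear kernel, the function names, and the toy activations are all assumptions rather than the paper's procedure.

```python
import math

Matrix = list[list[float]]


def center_columns(m: Matrix) -> Matrix:
    """Subtract each column's mean so that features are zero-centered."""
    n = len(m)
    means = [sum(col) / n for col in zip(*m)]
    return [[v - mu for v, mu in zip(row, means)] for row in m]


def cross_gram_sq_norm(x: Matrix, y: Matrix) -> float:
    """Squared Frobenius norm of X^T Y for matrices with the same number of rows."""
    total = 0.0
    for col_x in zip(*x):
        for col_y in zip(*y):
            total += sum(a * b for a, b in zip(col_x, col_y)) ** 2
    return total


def linear_cka(x: Matrix, y: Matrix) -> float:
    """Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on centered features."""
    x, y = center_columns(x), center_columns(y)
    return cross_gram_sq_norm(x, y) / math.sqrt(
        cross_gram_sq_norm(x, x) * cross_gram_sq_norm(y, y)
    )


def contrastive_differences(pos: Matrix, neg: Matrix) -> Matrix:
    """Per-sample difference between activations for the two sides of a contrastive pair."""
    return [[p - q for p, q in zip(rp, rn)] for rp, rn in zip(pos, neg)]


# Made-up activations for "model A" on four contrastive pairs.
pos_a = [[1.0, 0.0], [0.5, 2.0], [3.0, 1.0], [0.0, -1.0]]
neg_a = [[0.0, 0.0], [1.0, 1.0], [1.0, 2.0], [2.0, 0.5]]
# "Model B" is model A rotated by 90 degrees and scaled by 2.
pos_b = [[-2.0 * v, 2.0 * u] for u, v in pos_a]
neg_b = [[-2.0 * v, 2.0 * u] for u, v in neg_a]

cka_delta = linear_cka(
    contrastive_differences(pos_a, neg_a),
    contrastive_differences(pos_b, neg_b),
)
```

Because linear CKA is invariant to rotation and isotropic scaling, the two toy models score as fully aligned on their difference vectors even though their raw coordinates differ.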

05.

arXiv (CS.CL) 2026-06-19 DOI: arXiv:2604.04917

Vero: An Open RL Recipe for General Visual Reasoning

Authors:

Gabriel Sarch, Linrong Cai, Qunzhong Wang, Haoyang Wu, Danqi Chen, Zhuang Liu

What does it take to build a visual reasoner that works across charts, science, spatial understanding, and open-ended tasks? The strongest vision-language models (VLMs) suggest that broad visual reasoning is within reach, yet their closed data and reinforcement learning (RL) pipelines make their gains difficult to study, reproduce, or extend. We introduce Vero, a family of fully open VLMs that match or exceed existing open-weight models across diverse visual reasoning tasks. We scale RL data and rewards across six broad task categories, constructing Vero-600K, a 600K-sample dataset from 59 datasets, and designing task-routed rewards that handle heterogeneous answers. Across VeroEval, our 30-benchmark suite, Vero-600K outperforms existing RL datasets under controlled comparisons. Applied to five starting models, Vero variants gain 2.9-5.4 points on average over their initial models. Notably, Vero-Qwen3I-8B, trained on the Instruct model, surpasses Qwen3-VL-8B-Thinking by 3.8 points on average without additional distillation. Systematic ablations reveal that different task categories elicit distinct reasoning patterns and that broad gains depend on learning them jointly rather than in isolation. All data, code, and models are publicly available.

06.

arXiv (CS.CL) 2026-06-16 DOI: arXiv:2606.16429

Taylor-Calibrate: Principled Initialization for Hybrid Linear Attention Distillation

Authors:

Zhongzhu Zhou, Qingyang Wu, Junxiong Wang, Mayank Mishra, Shuaiwen Leon Song, Ben Athiwaratkun, Chenfeng Xu

Hybrid linear attention models offer an appealing path to faster long-context inference: they reduce the quadratic cost and KV-cache burden of full softmax attention while retaining much of the quality of Transformer models. A practical way to obtain such models is to convert a pretrained Transformer instead of pretraining a new architecture from scratch, but this conversion is still brittle. Simply copying the teacher attention projections into a Gated DeltaNet (GDN) student does not specify the new recurrent decay, write, and output-gating dynamics. As a result, the converted model often starts in a poor dynamical regime and must spend many distillation tokens repairing initialization rather than learning the remaining teacher behavior. We propose Taylor-Calibrate, a lightweight initialization method for hybrid GDN students. The method uses Taylor-guided teacher attention statistics to set the value projection, memory timescale, write gates, and output gate, then applies a short per-layer alignment step to match each converted layer to the teacher output. Across four teacher settings and three retained-layer policies, Taylor-Calibrate gives substantially stronger zero-shot students, with up to an 88x improvement in a representative ablation, and reaches matched recovery targets with 4.9x–9.2x fewer training tokens than naive conversion.

07.

PLOS Computational Biology 2026-06-09 DOI: HASH:2a260f2105d195102aae73036f80f339

Retraction: Two Birds with One Stone? Possible Dual-Targeting H1N1 Inhibitors from Traditional Chinese Medicine

Authors:

The PLOS Computational Biology Editors

08.

arXiv (CS.CV) 2026-06-17 DOI: arXiv:2606.17403

Bridging Spatial And Frequency Views For Disaster Assessment: Benefits And Limitations

Authors:

Shikha V. Chandel, Yadav Raj Ghimire, Timothy Agboada, Leila Hashemi-Beni

Rapid assessment of building damage from satellite imagery is essential for effective disaster response and recovery. While most deep learning methods rely on spatial-domain features, frequency-domain representations can capture complementary structural cues such as debris patterns and collapse-induced textures. This study presents a controlled comparison of spatial-domain, frequency-domain, and dual-domain deep learning approaches for multi-class building damage classification using post-disaster imagery from the xView2 (xBD) dataset. To ensure fairness, all models are built on an EfficientNet-B0 backbone and trained under identical settings, differing only in their input representations and fusion strategies. Performance is evaluated using accuracy, macro F1-score, per-class metrics, and confusion matrices. Results show that dual-domain models provide measurable improvements over single-domain approaches. The dual spatial configuration achieves the highest test accuracy (0.4688) and lowest loss, while the spatial-only model attains the best macro F1-score (0.4254), indicating more balanced class performance. In contrast, frequency-only models perform worst and exhibit overfitting, suggesting limited generalization. Despite these gains, all models struggle to detect subtle damage levels, particularly the Minor class, due to class imbalance and fine-grained visual ambiguity. While dual-domain approaches improve detection of severe damage, challenges remain. These findings highlight the benefits and limitations of hybrid representations and motivate future work on data balancing, advanced fusion, and regularization.

09.

arXiv (quant-ph) 2026-06-19 DOI: arXiv:2606.19678

Operational Tube-Sector Theory of Quantum State Distinguishability Under Generalized Symmetries

Authors:

Song He

A variational principle for quantum-state distinguishability is established in many-body systems with generalized symmetries, including noninvertible cases described by fusion categories. Standard fidelity and symmetry-resolved diagnostics emerge as coarse-grained limits of a more refined operational structure. When symmetry actions terminate at entanglement cuts, distinguishability is governed by boundary tube algebras within a symmetry-constrained measurement resource theory. The physically admissible instruments are characterized by complete positivity, entanglement-cut locality, boundary-module covariance, and sequential stability. The resulting optimal measurement structure is uniquely fixed by the center of the boundary tube algebra, $\mathcal{A}_{\mathrm{phys}} = Z\!\left(\mathrm{Tube}_{\mathcal{C}}(\mathcal{M}_A)\right)$, whose primitive idempotents define tube-sector probabilities that refine fidelity-based and symmetry-resolved descriptions. The associated tube positive-operator-valued measures (POVMs) are extremal and yield optimal one-shot hypothesis-testing distinguishability under symmetry constraints. The construction is universal across fusion categories and independent of microscopic realization.

10.

arXiv (CS.CV) 2026-06-18 DOI: arXiv:2506.11139

Grids Often Outperform Implicit Neural Representations at Compressing Dense Signals

Authors:

Namhoon Kim, Sara Fridovich-Keil

Implicit Neural Representations (INRs) have recently shown impressive results, but their fundamental capacity, implicit biases, and scaling behavior remain poorly understood. We investigate the performance of diverse INRs across a suite of 2D and 3D real and synthetic signals with varying effective bandwidth, as well as both overfitting and generalization tasks including tomography, super-resolution, and denoising. By stratifying performance according to model size as well as signal type and bandwidth, our results shed light on how different INR and grid representations allocate their capacity. We find that, for many tasks involving dense signals, a simple regularized grid with interpolation trains faster and to higher or comparable quality than any INR with the same number of parameters. We also find limited settings – namely fitting binary signals such as shape contours – where INRs outperform grids, to guide future development and use of INRs towards the most advantageous applications.

11.

arXiv (CS.LG) 2026-06-15 DOI: arXiv:2606.14334

Riemannian Metric Matching for Scalable Geometric Modeling of Distributions

Authors:

Jacob Bamberger, Adam Gosztolai, Pierre Vandergheynst, Michael Bronstein, Iolo Jones

High-dimensional datasets often concentrate near low-dimensional structures, but estimating their geometry from samples typically relies on graphs and kernels that scale poorly with dataset size and dimension. We propose Riemannian metric matching: a denoising probabilistic framework for learning the Riemannian geometry of data using neural networks. Specifically, we learn the carré du champ operator, which, using diffusion geometry, gives us access to the Riemannian geometry toolkit for downstream machine learning and statistical tasks. Our key observation is that the carré du champ operator can be formulated as a conditional expectation over random perturbations of the data, which can be exploited for sample-wise training and constant-cost, amortized inference without explicit kernel construction. Empirically, metric matching rivals or improves the accuracy of k-NN-based diffusion geometry estimators, while enabling amortized inference that is up to 400× faster, and supports graph-free geometric analysis on high-dimensional images where nearest neighbors break down.

12.

arXiv (CS.LG) 2026-06-16 DOI: arXiv:2606.15127

Beyond Accuracy: Measuring Bias Acknowledgment in Chain-of-Thought Reasoning for Responsible AI Evaluation

Authors:

Xian Sun, Wei Gao, Yingshuo Wang, Lingdong Kong, Yanhang Li, Zhichao Fan, Zexin Zhuang, Wenlong Dong, Zhiyuan Zheng, Hrishikesh Paranjape, Abhishek Mandal, Johnny R. Zhang, …

Reasoning models are increasingly used in settings where the final answer is not the only object of review: educational tools may show students intermediate steps, decision-support systems may require human oversight, and audit workflows may inspect traces for misleading or biased input. In such settings, two responses can receive the same final-answer score while differing in whether the trace explicitly flags injected biasing content. Accuracy-only evaluation collapses these cases. We study this gap as a measurement blind spot for responsible evaluation and introduce a minimal trace-level diagnostic with two axes: susceptibility (whether the bias breaks a previously correct answer) and acknowledgment (whether the trace contains a rubric-defined surface reference to the injected content). Across thousands of biased GSM8K trials, GPT-4o and Claude Sonnet 4 have similar susceptibility rates (1.3% vs. 1.2%) but substantially different acknowledgment rates (13.0% vs. 75.0%) under the same rubric.
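The two axes described in this abstract can be tallied from per-trial records roughly as follows. The `Trial` fields, the choice of denominators, and the four example trials are assumptions made for illustration; the abstract does not give the paper's exact rubric or normalization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    baseline_correct: bool  # answer was correct without the injected bias
    biased_correct: bool    # answer was correct with the injected bias
    acknowledged: bool      # trace references the injected content (per some rubric)


def susceptibility_rate(trials: list[Trial]) -> float:
    """Share of previously correct answers that the bias breaks (assumed denominator)."""
    eligible = [t for t in trials if t.baseline_correct]
    if not eligible:
        return 0.0
    return sum(not t.biased_correct for t in eligible) / len(eligible)


def acknowledgment_rate(trials: list[Trial]) -> float:
    """Share of all biased trials whose trace mentions the injected content."""
    if not trials:
        return 0.0
    return sum(t.acknowledged for t in trials) / len(trials)


# Made-up trials, not data from the paper.
trials = [
    Trial(baseline_correct=True, biased_correct=True, acknowledged=True),
    Trial(baseline_correct=True, biased_correct=False, acknowledged=False),
    Trial(baseline_correct=True, biased_correct=True, acknowledged=False),
    Trial(baseline_correct=False, biased_correct=False, acknowledged=True),
]
susceptibility = susceptibility_rate(trials)
acknowledgment = acknowledgment_rate(trials)
```

Keeping the two rates separate is the point of the diagnostic: two models can match on susceptibility while differing widely on acknowledgment.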

13.

arXiv (quant-ph) 2026-06-19 DOI: arXiv:2606.20153

Optimizing resource allocation for accuracy in noisy variational quantum algorithms

Authors:

Harshit Verma, Thomas Ayral, Alexia Auffèves, Robert Whitney

For quantum algorithms to achieve their full potential, we need methodologies to optimize them, such as reaching a given output accuracy with minimal resource costs. Here, we develop such a methodology for a class of Noisy Intermediate-Scale Quantum (NISQ) algorithms. We leverage simulations of a Variational Quantum Eigensolver (VQE) to propose a phenomenological model of such algorithms that captures the complex relationship between algorithmic accuracy, algorithmic resource costs, and the noise that exists in realistic quantum hardware. For this, we take the algorithmic resource cost to be the total number of quantum gate-operations in the algorithm; minimizing this cost typically makes the algorithm faster and more energy-efficient. We consider the subtle trade-off between quantum circuit size (small circuits are too imprecise, but large ones are too noisy), and the number of iterations of that quantum circuit for the full algorithm to sufficiently converge. Using a noise-metric-resource methodology, we identify the sweet spot (of circuit size versus iterations) that minimizes the algorithmic resource costs for a desired algorithm accuracy. It also gives the circuit size that maximizes algorithm accuracy for a fixed resource cost. Our methodology provides a practical guideline for near-term deployment of variational algorithms on realistic noisy hardware, including hardware that uses error mitigation.

14.

arXiv (CS.AI) 2026-06-15 DOI: arXiv:2606.14409

Hy-Embodied-0.5-VLA: From Vision-Language-Action Models to a Real-World Robot Learning Stack

Authors:

He Zhang, Lingzhu Xiang, Haitao Lin, Zeyu Huang, Minghui Wang, Dingyan Zhong, Yubo Dong, Yihao Wu, Yongming Rao, Dongsheng Zhang, Wanjia He, Ling Chen, …

In this report, we present Hy-Embodied-0.5-VLA, abbreviated as HyVLA-0.5, an end-to-end system that spans the full robot learning stack: data collection, model design, continued pre-training and supervised fine-tuning, RL post-training, and real-world deployment. Each component serves a distinct role in this stack.

15.

medRxiv (Medicine) 2026-06-23 DOI: HASH:4f60e7f5d02b47d01e05d3e3179c503c

Unscreenable: The Burden, Structure, and Analytic Consequences of "Unable to Assess" Delirium Documentation in the Intensive Care Unit

Authors:

Gorenshtein, Adiniaev, Omar, Barash, Klang, Daniel

Objective: To quantify the burden, structure, and downstream analytic consequences of "Unable to Assess" (UTA) delirium documentation in the intensive care unit (ICU). Design: Retrospective cross-sectional and repeated-measures study. Setting: A single US academic medical center (Medical Information Mart for Intensive Care IV [MIMIC-IV], 2008-2019). Patients: 72,944 adult ICU stays with at least 1 delirium screen. Interventions: None. Measurements and Main Results: Among 610,632 screens, 130,455 (21.4%; 95% CI, 21.0%-21.8%) were recorded as UTA, exceeding the 119,052 (19.5%) scored positive. The UTA fraction rose from 2.0% at a Richmond Agitation-Sedation Scale (RASS) score of 0 to 97.8% at RASS -4; 22.0% of UTA screens occurred in arousable patients, where UTA was associated with mechanical ventilation (odds ratio [OR], 3.43; 95% CI, 3.17-3.71) and non-English primary language (OR, 3.74; 95% CI, 3.43-4.08). Building the delirium label three ways from the same patients shifted prevalence modestly (32.1% to 30.8%) and prediction (area under the curve, 0.737 to 0.719) but most affected the delirium-mortality association: in a baseline-adjusted model the OR was 4.12 (95% CI, 3.88-4.36) under complete-case handling and fell to 2.16 (95% CI, 2.06-2.27) when UTA was recoded as negative. UTA was recoverable from the observed clinical state (area under the curve, 0.95). Conclusions: In this ICU cohort, Unable to Assess was the most common recorded delirium result other than Negative, exceeding positive screens; recoding it as negative roughly halved the apparent delirium-mortality association by relabeling deeply sedated, high-mortality patients. Delirium datasets should preserve and report UTA, whose concentration among arousable non-English-speaking patients is a measurable equity target.
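The headline percentages in this abstract follow directly from the reported counts, and the "roughly halved" claim follows from the two reported odds ratios. The short check below uses only numbers quoted above; the variable and helper names are illustrative.

```python
# Counts and odds ratios as reported in the abstract.
total_screens = 610_632
uta_screens = 130_455
positive_screens = 119_052
or_complete_case = 4.12
or_uta_as_negative = 2.16


def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, matching the abstract's precision."""
    return round(100 * part / whole, 1)


uta_pct = percent(uta_screens, total_screens)            # share of screens recorded as UTA
positive_pct = percent(positive_screens, total_screens)  # share of screens scored positive
or_ratio = round(or_uta_as_negative / or_complete_case, 2)  # recoded OR relative to complete-case OR
```

The ratio of the two odds ratios comes out just above one half, which is what the conclusion summarizes as "roughly halved".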

16.

arXiv (CS.LG) 2026-06-15 DOI: arXiv:2606.13912

Direct/adaptive-mixture phase-gradient learning for neural-network quantum states with complex phase structure

Authors:

Yi-Ran Xue, Rui Wang, Baigeng Wang, Chenan Wei

Neural-network quantum states (NQS) are a leading variational tool for quantum many-body physics, yet their optimization is fragile whenever the ground state carries a non-trivial sign or complex phase structure, a situation generic to gauge fields, broken time-reversal symmetry, and fermionic statistics. We trace this fragility to the stochastic estimator of the phase gradient rather than to network expressiveness. The phase sector of the Monte Carlo energy gradient is a noisy score-function estimator; differentiating the local energy instead yields a direct estimator that is unbiased for the same phase force, has far lower variance, and requires only a separated amplitude–phase ansatz. Demonstrated on a 100-site flux ladder, a small network trained this way reaches 0.89% median error, where tuned standard baselines plateau at 1.8% and wider or deeper standard-gradient networks degrade from 8.4% to 24.6%. The advantage carries over to chiral XXX chains: the direct estimator again converges to a markedly lower error than the standard one, across α and size; it grows with flux and vanishes in zero-flux controls. An adaptive mixture of the two estimators is provably never worse in variance than the better endpoint at the optimal mixing coefficient, with seed-resolved diagnostics tracing much of the gain to eliminating failed runs. Estimator design thus emerges as a first-class lever for complex-valued neural quantum states.
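The statement that a mixture of two estimators is never worse in variance than the better endpoint can be illustrated with the textbook result for combining two unbiased scalar estimators. This is a generic sketch under that assumption; the paper's estimators are gradient vectors and its adaptive scheme may differ. The function names and example variances are hypothetical.

```python
def optimal_mixing(var_a: float, var_b: float, cov_ab: float) -> float:
    """Weight on estimator A that minimizes Var(lam * A + (1 - lam) * B)."""
    denom = var_a + var_b - 2.0 * cov_ab  # equals Var(A - B), so it is non-negative
    if denom <= 0.0:
        return 0.5  # A and B coincide up to a constant, so any weight gives the same variance
    return (var_b - cov_ab) / denom


def mixture_variance(lam: float, var_a: float, var_b: float, cov_ab: float) -> float:
    """Variance of lam * A + (1 - lam) * B."""
    return (
        lam ** 2 * var_a
        + (1.0 - lam) ** 2 * var_b
        + 2.0 * lam * (1.0 - lam) * cov_ab
    )


# Made-up second moments: A is noisy, B is less noisy, and they are mildly correlated.
var_a, var_b, cov_ab = 4.0, 1.0, 0.5
lam_star = optimal_mixing(var_a, var_b, cov_ab)
var_star = mixture_variance(lam_star, var_a, var_b, cov_ab)
```

Because the weights 0 and 1 recover the two original estimators, the minimum over all weights cannot exceed the smaller of the two endpoint variances.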

17.

arXiv (CS.LG) 2026-06-12 DOI: arXiv:2510.16311

Toward General Digraph Contrastive Learning: A Dual Spatial Perspective

Authors:

Zhengyu Wu, Daohan Su, Yang Zhang, Xunkai Li, Rong-Hua Li, Guoren Wang

Graph Contrastive Learning (GCL) has emerged as a powerful tool for extracting consistent representations from graphs, independent of labeled information. However, existing methods predominantly focus on undirected graphs, disregarding the pivotal directional information that is fundamental and indispensable in real-world networks (e.g., social networks and recommendations). In this paper, we introduce S2-DiGCL, a novel framework that emphasizes spatial insights from complex and real domain perspectives for directed graph (digraph) contrastive learning. From the complex-domain perspective, S2-DiGCL introduces personalized perturbations into the magnetic Laplacian to adaptively modulate edge phases and directional semantics. From the real-domain perspective, it employs a path-based subgraph augmentation strategy to capture fine-grained local asymmetries and topological dependencies. By jointly leveraging these two complementary spatial views, S2-DiGCL constructs high-quality positive and negative samples, leading to more general and robust digraph contrastive learning. Extensive experiments on 7 real-world digraph datasets demonstrate the superiority of our approach, achieving SOTA performance with 4.41% improvement in node classification and 4.34% in link prediction under both supervised and unsupervised settings.

18.

arXiv (quant-ph) 2026-06-12 DOI: arXiv:2606.13005

Experiment-compatible measurement–feedback quantum state preparation with reinforcement learning

Authors:

Xiaotian Nie, Tao Zhang, Linghui Chen

Ground-state preparation is a critical task in quantum simulation and quantum computing, as it enables the study of correlated phases and the generation of entangled resource states. While measurement–feedback control has emerged as a promising route to state preparation, existing schemes either rely on handcrafted, task-specific policies or are designed using full quantum-state information that is unavailable in real experiments and becomes impractical for large many-body systems. Here we develop an adaptive measurement–feedback protocol based on reinforcement learning under partial observability. The controller uses only the history of experimentally accessible measurement outcomes to choose both the measurement operator and the feedback action in real time. To make training compatible with experiments, we introduce a stochastic terminal reward built from one-shot measurements of randomly sampled Hamiltonian components, avoiding unphysical full-state reconstruction while remaining an unbiased estimator of the target energy. We demonstrate the method by preparing ground states of the Bose–Hubbard model and by generating GHZ states, establishing a scalable and hardware-compatible route to quantum state preparation.

19.

arXiv (CS.CL) 2026-06-18 DOI: arXiv:2606.19257

DreamReasoner-8B: Block-Size Curriculum Learning for Diffusion Reasoning Models

Authors:

Zirui Wu, Lin Zheng, Jiacheng Ye, Shansan Gong, Xueliang Zhao, Yansong Feng, Wei Bi, Lingpeng Kong

Block diffusion language models accelerate decoding through parallel block-wise denoising, yet whether they can be reliably scaled for long chain-of-thought (CoT) reasoning remains unresolved. To this end, we develop DreamReasoner-8B, an open-source block diffusion reasoning model, and conduct a systematic study of how training and inference block sizes affect long-CoT reasoning. Our analysis reveals a stark performance disparity: training with large block sizes yields remarkably poor reasoning, whereas small block sizes preserve effective reasoning. To bridge this granularity gap, we propose block-size curriculum learning, which gradually transitions training from fine-grained to coarse-grained block sizes, thereby overcoming this limitation and enabling strong reasoning performance that generalizes across diverse inference block sizes. On mathematical and code reasoning benchmarks, DreamReasoner-8B achieves results competitive with leading open autoregressive models such as Qwen3-8B. This work establishes a practical foundation for efficient, reasoning-capable diffusion language models. We release our model at https://github.com/DreamLM/DreamReasoner.

20.

arXiv (CS.LG) 2026-06-19 DOI: arXiv:2606.19882

Multimodal Concept Bottleneck Models

Authors:

Tongqing Shi, Ge Yan, Tuomas Oikarinen, Tsui-Wei Weng

Concept Bottleneck Models (CBMs) enhance the interpretability of deep learning networks by aligning the features extracted from images with natural concepts. However, existing CBMs are limited by their inability to generalize beyond a fixed set of predefined classes and by the risk of non-concept information leakage, where predictive signals outside the intended concepts are inadvertently exploited. In this paper, we propose the Multimodal Concept Bottleneck Model (MM-CBM) to address these issues and extend CBMs into CLIP. MM-CBM utilizes dual Concept Bottleneck Layers (CBLs) to align both the image and text embeddings into interpretable features. This allows us to perform new vision tasks like zero-shot classification or image retrieval in an interpretable way. Compared to existing methods, MM-CBM achieves up to 51.26% accuracy improvement on average across four standard benchmarks. Our method maintains high accuracy, staying within ~5% of black-box performance while offering greater interpretability.

21.

arXiv (CS.AI) 2026-06-12 DOI: arXiv:2606.13385

Who Pays the Price? Stakeholder-Centric Prompt Injection Benchmarking for Real-world Web Agents

Authors:

Zihao Wang, Yiming Li, Yutong Wu, Zheyu Liu, Kangjie Chen, Fok Kar Wai, Pin-Yu Chen, Vrizlynn L. L. Thing, Bo Li, Dacheng Tao, Tianwei Zhang

Web agents driven by large language models (LLMs) are increasingly deployed in real-world environments, where they operate over untrusted web content and execute actions with direct consequences. This makes them vulnerable to prompt-injection attacks, in which seemingly benign content embeds adversarial instructions that manipulate agent behaviour. Existing security benchmarks adopt an attack-centric perspective, focusing on the technical feasibility of injections while overlooking the nuanced distribution of resulting harms. In practice, however, prompt-injection risk is victim-dependent: a single exploit can produce asymmetric consequences for different stakeholders, and the same attack pattern may exhibit substantially different effectiveness depending on whom it targets. To capture these properties, we introduce \sysname, a stakeholder-centric benchmark to systematically categorize and attribute harm in real-world web agent systems. It distinguishes between affected entities (e.g., user, seller, platform), decomposes the attacks into concrete objectives, and evaluates each case with complementary outcome- and process-level metrics. Our results reveal substantial and heterogeneous vulnerabilities: not a single attack objective is reliably resisted by current agents, and failures distribute across qualitatively distinct modes ranging from stealthy parasitism (attack succeeds without disrupting the user's delegated task) to misaligned disruption (task disrupted without attack success) and compounded failure (both adversarial objective and task integrity simultaneously violated). These patterns are missed by conventional evaluation, highlighting the need for stakeholder-aware assessment of LLM-based agents in real-world deployments. The benchmark is available at https://github.com/StakeBench/SBC.

22.

arXiv (CS.AI) 2026-06-12 DOI: arXiv:2606.12418

Divination by Prompt: LLM-Mediated Xuanxue on Chinese Social Media

Authors:

Chuang Li, Lixuan Wang, Yuqi Chen, Ze Hong

The rapid proliferation of large language models (LLMs) has produced a striking cultural practice: using conversational AI for divination. This paper offers one of the first systematic studies of LLM-mediated divination in the context of Xuanxue, an internet-native umbrella term for mystical and spiritual practices on Chinese social media. Using a mixed-methods design, we analyze more than 23,000 posts and comments from Xiaohongshu and conduct 32 semi-structured interviews with users and professional diviners. Users primarily consult LLMs about pragmatic concerns (romantic relationships, careers, exams, and in-game gacha draws) via two intersecting pathways: trend-driven curiosity enabled by viral visibility and zero-cost access, and event-driven anxiety under conditions of uncertainty. A defining feature is collaborative prompt refinement, which turns users into active prompt engineers. Among commenters expressing a clear stance, perceived efficacy skews positive, with "accuracy" often justified through biographical fit and retrospective confirmation, consistent with Barnum and confirmation bias. Users also develop verification practices such as repeated trials and cross-model comparison. Professional diviners, by contrast, portray LLMs as lacking the "spiritual power" required for genuine divination, reflecting both ontological commitments and economic boundary-work. We also show how participants navigate tensions between scientific and metaphysical frames when interpreting AI-generated readings. Situating these findings in anthropological and cognitive-evolutionary theories of divination, we argue that LLM divination preserves core functions of traditional practice while introducing scalability, repeatability, and prompt-driven co-production that reshape how divinatory authority is constructed and evaluated.


23.

arXiv (CS.LG) 2026-06-16 DOI: arXiv:2312.06173

Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion

Authors:

Anke Tang, Xianglin Luo, Li Shen, Yong Luo, Liang Ding, Han Hu, Bo Du, Dacheng Tao

arXiv:2312.06173v2 Announce Type: replace Abstract: Merging models fine-tuned from a common, extensively pre-trained large model but specialized for different tasks has been demonstrated as a cheap and scalable strategy to construct a multi-task model that performs well across diverse tasks. Recent research, exemplified by task arithmetic, highlights that this multi-task model can be derived through arithmetic operations on task vectors. Nevertheless, current merging techniques frequently resolve potential conflicts among parameters from task-specific models by evaluating individual attributes, such as the parameters' magnitude or sign, overlooking their collective impact on the overall functionality of the model. In this work, we propose the CONtinuous relaxation of disCRETE (Concrete) subspace learning method to identify a common low-dimensional subspace and utilize its shared information to tackle the interference problem without sacrificing much performance. Specifically, we model the problem as a bi-level optimization problem and introduce a meta-learning framework to find the Concrete subspace mask through gradient-based techniques. At the upper level, we focus on learning a shared Concrete mask to identify the subspace, while at the inner level, model merging is performed to maximize the performance of the merged model. We conduct extensive experiments in both the vision and language domains, and the results demonstrate the effectiveness of our method. The code is available at https://github.com/tanganke/subspace_fusion
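As a rough illustration of the merging step the abstract refers to, the sketch below applies task arithmetic restricted to a shared binary mask. It is not the authors' implementation: the function names, the flat-list parameter layout, and the `scale` coefficient are hypothetical, and the mask is fixed by hand here, whereas the paper learns it through a Concrete relaxation and bi-level optimization.

```python
def task_vector(finetuned, pretrained):
    """Task vector: element-wise difference between fine-tuned and pre-trained weights."""
    return [f - p for f, p in zip(finetuned, pretrained)]


def masked_task_arithmetic(pretrained, finetuned_models, mask, scale):
    """Add the masked, scaled task vectors of all models to the pre-trained weights.

    mask  -- hypothetical shared 0/1 mask selecting the subspace (hand-set here, not learned)
    scale -- hypothetical scaling coefficient applied to every task vector
    """
    merged = list(pretrained)
    for finetuned in finetuned_models:
        tau = task_vector(finetuned, pretrained)
        for i, (m, t) in enumerate(zip(mask, tau)):
            merged[i] += scale * m * t
    return merged


pretrained = [1.0, 2.0, 3.0]
finetuned_models = [[2.0, 2.0, 5.0], [1.0, 4.0, 3.0]]
merged = masked_task_arithmetic(pretrained, finetuned_models, mask=[1, 0, 1], scale=0.5)
```

In this toy call the second coordinate is masked out, so the second model's update to it is discarded and only the first and third coordinates move away from the pre-trained values.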


24.

arXiv (CS.AI) 2026-06-16 DOI: arXiv:2605.27023

Boosting Knowledge Graph Foundation Models via Enhanced Negative Sampling

Authors:

Yinan Liu, Wenjin Xu, Zhiyuan Zha, Xiaochun Yang, Bin Wang

arXiv:2605.27023v2 Announce Type: replace Abstract: Knowledge graphs (KGs) have become the core backbone of numerous downstream tasks such as question answering and recommender systems. However, KGs are often highly incomplete. To perform zero-shot knowledge graph completion in unseen KGs, which have different relational vocabularies from those used for pre-training, KG foundation models (KGFMs) have received wide attention. Existing KGFMs often perform training using random negative triples, which are constructed by replacing the head or tail entity of a positive triple with a random entity. However, these negative triples are often of limited quality, providing weak supervision for KGFM training. In this paper, we propose a simple yet effective adaptive negative sampling approach, KMAS, to enhance existing KGFMs. KMAS constructs hard negative triples through the updated relation embeddings generated from the existing KGFM's relation encoder. To further adaptively align with the evolving capability of the KGFM during the training process, KMAS adjusts the ratio of hard negative triples dynamically throughout the whole training process: after a warmup phase, it increases the ratio linearly and then decreases it linearly. Extensive experiments are conducted over 44 datasets. Experimental results demonstrate that our proposed negative sampling method can enhance many SOTA KGFMs without requiring excessive additional time or memory consumption.
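The ratio schedule described in the abstract (warmup, then a linear increase, then a linear decrease) could look roughly like the sketch below. Everything beyond that shape is an assumption: the function name, the parameters `warmup_steps`, `peak_step`, and `max_ratio`, a ratio of zero during warmup, and a decay back to zero at the end of training are all hypothetical and not taken from the paper.

```python
def hard_negative_ratio(step, total_steps, warmup_steps, peak_step, max_ratio):
    """Piecewise-linear share of hard negatives at a given training step.

    Assumed shape: 0 during warmup, linear rise to max_ratio at peak_step,
    then linear fall back to 0 at total_steps.
    """
    if not (0 <= warmup_steps < peak_step < total_steps):
        raise ValueError("need 0 <= warmup_steps < peak_step < total_steps")
    if step < warmup_steps or step >= total_steps:
        return 0.0  # assumption: only random negatives outside the ramp
    if step <= peak_step:
        # Linear increase from 0 (end of warmup) to max_ratio (peak).
        return max_ratio * (step - warmup_steps) / (peak_step - warmup_steps)
    # Linear decrease from max_ratio (peak) to 0 (end of training).
    return max_ratio * (total_steps - step) / (total_steps - peak_step)


schedule = [hard_negative_ratio(s, 100, 10, 50, 0.5) for s in (0, 10, 30, 50, 75, 100)]
```

A trainer would call this once per step and draw that fraction of each batch's negatives from the hard-negative pool, with the remainder sampled at random.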


25.

arXiv (CS.AI) 2026-06-17 DOI: arXiv:2602.02881

Learning-Infused Formal Reasoning: From Contract Synthesis to Artifact Reuse and Formal Semantics

Authors:

Arshad Beg, Diarmuid O'Donoghue, Rosemary Monahan

arXiv:2602.02881v2 Announce Type: replace-cross Abstract: This paper articulates a long-term research vision for formal methods at the intersection with artificial intelligence, outlining multiple conceptual and technical dimensions and reporting on our ongoing work toward realising this vision. It advances a forward-looking perspective on the next generation of formal methods based on the integration of automated contract synthesis, semantic artifact reuse, and refinement-based theory. We argue that future verification systems must move beyond individual correctness proofs toward a cumulative, knowledge-driven paradigm in which specifications, contracts, and proofs are continuously synthesised and transferred across systems. To support this shift, we outline a hybrid framework combining large language models with graph-based representations to enable scalable semantic matching and principled reuse of verification artifacts. Learning-based components provide semantic guidance across heterogeneous notations and abstraction levels, while symbolic matching ensures formal soundness. Grounded in compositional reasoning, this vision points toward verification ecosystems that evolve systematically, leveraging past verification efforts to accelerate future assurance.

