论文广场 - AcademicHub

01.

arXiv (quant-ph) 2026-06-15 DOI: arXiv:2606.13778

Resurgence of the Thermal Transition between Bounce and Sphaleron

作者:

Shaun D. Hampton ↗Kyungsun Lee ↗Sungjay Lee ↗

arXiv:2606.13778v1 Announce Type: cross Abstract: We study the thermal transition between the bounce and the sphaleron in quantum mechanics with a metastable vacuum from the viewpoint of Borel resurgence. For two models representing a second-order and a first-order transition, we compute the perturbative expansion of the thermal free energy to high orders and extract the leading Borel singularity data $(A,b,S)$ as functions of temperature. The Borel singularity location $A$ reproduces the on-shell action of the dominant saddle on both sides of the transition, joining smoothly in the second-order case and developing a kink in the first-order case. The characteristic exponent $b$ jumps between $0$ and $1/2$ across the transition, counting the zero modes of the corresponding saddle. The Stokes constant $S$ matches the one-loop determinant around the saddle. The perturbative expansion around the false vacuum thus determines the transition temperature, the order of the transition, and the decay rate including the one-loop prefactor without relying on semiclassical inputs.

阅读与讨论 → 访问原文 →

02.

arXiv (CS.CL) 2026-06-16 DOI: arXiv:2510.17431

Agentic Reinforcement Learning for Search Misaligns Instruction-Tuning

作者:

Yushi Yang ↗Shreyansh Padarha ↗Sarah Ball ↗Andrew Lee ↗Adam Mahdi ↗

Agentic reinforcement learning (RL) trains large language models to use tools, but its impact on alignment is poorly understood. We study how agentic RL for search affects the alignment of instruction-tuned (IT) models. We find that RL-trained models inherit refusal reasoning by deflecting harmful requests into benign search queries, but this breaks down under a simple diagnostic trigger that elicits a search call before refusal can occur. Under this condition, RL models produce multi-step unsafe search actions and reasoning, reducing search query safety by up to 68.6% in Qwen and Llama models relative to their IT counterparts. The effect generalises across model families, scales, and RL algorithms. To understand why, we identify linear directions in the residual stream that control search query safety, and show that RL training progressively shifts search behaviour toward the harmful end of this direction. We thus propose representation-guided RL training, which adds a reward penalty based on projection toward the harmful search direction. Training on benign data alone, it restores IT-level alignment without reducing task accuracy and requires no additional training data. Together, our work provides the first framework for diagnosing, mechanistically analysing, and mitigating alignment degradation in agentic RL for search.

阅读与讨论 → 访问原文 →

03.

arXiv (CS.CV) 2026-06-16 DOI: arXiv:2606.16479

Uncertainty Quality of VGGT: An Analysis on the DTU Benchmark Dataset

作者:

Markus Hillemann ↗Robert Langend\"orfer ↗Steven Landgraf ↗Markus Ulrich ↗

Visual Geometry Grounded Transformer (VGGT) has already attracted a great deal of attention in a short period of time, not least due to the Best Paper Award at CVPR-2025. Similar to DUSt3R and MASt3R, VGGT aims to bring about a paradigm shift by replacing established methods like bundle adjustment and feature matching with a simple, unified, feed-forward neural network that predicts camera poses, depth maps, and dense 3D structure directly from multiple images of a scene in a few seconds. A key aspect is its ability to process an arbitrary number of views consistently in a single forward pass without any post-processing or iterative optimization. For photogrammetry, this opens new possibilities for real-time, scalable, and accessible 3D reconstruction. In this context, not only high reconstruction accuracy but also high-quality uncertainty estimates are crucial, as they foster trust and enable robust quality assurance. This paper therefore investigates the quality of VGGT's uncertainty predictions. The analysis identifies an effective confidence threshold for filtering VGGT's raw output and demonstrates that enhancing uncertainty quality holds strong potential for improving the accuracy of its 3D reconstructions.

阅读与讨论 → 访问原文 →

04.

Nature (Science) 2026-06-16 DOI: HASH:d43cbb6b6dcb6fe9f6e4ac68e316362c

El Niño in a thermally saturated world

作者:

Carlos García-Soto ↗

Letter to the Editor

阅读与讨论 → 访问原文 →

05.

medRxiv (Medicine) 2026-06-12 DOI: HASH:50e957e3933791fa6d822e011c8f0e24

Estimating the effectiveness of syndromic screening at airports for Bundibugyo ebolavirus disease

作者:

Quilty ↗B. J ↗

We used a stochastic simulation model to estimate the effectiveness of combined exit and entry airport screening for Bundibugyo ebolavirus disease (BVD), using natural-history parameters from a Bayesian re-analysis of the 2012 Isiro outbreak. For a 12-hour international flight from DRC or Uganda at 86% screening sensitivity, we estimate 65% of infected travellers would arrive undetected (95% CrI: 38 - 76%). The main driver of this outcome is the relative duration of the the incubation period (approximately 7.7 days) and the onset-to-severe-disease interval (approximately 4 days): most infected travellers board before symptom onset and are undetectable by any syndromic screen, whilst those who are symptomatic progress rapidly to illness severe enough to preclude travel. This is compounded during active epidemic growth, when recently exposed (and therefore pre-symptomatic) cases are overrepresented among travellers. Syndromic airport screening offers limited protection against BVD spread via air travel, and should be complemented by outbreak control at source and strengthened clinical surveillance in receiving countries with high travel connectivity to affected areas.

阅读与讨论 → 访问原文 →

06.

medRxiv (Medicine) 2026-06-10 DOI: HASH:286505634423175cc844d7d030d400cf

Seasonality, source type, and women's water labor: A longitudinal mixed-methods study in Kenya and Honduras

作者:

Mink ↗Ogutu ↗Patrick ↗Sinharoy ↗Bolanos Gamez ↗M. V ↗Macler ↗Ngo ↗C. P ↗Oglesby ↗Bendit ↗White ↗…

Women shoulder the majority of water collection labor globally, yet how their water collection and water-related work experiences may change over time or by water source type remains insufficiently understood. We conducted a longitudinal, mixed-methods study in rural Kenya and Honduras to understand how women's experiences collecting water and performing water-related work varied between (a) two time points, (b) improved and unimproved water source types, and (c) water source location. Data were collected in 2023 and 2024 using interviews, observation, GPS-enabled watches, and scales to measure time and distance traveled, water weight and volume carried, and calories expended. 133 women participated in data collection (66 Kenya, 67 Honduras). We compared women's experience data by time point (2023 vs. 2024), source type (improved vs. unimproved), and source location (off-premises vs. on-premises) (t-test, Mann-Whitney U test). We also mapped participants' routes and activities to show which sources were visited, when, and for what activities. In Kenya, mean water collection time, distance, and caloric expenditure were significantly lower and water volume was significantly higher in 2024 when there were unexpected rains compared to 2023 when there was a persistent drought. When comparing source types during the 2023 drought, journeys to improved sources took significantly less time and energy and covered less distance than journeys to unimproved sources. These differences were not observed during the rainy conditions of 2024 when unimproved sources were closer and more accessible. In Honduras, water collection and water work burdens did not differ significantly by time point or source type. We found women with on-premises water access to still expend considerable time and caloric expenditure engaging in water work within their household compounds. Findings from Kenya suggest that water infrastructure improvements can reduce women's water collection burdens, though benefits may depend on and vary by season and source location. Findings from Honduras show that water labor does not end once water is in the household. Rather, substantial time and energy are expended carrying out water-related work even when sources are on premises, suggesting that efforts to assess water labor need to extend beyond collection alone. To meaningfully reduce burdens and ensure improved water sources are utilized during all seasons, initiatives need to consider source location, seasonal variability, and work beyond collection. Evaluations to assess infrastructure impacts on women's labor and well-being are needed and long overdue.

阅读与讨论 → 访问原文 →

07.

arXiv (CS.AI) 2026-06-18 DOI: arXiv:2606.18424

A Variational Framework for LLM Generator-Regulator Games

作者:

Quanyan Zhu ↗

arXiv:2606.18424v1 Announce Type: cross Abstract: This paper develops a variational framework for regulated language generation. Starting from autoregressive token sampling, we derive the induced distribution over complete messages and relate it to an entropy-regularized Gibbs law. Regulation is modeled as an optimal discriminator whose convex-dual value is an f-divergence, and the generator-regulator interaction is formulated as a saddle-point problem. The framework applies to moderation, censorship, AI deception detection, compliance auditing, phishing defense, and manipulation control, where regulation concerns a distribution over possible messages rather than a single output. The equilibrium clarifies the tradeoff among utility, entropy, regulatory alignment, and finite-length detectability. Two finite-vocabulary case studies, censorship filtering and phishing defense, illustrate how the theory can be evaluated through utility, entropy, divergence, receiver-side scores, and detection probability.

阅读与讨论 → 访问原文 →

08.

Nature Medicine 2026-06-17 DOI: HASH:4cb0bf126ede4e1ccefc98c2a163fc47

CAR T cells take on precancers

作者:

Eric M. Jurgens ↗

CAR T cell therapy can induce deep responses but also severe toxicities in patients with smoldering myeloma, an asymptomatic cancer precursor; given the potential risks and benefits, its success depends on careful patient selection.

阅读与讨论 → 访问原文 →

09.

PLOS Medicine 2026-06-02 DOI: HASH:c4ca410c3323273572bdd7844aabb7b3

Proteomic signatures of early retinal neurodegeneration in type 2 diabetes mellitus

作者:

Huangdong Li ↗

by Huangdong Li, Ziyu Zhu, Shaopeng Yang, Weijing Cheng, Shaoying Tan, Zhuoyao Xin, Lei Zhang, Zhuoting Zhu, Shida Chen, Wenyong Huang, Wei Wang Background Retinal neurodegeneration is an early and independent feature of diabetic retinal disease and has been proposed as a window into the systemic neural consequences of diabetes, yet accessible molecular biomarkers and individualized prediction tools remain scarce. We aimed to identify circulating plasma protein signatures of diabetic retinal neurodegeneration (DRN) and to translate them into a clinically usable risk prediction system. Methods and findings In this multi-cohort prospective observational study, we integrated high-throughput plasma proteomics with longitudinal optical coherence tomography (OCT) in two independent populations. The discovery cohort comprised 1,492 participants had baseline plasma proteomics and OCT, and 1,218 were followed with repeated OCT over 6 years in Guangzhou Diabetic Eye Study (GDES). DRN was quantified by the annualized OCT-derived retinal nerve fiber layer thinning rate. In multivariable analyses adjusted for age, sex, smoking, systolic blood pressure, HbA1c, and diabetes duration, we identified 71 plasma proteins associated with development and progression of DRN. These proteins mapped onto pathways governing inflammatory immune recruitment, extracellular matrix remodeling, and microvascular homeostasis, providing a plausible biological basis for DRN. We developed a proteomics-based DRN model (Pro-DRN) using eight machine learning (ML) algorithms, including XGBoost and LightGBM. In the independent test set, Pro-DRN achieved a C-index of 0.860, rising to 0.908 when integrated with clinical variables. Compared with six conventional models, Pro-DRN improved discrimination (ΔC-index 0.137 to 0.159; all P

阅读与讨论 → 访问原文 →

10.

arXiv (CS.AI) 2026-06-17 DOI: arXiv:2606.18132

Knowledge Reutilization in Meta-Reinforcement Learning

作者:

Yuan Meng ↗Bo Wang ↗Juan de los Rios Ruiz ↗Xiangtong Yao ↗Zhenshan Bing ↗Fuchun Sun ↗Alois Knoll ↗

arXiv:2606.18132v1 Announce Type: new Abstract: Meta-reinforcement learning enables fast adaptation by extracting shared structure from related tasks, but existing end-to-end methods often couple task inference with embodiment-specific control. This coupling can obscure non-parametric task semantics, reduce sample efficiency, and limit cross-agent reuse. We propose a meta-knowledge reutilization framework that learns task-level knowledge on a dynamics-simplified agent and transfers it to heterogeneous agents. The framework uses a Bayesian non-parametric prior to organize latent task modes and a high-level policy to generate task-level magnitude guidance. To bridge reusable task knowledge with different embodiments, we introduce a semantic-magnitude interface and a lightweight temporal adaptor, which convert frozen meta-knowledge into temporally aligned subgoals for embodiment-specific low-level controllers. Experiments on multiple locomotion agents show that our framework reduces final-step tracking error by 94.75% – 99.79% compared with recent state-of-the-art baselines and achieves comparable deployment performance with about 23.8% of their interaction data.

阅读与讨论 → 访问原文 →

11.

arXiv (CS.LG) 2026-06-18 DOI: arXiv:2602.14789

On the Stability of Nonlinear Dynamics in GD and SGD: Beyond Quadratic Potentials

作者:

Rotem Mulayoff ↗Sebastian U. Stich ↗

arXiv:2602.14789v2 Announce Type: replace Abstract: The dynamical stability of the iterates during training plays a key role in determining the minima obtained by optimization algorithms. For example, stable solutions of gradient descent (GD) correspond to flat minima, which have been associated with favorable features. While prior work often relies on linearization to determine stability, it remains unclear whether linearized dynamics faithfully capture the full nonlinear behavior. Recent work has shown that GD may stably oscillate near a linearly unstable minimum and still converge once the step size decays, indicating that linear analysis can be misleading. In this work, we explicitly study the effect of nonlinear terms. Specifically, we derive an exact criterion for stable oscillations of GD near minima in the multivariate setting. Our condition depends on high-order derivatives, generalizing existing results. Extending the analysis to stochastic gradient descent (SGD), we show that nonlinear dynamics can diverge in expectation even if a single batch is unstable. This implies that stability can be dictated by a single batch that oscillates unstably, rather than an average effect, as linear analysis suggests. Finally, we prove that if all batches are linearly stable, the nonlinear dynamics of SGD are stable in expectation.

阅读与讨论 → 访问原文 →

12.

arXiv (CS.AI) 2026-06-16 DOI: arXiv:2606.16973

How Much Do Reviews Really Contribute? A Study on Text-Enriched Matrix Factorization for Recommendations

作者:

Eduardo Ferreira da Silva ↗Mayki dos Santos Oliveira ↗Joel Machado Pires Denis Dantas Boaventura ↗Frederico Ara\'ujo Dur\~ao ↗

arXiv:2606.16973v1 Announce Type: cross Abstract: Incorporating textual reviews into a Recommender System has become a prominent strategy for enriching collaborative signals with semantic information. However, the actual contribution of review-derived representations remains an open question, particularly when strong collaborative baselines are employed. In this work, we systematically investigate the impact of textual information on Matrix Factorization by introducing and comparing three enrichment strategies over a common collaborative backbone. First, we propose a learnable gating mechanism that adaptively balances collaborative and textual signals during training. This mechanism is applied to two distinct review representations: (i) aggregated topic profiles extracted from user and item histories, and (ii) full text embedding representations derived from reviews. Additionally, we explore a cross-attention mechanism that identifies and emphasizes the most informative dimensions of the textual representation before fusion with collaborative factors. We evaluate six variants: pure, enriched with topic profiles and text via gating; enriched with topics and text via gating; and enhanced with cross-attention over textual features. Experiments across multiple review-based datasets reveal that although adaptive fusion mechanisms improve representation flexibility, the marginal contribution of textual signals remains limited compared to the collaborative backbone. These findings suggest that, under typical rating-prediction settings, collaborative information continues to dominate performance, raising important considerations for the effective integration of semantic review signals into recommendation models.

阅读与讨论 → 访问原文 →

13.

arXiv (CS.AI) 2026-06-16 DOI: arXiv:2505.23878

AC-ODM: Actor–Critic Online Data Mixing for Sample-Efficient LLM Pretraining

作者:

Jing Ma ↗Chenhao Dang ↗Mingjie Liao ↗

arXiv:2505.23878v2 Announce Type: replace-cross Abstract: Optimizing pretraining data composition is pivotal for LLM generalization. While dynamic mixing outperforms static strategies by capturing evolving training dynamics, current methods fail to reconcile computational efficiency with sample efficiency and structural flexibility for diverse pipelines.We introduce Actor–Critic Online Data Mixing (AC-ODM), which approaches data mixing from a reinforcement learning perspective with a parameterized policy that we theoretically prove to act as a dynamic linear surrogate maximizing the constructive interference of gradients. To enhance practical flexibility, AC-ODM supports two operational modes: (i) a proxy mode for fixed, pre-prepared corpora, where a policy learned on a small model is transferred to a larger target; and (ii) a non-proxy mode for direct end-to-end training from scratch without priors. Empirically, AC-ODM significantly outperforms prior methods in convergence speed and downstream accuracy across various architectures. On Pythia-1B, it reaches optimal validation perplexity using up to 66% fewer training steps than competitive baselines, delivering a 27.5% relative improvement in MMLU accuracy and a 2.23 x higher pass@1 on HumanEval, all while incurring a virtually negligible (0.4%) per-step wall-clock increase and only 2% additional memory overhead. Code is available at https://github.com/DANG-ai/AC-ODM.

阅读与讨论 → 访问原文 →

14.

arXiv (CS.AI) 2026-06-16 DOI: arXiv:2606.15601

SCAN: A Decision-Making Framework for Effective Task Allocation with Generative AI

作者:

Fendi Tsim ↗Alina Gutoreva ↗

arXiv:2606.15601v1 Announce Type: cross Abstract: We introduce SCAN – a human-centric decision-making framework to facilitate learners for effective task allocation with Generative Artificial Intelligence (GenAI) based on Vygotsky's Zone of Proximal Development and Metacognition. In SCAN, we systematize and formalize AI-human interaction by introducing a task-identification approach with four "sub-zones": Substitute, Complement, Aid, and Non-negotiable. After describing the four sub-zones, we demonstrate how SCAN framework can be applied for knowledge workers in the workplace and students in education to metacognitively "scan" their use of Generative AI. We then discuss how such framework can be related to cognitive load theory, cognitive offloading, sycophancy, three decision-making modes in human-AI interactions (automation, augmentation, and collaboration), future of work such as upskilling and deskilling, and how it accounts for both human-human and human-AI learning. We propose that SCAN offers a great starting point before discussing whether GenAI complements or replaces our abilities when completing a task, with a general objective of sustaining lifelong learning, and a specific goal of reaching hybrid intelligence.

阅读与讨论 → 访问原文 →

15.

arXiv (CS.AI) 2026-06-17 DOI: arXiv:2606.18181

IUU+DB: Tracking Illegal, Unreported, and Unregulated Fishing, Seafood Fraud, and Labor Abuse through LLM-driven Information Extraction

作者:

Henry Bodwell ↗Hong Yang ↗John C. Simeone ↗Kelvin Gorospe ↗Bella Sullivan ↗Lana Huang ↗Jessica Gephart ↗Sandy Aylesworth ↗Molly Masterton ↗Naren Ramakrishnan ↗

arXiv:2606.18181v1 Announce Type: cross Abstract: Illegal, unreported, and unregulated fishing (IUU) traditionally refers to fishing activities that violate applicable laws or occur in areas that lack applicable laws. We propose the term IUU+ to capture a broader suite of fisheries sector environmental and associated supply chain trade-related crimes and behaviors. Although IUU+ activity is widely recognized as a serious threat to marine ecosystems, markets, and livelihoods, a quantitative understanding of these incidents, e.g., their frequency, geography, species, actors, and patterns in the type of illicit activity, remains difficult to obtain. We propose IUU+DB, a large language model driven system for building a global incident database of IUU+ activity. The system ingests heterogeneous documents, classifies whether they describe relevant incidents, extracts key data elements such as actors, locations, species, vessels, violations, and enforcement outcomes, and supports deduplication and trend analysis. Case studies and validation results show that IUU+DB can help organize fragmented evidence, surface geographic and behavioral hotspots, support fisheries-domain specific research in academia and non-government organizations, assist source and species risk assessments for industry, and provide support for policy implementation and targeted enforcement efforts to government agencies.

阅读与讨论 → 访问原文 →

16.

arXiv (CS.CL) 2026-06-16 DOI: arXiv:2606.16591

SING: Synthetic Intention Graph for Scalable Active Tool Discovery in LLM Agents

作者:

Qiao Xiao ↗Haochen Shi ↗Yisen Gao ↗Wenbin Hu ↗Huihao Jing ↗Tianshi Zheng ↗Baixuan Xu ↗Ziheng Zhang ↗Weiqi Wang ↗Haoran Li ↗Jiaxin Bai ↗Yangqiu Song ↗…

Large language model (LLM) agents increasingly rely on agent harnesses that manage context, tools, and multi-turn execution, making tools a central interface for acting in realistic digital environments. As harness-connected tool ecosystems expand to hundreds or thousands of APIs, services, and task-specific skills, exhaustive tool schema injection becomes costly and imposes a closed-world assumption that limits agents to a predefined static inventory. Retrieval-augmented tool selection offers a natural alternative, but existing one-shot retrieval methods often fail to align isolated tool descriptions with the agent's true task intention, especially in long-horizon tasks where required capabilities emerge through decomposition, observations, and newly induced subgoals. We propose SING, an intention-aware active tool discovery framework that builds an intention-tool graph linking user intentions, tool capabilities, and tool collaboration patterns, and dynamically retrieves tools according to evolving task states. Using a unified corpus of 7,471 tools, we evaluate SING on three real-world tool-use benchmarks. SING improves Global Recall@5 by up to 59.8% and downstream success rate by up to 28.9% over baselines, while reducing full-corpus tool-schema exposure by 99.8%, demonstrating that intention-aware graph structure enables more accurate and context-efficient tool discovery in large-scale agentic ecosystems.

阅读与讨论 → 访问原文 →

17.

arXiv (CS.CV) 2026-06-15 DOI: arXiv:2606.14619

StereoGeo: an end-to-end stereo camera calibration method

作者:

Imane Meddour ↗Andr\'ea Macario Barros ↗C\'edric Gouy-Pailler ↗

In this work, we propose StereoGeo, an end-to-end network-based approach for stereo camera calibration. Our method estimates the focal lengths and gravity directions of the left and right cameras, as well as the relative extrinsic transformation relating them. Existing methods often rely on calibration patterns in structured environments or address only a single camera configuration, being limited to either intrinsic or extrinsic estimation, and depending on a multi-view setups. StereoGeo extends the GeoCalib algorithm, integrating deep neural network feature extraction with a differentiable optimizer. Extensive experiments on real-world benchmarks demonstrate that StereoGeo achieves competitive performance for intrinsic calibration and provides accurate stereo extrinsic estimation, outperforming existing methods that are limited to monocular settings. The dataset used in this work is partially publicly available at https://github.com/meddourimane/StereoGeo-dataset.

阅读与讨论 → 访问原文 →

18.

arXiv (CS.CL) 2026-06-17 DOI: arXiv:2606.17645

Beyond Domains: Reusing Web Skills via Transferable Interaction Patterns

作者:

Shiqi He ↗Yue Cui ↗Feijie Wu ↗Xinyu Ma ↗Jiaheng Lu ↗Yaliang Li ↗Bolin Ding ↗Mosharaf Chowdhury ↗

Large language model (LLM) web agents are usually deployed as tool callers: each turn, the model reads a fresh page observation and emits one structured tool action. When every action is a low-level primitive, horizons grow quickly and so do policy-facing LLM completions, dominating latency and cost on benchmarks such as Mind2Web and WebArena. Recent systems therefore wrap repeated interaction fragments as web skills: callable tools built from successful trajectories or induced programs, so one call can replace several primitives. However, prior skill libraries are still triggered mainly by instruction similarity or coarse site metadata, which yields low skill reuse on held-out sites and leaves much of the potential step and token reduction on the table. We present SkillMigrator, an agent that learns reusable web skills and transfers them across sites by matching layout structure rather than specific element references. Each induced skill is stored as a transferable interaction pattern (TIP): the skill paired with a structural sketch of the snapshot at induction time. At test time, SkillMigrator retrieves TIPs by layout similarity and grounds their references on the live page. The rest of the stack is standard: accessibility-snapshot observations with stable references, and fixed tool calling over primitives plus skill invocations. Compared with the state-of-the-art approaches, SkillMigrator reduces the average LLM-action count on successful trajectories by 8-10% across both WebArena and Mind2Web at matched success rate.

阅读与讨论 → 访问原文 →

19.

arXiv (CS.AI) 2026-06-12 DOI: arXiv:2606.13629

Valid Inference with Synthetic Data via Task Exchangeability

作者:

Lezhi Tan ↗Tijana Zrnic ↗

arXiv:2606.13629v1 Announce Type: cross Abstract: There is a proliferation of work arguing for the use of synthetic data in scientific research. For example, social scientists are arguing for the use of LLM-generated "silicon samples" in pilot studies; AI evaluations increasingly rely on "LLM-as-a-judge" outputs; and proteomics research is accelerated by generative models that produce synthetic protein structures. These developments raise an intriguing possibility: synthetic data may help researchers ask more questions, run more studies, and accelerate discovery. But they also raise a fundamental concern: synthetic data can be biased, noisy, and misspecified. In this work, we propose statistical principles for using synthetic data in scientific research with provable validity guarantees. The key insight is a new technical condition that we call task exchangeability. Informally, this is a requirement that the researcher can identify historical tasks, for which real data is available, such that their current task of interest is exchangeable with the historical tasks in an appropriate mathematical sense. We develop methods for valid inference under task exchangeability, together with extensions that provide guarantees even beyond exchangeability. We demonstrate the framework on public opinion surveys with silicon samples and AI evaluation with autoraters.

阅读与讨论 → 访问原文 →

20.

arXiv (CS.CV) 2026-06-19 DOI: arXiv:2606.19849

ViCoStream: Streaming VideoLLMs Can Run Beyond 100 FPS with Stage-Wise Coordinated Inference

作者:

Yang Tan ↗Junlong Tong ↗Linan Yue ↗Hao Wu ↗Pengfei Fang ↗Xiaoyu Shen ↗

Streaming VideoLLMs must continuously process incoming video while maintaining low query latency, making both video-ingestion throughput and query-time responsiveness critical for real-time deployment. Existing methods largely focus on accelerating individual modules, such as visual encoding, token pruning, or KV-cache compression, but provide limited insight into whether the resulting system can sustain real-time streaming performance. We formulate streaming VideoLLM inference as a coordinated pipeline spanning visual preprocessing, visual encoding, token dropping, and LLM prefilling/decoding. Building on this formulation, we propose ViCoStream (Video Coordinated Streaming), a stage-wise coordinated streaming framework that combines chunk-wise execution, CUDA-stream overlap, visual token control, bounded visual attention, and query-side retrieval to bound per-chunk computation and memory costs. We further provide a systematic study of bottleneck migration, revealing how chunk size, token retention, attention locality, and retrieval scope shape the throughput-accuracy trade-off. Experiments with Qwen2.5-VL-3B/7B-Instruct across multiple streaming benchmarks show that ViCoStream achieves 134 FPS video throughput and less than 50 ms TTFT on a single A100 GPU while maintaining accuracy close to full-history baselines.

阅读与讨论 → 访问原文 →

21.

arXiv (CS.CL) 2026-06-12 DOI: arXiv:2509.18085

Structuring The Future: Diffusion LLM Speculative Decoding via Calibrated Draft Graphs

作者:

Sudhanshu Agrawal ↗Risheek Garrepalli ↗Raghavv Goel ↗Christopher Lott ↗Fatih Porikli ↗Mingu Lee ↗

Diffusion LLMs (dLLMs) have recently emerged as a powerful alternative to autoregressive LLMs (AR-LLMs) with the potential to operate at significantly higher token-generation rates. To unlock this potential, we present Spiffy, a speculative decoding algorithm to accelerate dLLM inference while provably preserving the model's output distribution. This work addresses the unique challenges involved in applying ideas from speculative decoding of AR-LLMs to dLLMs. Spiffy performs auto-speculation to eliminate the overheads of an independent draft model, structuring draft states in the form of a novel directed draft graph to take advantage of the bidirectional, blockwise nature of dLLM generation. These draft graphs are calibrated offline to maximize acceptance rates and are dynamically pruned during inference for improved computational efficiency. We present a detailed formulation of Spiffy and demonstrate its ability to accelerate LLaDA, Dream, and SDAR models in combination with KV caching and threshold-based dynamic unmasking leading to up to $8.6\times$ reduction in model inferences and $6.3\times$ acceleration in token rate.

阅读与讨论 → 访问原文 →

22.

arXiv (CS.CL) 2026-06-19 DOI: arXiv:2606.20023

When Lower Privileges Suffice: Investigating Over-Privileged Tool Selection in LLM Agents

作者:

Kaiyue Yang ↗Yuyan Bu ↗Jingwei Yi ↗Yuchi Wang ↗Biyu Zhou ↗Juntao Dai ↗Songlin Hu ↗Yaodong Yang ↗

As LLM agents increasingly select tools autonomously, their choices among tools with different privileges become safety-relevant. However, prior tool-selection studies focus on safety-agnostic metadata preferences, leaving privilege-sensitive choices underexplored. To address this gap, we study over-privileged tool selection, in which an agent selects or escalates to a higher-privilege tool despite a sufficient lower-privilege alternative. We introduce ToolPrivBench to evaluate whether agents choose higher-privilege tools despite sufficient lower-privilege alternatives, measuring both initial selection and escalation after transient tool failures. Across eight domains and five recurring risk patterns, we find that over-privileged tool selection is common among mainstream LLM agents and is further amplified by transient failures. We further find that general safety alignment does not reliably transfer to least-privilege tool choice, while prompt-level controls provide only limited mitigation under transient failures. We therefore introduce a privilege-aware post-training defense that teaches agents to prefer sufficient lower-privilege tools and escalate only when necessary. Our mitigation experiments show that this defense substantially reduces unnecessary high-privilege tool use while preserving general capabilities.

阅读与讨论 → 访问原文 →

23.

medRxiv (Medicine) 2026-06-10 DOI: HASH:b4f09633d02da1de5a3f984adb611950

Towards the Virtual Amyotrophic Lateral Sclerosis Patient: Inferring Cortical Excitability through Whole-Brain Dynamical Modeling

作者:

Angiolelli ↗Demuru ↗Lopez ↗E. T ↗Hashemi ↗Ziaeemeh ↗Rabuffo ↗Trojsi ↗Granata ↗Tafuri ↗De Luca ↗Gallo ↗…

Amyotrophic lateral sclerosis (ALS) is increasingly recognized as a multisystem neurodegenerative disorder in which motor-neuron degeneration is accompanied by widespread alterations in cortical dynamics. Among its most reproducible neurophysiological signatures is cortical hyperexcitability, yet how this local excitability imbalance shapes distributed whole-brain activity remains poorly understood. Here, we combined source-reconstructed resting-state MEG data, tractography-informed whole-brain modeling, and simulation-based inference to investigate whether ALS-related alterations in large-scale brain dynamics can be mechanistically explained by changes in cortical excitability. First, we characterized empirical brain dynamics using complementary features spanning regional activity amplitude and variability, functional connectivity, and avalanche-based metrics. These analyses revealed significant alterations in ALS patients relative to healthy controls, as well as associations with clinical impairment and disease staging. To mechanistically interpret these changes, we employed a reduced Wong-Wang whole-brain model in which local recurrent excitation modulates emergent large-scale neural dynamics. Simulations showed that increasing excitability systematically reproduced the empirical dynamical signatures observed in ALS. We then applied a simulation-based inference framework to estimate latent excitability parameters directly from empirical observations. Whole-brain model inversion revealed increased excitability in ALS patients compared with controls. The recovered excitability parameter was associated with disease staging, supporting its clinical relevance as a model-derived descriptor of ALS progression. Finally, by extending the model to estimate frontal and non-frontal excitability separately, we found that ALS-related alterations were predominantly associated with increased frontal excitability, whereas non-frontal regions appeared comparatively less affected. The recovered parameters related to disease staging. Together, these findings provide a mechanistic framework linking altered large-scale brain dynamics in ALS to selective cortical hyperexcitability, explaining how local excitability changes can give rise to global network reorganization. More broadly, they show how computational model inversion can recover latent multiscale pathophysiological processes from empirical neural recordings, offering a non-perturbative alternative to complex experimental paradigms typically required to causally probe local-to-global mechanisms.

阅读与讨论 → 访问原文 →

24.

arXiv (CS.LG) 2026-06-16 DOI: arXiv:2604.23628

Characterizing Admissible Objective Functions for Hierarchical Clustering

作者:

Ryuki Tsukuba ↗Kazutoshi Ando ↗

arXiv:2604.23628v2 Announce Type: replace-cross Abstract: Hierarchical clustering is a fundamental task in data analysis, but classical methods have long lacked a principled objective function. Dasgupta [STOC~2016] took an important step toward addressing this gap by proposing a well-motivated objective function for cluster trees. Cohen-Addad et al. [J. ACM 2019] subsequently introduced the notion of admissibility: an objective function is admissible if, whenever the input similarity matrix admits generating trees, its minimizers are precisely those generating trees.They also gave a necessary and sufficient condition for admissibility within a family of objective functions based on aggregate intercluster similarity. We refer to this family as sum-type objective functions. However, apart from Dasgupta's original objective function, no explicit admissible objective functions in this family were provided. In this paper, we study admissible objective functions for hierarchical clustering in two directions. For sum-type objective functions, we give a complete characterization when the scaling function is a symmetric polynomial of degree at most two, and we derive sufficient conditions for degree-three polynomials. We also show that the recursive sparsest cut algorithm achieves an O$(\phi)$-approximation ratio for the admissible objective functions covered by our characterization, where $\phi$ is the approximation factor of the sparsest cut subroutine. We then introduce max-type objective functions, where cluster interaction is measured by maximum, rather than aggregate, intercluster similarity. For this class, we characterize which objective functions are admissible for arbitrary symmetric scaling functions and give a complete characterization when the scaling function is a symmetric polynomial of degree at most two.

阅读与讨论 → 访问原文 →

25.

arXiv (CS.CV) 2026-06-19 DOI: arXiv:2606.19817

Training-Free Metrics for Synthetic Object Detection Data: A Proxy for Detector Performance

作者:

Myeongseok Nam ↗Donghoon Yeo ↗Seungwook Kim ↗

With the recent advent of image generative models, synthetic data are increasingly being used to supplement limited real datasets for training computer vision models. However, not all synthetic datasets improve performance equally, and their effectiveness can only be assessed by training a downstream model, which is computationally expensive and time-consuming. This problem is pronounced in the task of object detection, where the required annotations are much more dense due to bounding boxes. In this paper, we propose a pre-computable metric family, dubbed Conditional-Composition Domain Match (CCDM), which serves as a proxy for the relative utility of candidate synthetic training sets for downstream detection. Experiments on the VisDrone-DET dataset show that the CCDM metric families achieve a Spearman correlation of 1.0 with the downstream performance of YOLOv8, clearly outperforming existing metrics for synthetic image evaluation.

阅读与讨论 → 访问原文 →

探索全球前沿学术脉络