Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-17

Combating Data Laundering in LLM Training

arXiv:2604.01904v3 Announce Type: replace-cross Abstract: Post-hoc unauthorized-training data detection for large language models (LLMs) typically assumes a query-with-originals regime: rights holders query a target LLM with raw proprietary data and assess whether the model assigns them stronger memorization-based detection signals, e.g., higher confidence or lower loss, than held-out non-training reference texts. We show that this regime becomes brittle under data laundering, where the target LLM is trained on semantics-preserving but stylistically or structurally transformed surrogates of proprietary data to obfuscate provenance. Since training-time exposure occurs in the laundered form, memorization signals may no longer appear on the originals, collapsing the candidate-reference signal separation that standard detectors rely on. We counter this threat by studying laundering-aware detection with raw proprietary data, a held-out reference corpus, and query access to the target LLM, while the laundering transformation is undisclosed. Since exact recovery of the laundered corpus is infeasible, we infer a detection-useful synthesis process via an auxiliary LLM that maps originals into training-like queries. To make this search tractable, we introduce Synthesis Data Reversion (SDR), which constrains the unbounded space of natural-language transformations through a goal-details abstraction: a high-level transformation goal, e.g., "lyrical rewriting", and fine-grained details, e.g., "with vivid imagery". SDR identifies the most likely goal and iteratively refines details so synthesized queries elicit stronger target-model detection signals. Evaluated on the MIMIR benchmark against diverse laundering practices and target LLM families (Pythia, Llama2, and Falcon), SDR consistently restores detection signals, offering a practical auditing layer against data laundering.

02.
arXiv (CS.CV) 2026-06-16

MeshLoom: Feed-Forward Non-Rigid Registration of Mesh Sequences

We present MeshLoom, a feed-forward registration network that directly reconstructs vertex deformations across mesh sequences. Our approach advances non-rigid registration beyond existing models, which are typically constrained by costly per-instance optimization, narrow object categories, pairwise-only inputs, or merely intermediate outputs. The network is simple and efficient, registering multiple meshes within seconds. At its core lies a topology-aware encoder–decoder design. Specifically, we first introduce a topology-aware point representation that encodes the anchor (reference) mesh's topology into its per-vertex features. This representation strengthens the network's understanding of the anchor-mesh geometry and disambiguates points that are Euclidean-close yet geodesically distant. We then propose a multi-modal encoder that fuses this anchor-mesh representation with complementary cues from each frame, such as shape latents and image features. These multi-source signals are compressed into a compact global motion embedding that captures dense inter-frame correspondence. A lightweight decoder then queries this global embedding with the anchor-mesh point representation, retrieving per-vertex deformations at target timestamps. Through extensive experiments across diverse motions and object categories, we show that MeshLoom achieves state-of-the-art results on non-rigid registration. In addition, we find that our global embedding-then-query paradigm naturally enables the network to generate deformations at intermediate timestamps, which extends MeshLoom to motion interpolation and mesh morphing. Project page: https://meshloom.github.io/ .

03.
arXiv (CS.AI) 2026-06-17

A Neuromorphic Trigger for Efficient Audio Event Detection

arXiv:2606.17775v1 Announce Type: cross Abstract: Efficient processing of continuous audio streams remains a key challenge for real-time and resource-constrained systems. This paper introduces a neuromorphic trigger for audio event detection, based on a spiking neural network (SNN) that selectively gates input to downstream models. The proposed trigger acts as a low-cost front-end, identifying salient audio segments and forwarding only these to a more computationally intensive model for tasks such as classification. The trigger is implemented as a lightweight fully connected SNN and evaluated on two representative tasks: Anomalous Sound Detection (ASD) and Sound Event Detection (SED). For ASD, the trigger achieves a one-second segment-based F1 score of 0.97 on a class-agnostic form of the URBAN-SED dataset, demonstrating high reliability in identifying relevant audio regions. For SED, the trigger is combined with the Dang classifier on the DCASE 2017 Challenge Task 2 dataset, showing a potential $42.6\times$ reduction in FLOPs while reducing the lower bound of the event-based error rate from 0.41 to 0.25. These results highlight the potential of neuromorphic triggers as real-time, energy-efficient front-end filters, enabling substantial reductions in computational cost.

04.
arXiv (quant-ph) 2026-06-16

Certified Finite-Shot Operating Windows for Virtual Distillation and Symmetry Verification

arXiv:2606.15464v1 Announce Type: new Abstract: Quantum error mitigation methods are usually compared through their infinite-shot bias, but on real devices the comparison is decided by finite sampling budgets, estimator instabilities, and per-shot resource costs. We develop a finite-shot operating-window theory that makes this comparison certifiable for virtual distillation (VD) and symmetry verification (SV): for each method we derive a mean-squared-error law with explicit, non-asymptotic remainder constants. For VD, the law captures the statistical bias and denominator instability of its quotient estimator, with a concentration certificate locating the sample size beyond which the quotient is trustworthy; for SV, it isolates the bias floor left by undetectable errors and the sampling penalty set by the acceptance probability. A selection trichotomy classifies any two-method comparison into a tie, uniform dominance, or a genuine tradeoff with a certified crossing window, including a self-consistency test that rejects spurious crossings. The theory makes falsifiable predictions – operating-window locations scaling as $p^{-2}$ or $p^{-1}$ in the noise rate, and the sign pattern of all pairwise comparisons – which exact white-box experiments confirm with fitted exponent $-1.97$ against the predicted $-2$ and with $300/300$ sign agreement, within a pre-registered analysis whose single failed gate, an over-strict all-instance criterion, is reported and audited in full. Gate-level simulation and archived runs on two IBM backends then test the windows under device conditions: idealized VD windows exist, but realistic interferometry overhead and denominator instability erase them, and calibrated SV is the practical winner in the tested QAOA instances. This absence of a universal winner is not a failure of mitigation; it is the regime structure that certified operating windows predict.

05.
arXiv (CS.CV) 2026-06-16

RefGC-SR$^2$: Reference-guided Generated Content Super-Resolution and Refinement

Reference-guided generation (e.g., object compositing, customization) has progressed rapidly, yet current pipelines share a fundamental limitation: the object-centric high-resolution reference image (HRRI) provided by users is downsampled to a fixed low-resolution (LR) before being fed into the model, so the fine-grained details are discarded before the output is even produced. In addition, the generation step then introduces its own artifacts (e.g., identity distortion) on top of this loss. Existing reference-guided generated content refinement (RefGCR) methods can correct some of these artifacts but still operate in the LR domain; reference-guided super-resolution (RefSR) methods recover resolution but assume natural-image degradations and ignore the artifact distribution of generative pipelines. To address both gaps in a single formulation, we introduce a new task: reference-guided generated content super-resolution-refinement (RefGC-SR$^2$), where the original HRRI is reused at the post-processing stage to recover lost details, refine generative artifacts, and upscale the output simultaneously. We construct the first real-world triplet data generation pipeline for this RefGC-SR$^2$ task, training a diptych-conditioned generator to synthesize paired low-quality anchors that public pretrained models cannot provide. We further present a frequency-aware diffusion transformer model for RefGC-SR$^2$ that selectively injects fine details from the HRRI while removing generative artifacts. Extensive experiments demonstrate that our RefGC-SR$^2$ model successfully (i) refines the object identity faithfully with respect to the reference, and (ii) recovers high-resolution details, so that the final result is significantly higher quality and practically more usable compared to existing RefGCR and RefSR baselines.

06.
arXiv (CS.LG) 2026-06-19

Diffuse AI Control on Fuzzy Tasks

arXiv:2606.08892v2 Announce Type: replace Abstract: AI models deployed in critical domains, such as AI safety research, may subtly sabotage our efforts due to misalignment. Diffuse AI Control is a subfield of AI safety concerned with mitigating risks from AI sabotage distributed over long deployment horizons (diffuse threats). These risks are particularly pernicious on fuzzy tasks, i.e. tasks which are hard to grade or require intuition. To understand diffuse threats on fuzzy tasks, we introduce a framework that considers AI control as an adversarial game between a blue team and a red team. The blue team uses a weak trusted model to construct a weak score against which they would train a strong, potentially subversive model to remove the subversion propensity if it were present. The red team then tries to find model behaviors that are rated highly by the weak score, and thus might not be trained out, but actually correspond to poor performance. We test our framework on the task of writing experimental proposals for research questions from recent ML papers. We use a language model with access to the original paper as a proxy "ground-truth" scorer. Our red team discovers subversive behaviors using multi-objective evolutionary prompt optimization. We show that Opus~4.6 can write proposals that are worse according to the ground truth proxy than those of GPT-OSS-20B, while the weak scorer rates them as highly as the best proposals from Opus 4.6. We then propose an adversarial optimization algorithm for the blue team that discovers more robust prompts for the weak model. This algorithm produces a blue team prompt that our red team optimization fails to exploit.

07.
arXiv (CS.LG) 2026-06-16

ML Inference Scheduling with Predictable Latency

arXiv:2512.18725v3 Announce Type: replace Abstract: Machine learning (ML) inference serving systems can schedule requests to improve GPU utilization and to meet service level objectives (SLOs) or deadlines. However, improving GPU utilization may compromise latency-sensitive scheduling, as concurrent tasks contend for GPU resources and thereby introduce interference. Given that interference effects introduce unpredictability in scheduling, neglecting them may compromise SLO or deadline satisfaction. Nevertheless, existing interference prediction approaches remain limited in several respects, which may restrict their usefulness for scheduling. First, they are often coarse-grained, which ignores runtime co-location dynamics and thus restricts their accuracy in interference prediction. Second, they tend to use a static prediction model, which may not effectively cope with different workload characteristics. In this paper, we evaluate the potential limitations of existing interference prediction approaches, finding that coarse-grained methods can lead to noticeable deviations in prediction accuracy and that static models degrade considerably under changing workloads.

08.
arXiv (CS.AI) 2026-06-16

CrossMaps: Confidence-Aware Open-Vocabulary Semantic Mapping for Rover Navigation

arXiv:2606.16935v1 Announce Type: cross Abstract: Rovers rely on perception to maintain spatial maps that encode both objects and sensor quality (e.g., range reliability, lighting artifacts, data density), guiding data fusion, embedding updates, and navigation under partial observability. To study these coupled perception-navigation processes, we present CrossMaps, a real-time confidence-aware open-vocabulary semantic mapping pipeline that constructs language-queryable maps from RGB-D data. Building on VLMaps-style approaches, CrossMaps integrates multi-scale CLIP embeddings with confidence-aware fusion and a dual-memory architecture consisting of Short-Term Memory (STM) and Long-Term Memory (LTM). The STM aggregates noisy visual observations using geometric, semantic, and temporal confidence cues, while confident and coherent cells are promoted to the LTM as persistent semantic landmarks. Designed for deployment with a Jetson Orin-powered UGV alongside SLAM, CrossMaps runs in real time and produces semantic heatmaps that can be queried with natural language to guide rover navigation.

09.
arXiv (CS.CL) 2026-06-16

SAAS: Self-Aware Reinforcement Learning for Over-Search Mitigation in Agentic Search

Agentic search enables LLMs to solve complex multi-hop questions through iterative reasoning and external search. Despite the effectiveness, these systems often suffer from a critical limitation in practice: agents fail to recognize their own knowledge boundaries, blindly triggering searches when internal knowledge suffices and failing to terminate search even when adequate evidence has been collected. The lack of self-awareness leads to severe over-search, incurring substantial inference latency and prohibitive computational cost. To this end, we propose SAAS, a novel RL framework designed to cultivate dynamic self-awareness that precisely regulates search behavior without compromising accuracy. SAAS introduces three key components: (i) a search boundary modeling mechanism, which identifies the search boundary under the evolving policy by contrasting search-disabled and search-enabled rollouts; (ii) a boundary-aware reward module, which translates this boundary awareness into trajectory-level penalties, suppressing unnecessary and redundant searches; and (iii) a stage-wise optimization strategy, which leverages a sequential curriculum to prioritize reasoning over search regularization, thereby avoiding reward hacking. Extensive experiments demonstrate that SAAS substantially reduces over-search, while maintaining accuracy. Our code and implementation details are released at https://github.com/XMUDeepLIT/SAAS.

10.
arXiv (CS.AI) 2026-06-11

Noise-Aware Framework for Correcting Corrupted Labels

arXiv:2606.11695v1 Announce Type: cross Abstract: High-quality labeled data is essential for training reliable ML/DL models. However, real-world datasets often contain a considerable proportion of corrupted labels, which can severely degrade model performance. To address this problem, we propose CANOLA, a novel framework for correcting corrupted labels through noise-aware learning and iterative label refinement. CANOLA explicitly estimates the underlying noise distribution of the dataset and incorporates this information into the training of a noise-aware Deep Neural Network. By incorporating noise characteristics during learning, CANOLA enables the model to down-weight unreliable supervision signals and focus on trustworthy patterns, thereby improving robustness and generalization. Label correction is performed via cautious, iterative soft label refinement, in which model predictions are blended with observed labels to prevent premature or erroneous updates. This progressive refinement allows the dataset to be repaired in a stable and controlled manner. We evaluate CANOLA on six widely used datasets under realistic noisy labeling scenarios. Experimental results show that CANOLA consistently outperforms SOTA label correction methods, achieving relative improvements ranging from 19% to 52% in error reduction. Moreover, models trained on datasets corrected by CANOLA obtain substantial downstream performance gains. Even simple classifiers trained on CANOLA's corrected data can outperform complex model-centric approaches by margins of up to 67%.

11.
arXiv (CS.CL) 2026-06-16

FraudSMSWalker: Benchmarking Agentic Large Language Models for SMS-to-Webpage Fraud Detection

SMS fraud is increasingly cross-channel: a message directs the user to a webpage, and the final risk depends on how the SMS claim aligns with the page content and requested user action. However, existing evaluations either focus on message-only smishing classification or expose URL and domain cues that allow models to rely on reputation shortcuts. To address this gap, we introduce FraudSMSWalker, a controlled benchmark for URL-masked SMS-to-webpage fraud judgment. FraudSMSWalker contains 699 bilingual chains, including 332 fraudulent and 367 benign cases, across ten service scenarios. The model-visible input consists of the SMS context and sanitized webpage evidence, while raw URLs, hosts, domains, IPs, redirects, and reputation metadata are withheld. The benchmark further includes hard benign cases whose pages contain login, payment, verification, or account-management elements that are plausible under the service context but also appear in scam flows. We evaluate nine web agents under masked browser-agent protocols and conduct URL-visibility ablations. The results show that current agents can detect suspicious cues, but struggle to preserve benign recall and often produce positive predictions that are weakly supported by the observed evidence. These findings position FraudSMSWalker as a benchmark for measuring whether web agents can make fraud judgments that remain both accurate and evidence-grounded when direct reputation shortcuts are suppressed. The associated code and dataset are accessible at the \href{https://anonymous.4open.science/w/FraudMessageWalker-Bench}{anonymous link}.

12.
arXiv (CS.CL) 2026-06-12

NOVA: NOise-aware Verbal Confidence CAlibration for Robust Large Language Models in RAG Systems

Accurately assessing model confidence is essential for deploying large language models (LLMs) in mission-critical factual domains. While retrieval-augmented generation (RAG) is widely adopted to improve grounding, confidence calibration in RAG settings remains poorly understood. We conduct a systematic study across four benchmarks, revealing that LLMs exhibit poor calibration performance especially when noisy contexts are retrieved. Specifically, contradictory or irrelevant evidence tends to exacerbate the model's overconfidence issue. To address this, we propose NOVA Rules (NOise-Aware Verbal Confidence CAlibration Rules) to provide a principled foundation for resolving overconfidence under noise. We further design NOVA, a noise-aware calibration framework that synthesizes supervision from ~2K HotpotQA examples guided by these rules. By performing supervised fine-tuning (SFT) with this data, NOVA equips models with intrinsic noise awareness without relying on stronger teacher models. Empirical results show that NOVA yields substantial gains, improving ECE scores by 10.9% in-domain and 8.0% out-of-domain. By bridging the gap between retrieval noise and verbal calibration, NOVA paves the way for both accurate and epistemically reliable LLMs.

13.
arXiv (math.PR) 2026-06-16

A Machine-Checked Itô Calculus for Brownian Motion

arXiv:2606.15089v1 Announce Type: cross Abstract: We present a machine-checked development of the $L^2$ Itô calculus of Brownian motion on a bounded time interval $[0,T]$, formalized in Lean 4 on top of Mathlib and the BrownianMotion package. The development contains: the construction of the Itô integral as an isometry of Hilbert spaces, from a predictable-rectangle $\pi$-system through the density of simple adapted processes; the Itô integral as a process, proved to be an $L^2$-continuous martingale through a single structural identity (the integral at time $t$ is the conditional-expectation projection of its terminal value onto $\mathcal{F}t$), from which adaptedness, the martingale property, the contraction bound, and both the terminal and the time-indexed Itô isometries follow as corollaries; and Itô's formula for $C^3$ functions with bounded derivatives, including its time-dependent form $df = f_x,dB + (f_t + \tfrac12 f{xx}),dt$, obtained by a discrete-to-continuous argument through weighted quadratic variation and explicit $L^2$ remainder bounds. To our knowledge this includes the first machine-checked proof of Itô's formula, and the first machine-checked construction of the Itô integral as a martingale-valued process, in any proof assistant. We are deliberate about the boundary: the theory is the $L^2$ theory on $[0,T]$ with bounded-derivative integrand classes; localization to the unrestricted $C^2$ formula, integrators beyond Brownian motion, and pathwise statements are out of scope, and we say precisely why and where. The development is roughly 7,200 lines of Lean across 22 modules; every theorem is sorry-free, the axioms of each headline result are pinned to Mathlib's classical defaults by a build-enforced gate, and the whole is reproducible from a pinned toolchain.

14.
arXiv (quant-ph) 2026-06-12

Spectral analysis of equilibration: information leakage in isolated quantum systems

arXiv:2606.12545v1 Announce Type: new Abstract: We develop a unified dynamical-spectral framework for equilibration in isolated quantum systems based on a subspace coarse-graining approach. Central to our formulation is the Leakage Fidelity Function (LFF), defined as the probability that a unitarily evolving state escapes the support of its initial subspace. This quantity provides a direct, operational measure of information flow and memory loss without invoking ensemble assumptions or perturbative arguments. We derive universal bounds on temporal fluctuations of the LFF, in terms of the spectral gap structure and the square of the effective dimension, evincing that large spectral delocalization suppresses fluctuations and guarantees equilibration on average. By introducing spectral power distributions and associated entropic measures, we establish a quantitative link between phase mixing, gap participation, and dynamical stability. We further investigate the equilibration timescale by connecting the LFF to quantum speed limits, thereby revealing the average time required for equilibration. Our results provide a state-dependent, geometrically transparent perspective on how spectral complexity and subspace information leakage jointly govern irreversibility in closed quantum many-body systems.

15.
Nature (Science) 2026-06-10

In situ nanocrystal confinement for efficient blue perovskite LEDs

Metal halide perovskites have emerged as promising semiconductors for light-emitting diodes (LEDs) owing to their excellent luminescence properties1. However, their performance remains limited, primarily owing to the inherent contradiction between ‘high crystallinity’ and ‘small size’ in the in situ synthesis of perovskite nanocrystals on substrates. Here we report efficient blue perovskite LEDs (PeLEDs) achieved via in situ polymerization-driven nanocrystal confinement to synthesize perovskite films composed of high-quality nanocrystals. The in situ-formed polymer network imposes nanoscale spatial constraints during perovskite nanocrystal growth, enabling nanocrystals with small sizes and a high photoluminescence quantum yield of 83%. Furthermore, polymerizable monomers with sufficient coordination sites allow a prolonged lattice rearrangement of perovskite clusters, promoting the crystallinity of the nanocrystals. The synthesized perovskite nanocrystals are utilized in the fabrication of PeLEDs, resulting in an external quantum efficiency of 21.8% at 491 nm, which is among the highest performances in blue PeLEDs. This work simultaneously controls the thermal dynamics of perovskite crystallization and organic ligand reactions, which helps to advance understanding of the effect of ligand engineering on nanocrystal synthesis, benefiting the development of efficient PeLEDs and other optoelectronic technologies. Efficient blue perovskite light-emitting diodes with an external quantum efficiency of 21.8% are achieved through in situ polymerization-driven nanocrystal confinement.

16.
arXiv (CS.LG) 2026-06-15

Lyapunov-Based Sample Complexity Analysis for Weakly-Coupled MDPs

arXiv:2606.14095v1 Announce Type: new Abstract: We study the sample complexity of learning in average-reward weakly-coupled Markov decision processes (WCMDPs) and Restless Bandits (RBs) under a generative model. Naive reduction to a tabular MDP leads to high complexity bounds as the state-action space is exponentially large in the number of arms $N$. By exploiting the weakly coupled structure, we show that near-optimal policies can be learned with sample and computational complexities that are polynomial in $N$. Specifically, we analyze the plug-in approach, which applies an efficient planning algorithm to an empirical model estimated from data. For fully heterogeneous WCMDPs, we establish the first finite-sample PAC guarantee with polynomial complexity and an $O(1/\sqrt{N})$ optimality gap. For homogeneous RBs, we further prove that a smaller optimality gap is achievable under mild structural assumptions. A primary technical contribution of our work is a novel Lyapunov-based analysis framework. Unlike classical approaches that rely on the difficult-to-control bias function, our framework uses an explicitly constructed Lyapunov function along with a drift transfer technique between the true and empirical models. A key step of independent interest in our framework is a fine-grained perturbation analysis for the underlying linear programming (LP) relaxation, which provides a general tool for analyzing LP-based policies and weakly-coupled systems.

17.
arXiv (quant-ph) 2026-06-11

Power-law-graded Ising Interactions Stabilize Time Crystals Realizing Quantum Energy Storage and Sensing

arXiv:2508.14847v3 Announce Type: replace Abstract: We study discrete time-crystalline (DTC) phases in one-dimensional spin-1/2 chains with power-law-graded Ising interactions under periodic Floquet driving. By generalizing Stark localization to power-law-graded Ising interaction profiles, we identify robust period-doubled dynamics across a wide range of interaction exponents, stabilized by the interplay between coherent driving and spatially varying coupling. Within the DTC phase, the energy stored in the system, interpreted as a quantum battery, increases superlinearly with system size, although no scaling advantage persists in normalized power. Beyond energy storage, we demonstrate that the DTC phase supports enhanced quantum sensing. The quantum Fisher information associated with estimating timing deviations in the drive scales superextensively with system size, surpassing the Heisenberg limit. The degree of quantum advantage can be tuned by varying the interaction exponent, though DTC behavior remains robust throughout. Our results position power-law-graded Ising interacting Floquet systems as robust platforms for storing quantum energy and achieving metrological enhancement.

19.
arXiv (CS.LG) 2026-06-16

Semantic Reasoning in Medicine: The Role of Knowledge Graphs Across Five Key Domains

arXiv:2606.15155v1 Announce Type: new Abstract: Knowledge graphs (KGs) have emerged as a promising solution for integrating and reasoning over complex biomedical and clinical data in healthcare. By representing structured relationships among entities such as diseases, drugs, symptoms, and patient records, KGs provide a semantic backbone for decision-making, prediction, recommendation, and personalized care. Recent advances have demonstrated their utility across diverse medical applications–including clinical decision support systems, disease and treatment outcome prediction, health recommender systems, precision medicine, and medical question answering–where KGs often enhance interpretability, semantic coherence, and patient-specific reasoning. In parallel, a growing body of work focuses on medical KG generation itself, proposing frameworks that construct graphs from EHRs, clinical narratives, biomedical literature, and web resources using ontologies, semantic web technologies, deep-learning-based information extraction, and hybrid neuro-symbolic pipelines. Despite this progress, significant challenges remain, including limited and fragmented knowledge coverage, difficulties in aligning heterogeneous data sources, the fragility of current reasoning and representation-learning methods on dense multi-relational graphs, and unresolved issues related to privacy, bias, and accountability. This survey reviews and categorizes current research on KGs in medicine along both application-oriented and methodology-oriented dimensions, discusses their benefits and technical foundations, and outlines key limitations and open research directions. By analyzing trends, architectures, and evaluation practices, this work aims to guide future developments in KG-driven medical AI systems and support their safe and effective integration into healthcare environments.

20.
arXiv (CS.CL) 2026-06-16

Scaling Human and G2P Supervision for Robust Phonetic Transcription

Expert phonetic annotation is costly, especially for non-standard dialects and atypical speech. A common alternative is using Grapheme-to-Phoneme (G2P) models to auto-generate phonetic labels from text transcripts at scale. We study how automatic phonetic transcription performance scales with human and G2P supervision in English. Using a curated 80-hour benchmark spanning native, non-native and post-stroke speech, we identify a supervision quality threshold: G2P supervision helps only when fewer than 20-30 hours of human annotation are available. Beyond this threshold, it provides no significant benefit and can reduce cross-dialect robustness. What is effective after this threshold is ASR pretraining which we use to achieve a 2.3x reduction in weighted phone feature error rate over prior systems, with strong gains on non-native and aphasic speech. These results suggest that quantity-driven G2P scaling may yield diminishing returns for robust generalization.

21.
arXiv (CS.CV) 2026-06-12

AudioX-Turbo: A Unified Framework for Efficient Anything-to-Audio Generation

Audio and music generation based on flexible multimodal control signals is a widely applicable topic, with the following key challenges: 1) a unified multimodal modeling framework, 2) large-scale, high-quality training data, and 3) the prohibitive inference cost of multi-step diffusion sampling. As such, we propose AudioX-Turbo, a unified and efficient framework for anything-to-audio generation that integrates varied multimodal conditions (i.e., text, video, and audio signals) in this work. AudioX-Turbo follows a teacher-student paradigm. The teacher AudioX-Base is built on a Multimodal Diffusion Transformer with a Multimodal Adaptive Fusion module that aligns diverse multimodal inputs for high-fidelity synthesis, and is then distilled into the few-step student AudioX-Turbo via Distribution Matching Distillation adapted to flow matching, complemented by a diffusion-based discriminator for high-quality few-step generation. To support the training of AudioX-Turbo, we construct a large-scale, high-quality dataset, IF-caps-Pro, comprising approximately 9.2M samples curated through a two-stage data collection and annotation pipeline. We benchmark AudioX-Turbo across a wide range of tasks, finding that our model achieves superior performance, especially on text-to-audio and text-to-music generation, while operating at only 4 sampling steps and requiring approximately 25x fewer function evaluations (NFE) than multi-step baselines. These results demonstrate that our method is capable of audio generation under flexible multimodal control, showing efficient and powerful instruction-following capabilities. The code and datasets will be available at https://zeyuet.github.io/AudioX-Turbo/.

22.
arXiv (CS.AI) 2026-06-15

Generative AI for Managerial Decision-Making under Ambiguity and Sycophancy

arXiv:2603.03970v2 Announce Type: replace Abstract: Generative artificial intelligence (GenAI) is increasingly being integrated into complex business workflows, fundamentally shifting the boundaries of managerial decision-making. However, the reliability of its strategic advice in ambiguous business contexts remains a critical knowledge gap. To address this gap, this study compares multiple GenAI models in their ability to detect ambiguity, examines whether a systematic ambiguity-resolution process improves response quality, and investigates their susceptibility to sycophantic behavior when confronted with flawed managerial directives. Using a novel four-dimensional business ambiguity taxonomy, we conducted a human-in-the-loop experiment across strategic, tactical, and operational scenarios. The resulting decisions were assessed through a human-validated automated evaluation framework based on agreement, actionability, justification quality, and constraint adherence. The results show that our approach not only distinguishes different types of ambiguity, but also reveals how ambiguity resolution systematically changes model behavior. In particular, resolving ambiguities improved decision quality across all managerial levels, with the strongest gains observed in constraint adherence. The analysis further showed that sycophantic behavior is not uniform across models: some models challenged flawed assumptions, whereas others tended to comply with them. This study contributes to the bounded rationality literature by positioning GenAI as a cognitive scaffold that can detect and resolve ambiguities managers might overlook, while demonstrating that its artificial limitations require human oversight to ensure its reliability as a strategic partner.

23.
arXiv (CS.CV) 2026-06-16

Lightweight Distillation of SAM 3 and DINOv3 for Edge-Deployable Individual-Level Livestock Monitoring and Longitudinal Visual Analytics

Foundation-model pipelines for individual-level livestock monitoring – combining open-vocabulary detection, promptable video segmentation, and self-supervised visual embeddings – have raised the accuracy ceiling of precision livestock farming (PLF), but their GPU memory budgets exceed the envelope of commodity edge accelerators. To close this gap, the 446M-parameter Perception Encoder (PE-ViT-L+) backbone of SAM 3 is distilled into a 40.66M-parameter multi-scale student through three mechanisms: a Feature Pyramid Network student encoder built on TinyViT-21M-512, a four-term direction-then-scale distillation loss, and backbone-substitution inference with sliding-window session pruning that bounds streaming GPU memory growth. The DINOv3 family includes a pre-distilled ViT-S/16 variant (21.6M parameters) released alongside a 6716M-parameter ViT-7B teacher; the ViT-S (21M) variant is adopted as the per-individual embedder. On the Edinburgh Pig dataset, the compressed pipeline reaches 92.29% MOTA and 96.15% IDF1 against the SAM 3 teacher (1.68- and 0.84-percentage-point losses), achieves a 7.77-fold reduction in system-level parameters and a 3.01-fold reduction in peak VRAM (19.52GB -> 6.49GB), and reaches 97.34% top-1 accuracy with 91.67% macro-F1 on nine-class pig behaviour classification. The pipeline fits inside an NVIDIA Jetson Orin NX 16GB envelope with 4.9GB of headroom, supporting a proposed – but not yet empirically validated – on-device embedding-pool re-identification mechanism whose per-individual footprint of approximately 94MB per animal per year produces a longitudinal visual record amenable to retrospective association with disease, lameness, reproductive, and growth outcome labels.

24.
arXiv (CS.LG) 2026-06-17

On the Memorization Behavior of LLMs in Generative Recommendation: Observations, Implications, and Training Strategies

arXiv:2606.17276v1 Announce Type: cross Abstract: Generative recommendation (GR) has emerged as a promising direction for recommender systems. Recently, large language models (LLMs) have been increasingly adopted for GR, as their rich pretrained knowledge is expected to help them generalize beyond common user behavior patterns that traditional memorization-oriented baselines can capture. However, existing LLM-based GR works largely ignore LLMs' well-known tendency to memorize, which, if present in LLMs fine-tuned for GR, would restrict their utilization of pretrained knowledge. In this work, we investigate this concern by examining one-hop memorization, where a model recommends items that are direct successors of items in the training data. We show that LLMs do this more than non-LLM-based GR models-in fact, the vast majority of their gains over GR baselines are actually on users whose target items can be predicted through one-hop memorization. We intuit that improving performance on the remaining users requires LLMs to learn richer item-item relations beyond one-hop transitions. To achieve this, we propose IIRG, a novel training strategy that teaches LLMs to capture: (1) collaborative relations derived from item co-occurrences across multiple hops in user sequences, and (2) semantic relations among items with similar themes, both of which can serve as useful recommendation signals. We show that IIRG significantly improves over LLMs trained solely with standard next-item prediction, with especially large gains for users whose test items are not covered by train-time one-hop transitions.

25.
arXiv (CS.CV) 2026-06-18

Vines-DB: An RGB image dataset for multi-species ornamental vine segmentation

The Vines-DB dataset contains 1,218 original high-resolution RGB images of seven ornamental vine species collected under field conditions at the Utah Agricultural Experiment Station's Greenville Research Farm in Logan, Utah, USA. The dataset was generated from 168 individual vine plants that were transplanted in 2022 and photographed repeatedly across multiple months during the 2023 and 2024 growing seasons (July-October). Images were captured with an iPhone 16 Pro equipped with a 48 MP camera between 10:00 AM and 12:00 PM under daylight. Vines were grown on 1.2m x 2.4m trellises and photographed from a distance of 1m against black or white Styrofoam backdrops to improve contrast and reduce background noise. The dataset includes Akebia quinata, Campsis radicans, Hydrangea anomala petiolaris, Lonicera x heckrottii, Campsis x tagliabuana 'Madame Galen', Parthenocissus quinquefolia, and Wisteria floribunda. All original images were manually annotated in Roboflow by trained annotators to produce polygon-based instance segmentation masks for eight classes, including seven species and background. After preprocessing and data augmentation, the working dataset was expanded to 2,307 images for model development and evaluation. The augmented dataset was divided into 2,019 training images, 192 validation images, and 96 test images using stratified sampling to maintain balanced representation. Vines-DB supports the development and evaluation of deep learning models for multi-class instance segmentation in precision horticulture and urban ecology. The dataset enables applications such as automated canopy cover estimation, species identification, and scalable field phenotyping. In addition, repeated monthly imaging of the plants captures temporal variation in canopy development and plant appearance, increasing the dataset's utility for segmentation benchmarking under realistic field conditions.