Academic Intelligence · Curated Daily

Explore the Frontiers of Global Research

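AcademicHub aggregates real-time literature from top journals and preprint platforms. Customize your own research radar and use large language models to automatically generate cross-disciplinary literature analysis briefs.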

01.
arXiv (quant-ph) 2026-06-15

Efficient and simple Gibbs state preparation of the 2D toric code via duality to classical Ising chains

arXiv:2508.00126v2 Announce Type: replace Abstract: We introduce the notion of polynomial-depth duality transformations, which relates two sets of operator algebras through a conjugation by a poly-depth quantum circuit, and make use of this to construct efficient Gibbs samplers for a variety of interesting quantum Hamiltonians as they are poly-depth dual to classical Hamiltonians. This is for example the case for the 2D toric code, which is demonstrated to be poly-depth dual to two decoupled classical Ising spin chains for any system size, and we give evidence that such dualities hold for a wide class of stabilizer Hamiltonians. Additionally, we extend the above notion of duality to Lindbladians in order to show that mixing times and other quantities such as the spectral gap or the modified logarithmic Sobolev inequality are preserved under duality.

02.
arXiv (CS.LG) 2026-06-11

Reverse Flow Matching: A Unified Framework for Online Reinforcement Learning with Diffusion and Flow Policies

arXiv:2601.08136v2 Announce Type: replace Abstract: Diffusion and flow policies are gaining prominence in online reinforcement learning (RL) due to their expressive power, yet training them efficiently remains a critical challenge. A fundamental difficulty that distinguishes online RL from standard generative modeling is the lack of direct samples from the target Boltzmann distribution defined by the Q-function. To address this, two seemingly distinct families of methods have been proposed for diffusion policies: a noise-expectation family, which uses a weighted average of noise as the training target, and a gradient-expectation family, which employs a weighted average of Q-function gradients. However, it remains unclear how these objectives are formally related, or whether they can be synthesized into a more general formulation. In this paper, we propose a unified framework, reverse flow matching (RFM), which rigorously addresses the problem of training diffusion and flow models without direct target samples. By adopting a reverse inferential perspective, we formulate the training target as a posterior mean estimation problem given an intermediate noisy sample. Crucially, we introduce Langevin Stein operators to construct zero-mean control variates, deriving a general class of estimators that share the same expectation. We show that existing noise-expectation and gradient-expectation methods are simply two specific instances within this broader class. This unified view yields two key advancements: it extends the capability of targeting Boltzmann distributions from diffusion to flow policies, and it enables the principled combination of Q-value and Q-gradient information to form an effective estimator, thereby improving training efficiency and stability. We instantiate RFM to train a flow policy in online RL and demonstrate improved performance on continuous-control benchmarks compared to diffusion policy baselines.

03.
arXiv (CS.CL) 2026-06-24

ParaPairAudioBench: Paralinguistic Pairwise Audio Benchmark for LALM-as-a-Judge

Large Audio-Language Models (LALMs) have been widely used as judge models for the automatic evaluation of generated speech. However, prior approaches predominantly focus on holistic naturalness, leaving fine-grained paralinguistic distinctions underexplored. We introduce ParaPairAudioBench, a pairwise benchmark of 5,175 audio pairs across five paralinguistic dimensions: Style, Rate, Emphasis, Age, and Gender. Our experiments show that current LALM judges still lag behind human judgments by 32%p on average and exhibit severe calibration failures, particularly in Tie cases where the correct decision is to abstain. To further analyze lexical versus acoustic reliance, the benchmark includes both same-transcript and cross-transcript conditions. ParaPairAudioBench enables multi-dimensional, calibration-aware assessment of the reliability of LALM-as-a-Judge for paralinguistic speech evaluation.

04.
bioRxiv (Bioinfo) 2026-06-19

Identification of Altered Potassium Channels for Drug Repurposing in Long COVID Patients

Long COVID (LC) is a complex condition characterized by persistent, chronic multisystem manifestations, with a significant proportion of patients exhibiting neurological symptoms. Human ion channels (HICs), particularly potassium channels, are abundantly expressed in the nervous system and linked to key metabolic processes, making them potential candidates for understanding LC pathophysiology and drug repurposing. Meta-analysis of RNA-Seq datasets from COVID-19 recovered and LC patients was performed to identify altered HICs in LC. Differential gene expression analysis, functional enrichment analysis, and weighted gene co-expression network analysis (WGCNA) were performed to uncover key genes, pathways, and co-expression modules consisting of HICs, lipid metabolism-, and immune signaling-related genes. Drug-gene interaction analysis was performed to identify approved drugs targeting potential HICs. A total of 715 dysregulated genes, including eighteen HICs were identified, among which seven were potassium channels. Three significant modules containing HICs, lipid metabolism-, and immune signaling-related genes were identified and found to be associated with antigen processing and presentation, complement and coagulation cascades, and cytokine-related pathways. Approved drugs targeting KCNA6, KCNJ10, KCNN3, and KCNH4 were identified. With further experimental validation, these dysregulated potassium channels, supported by their co-expression networks and pathway associations, may act as potential candidates for drug repurposing in LC patients.

05.
arXiv (CS.CV) 2026-06-17

Spatio-Temporal Fusion Model for Standard View Classification of Echocardiographic Videos

Automated classification of standard echocardiographic views is crucial for efficient clinical workflow but faces three main challenges. First, publicly available datasets are scarce and limited in scale and view coverage. Second, the performance of some modern video-level architectures for echocardiographic view classification remains underexplored. Third, some view categories exhibit highly similar spatial appearances, making single-frame features insufficient for discrimination, while heterogeneous frame quality complicates robust temporal information fusion. To address these challenges, we release the Echocardiographic Videos of Nine Views (EV9V) dataset, comprising 5,138 videos, 910,579 frames, and 9 standard views, which is, to the best of our knowledge, the largest publicly available echocardiography video dataset. Using EV9V, we systematically benchmark representative video classification architectures, including Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Transformers. Furthermore, we propose a Spatio-Temporal Fusion Model (STFM), an efficient dual-stream CNN-LSTM (Long Short-Term Memory) framework that jointly captures spatial anatomical structures and temporal cardiac dynamics. The proposed framework leverages uncertainty-aware learning to preferentially sample representative video segments during training and evidence-based fusion during inference, improving robustness to variations in frame quality across echocardiographic videos. Extensive experiments demonstrate that our method achieves competitive performance across diverse video classification models, validating the effectiveness of uncertainty-aware spatio-temporal learning for echocardiographic view classification. The code is available at https://github.com/bgx666/stfm.

07.
arXiv (CS.LG) 2026-06-16

Generative Molecular Design with Steerable and Granular Synthesizability Control

arXiv:2505.08774v2 Announce Type: replace-cross Abstract: Designing molecules that are both property-optimal and readily synthesizable is a central challenge in drug discovery. Existing works that do consider synthesizability can jointly output predicted synthesis routes for generated molecules. However, little attention has been paid to ease of synthesis or to the flexibility to incorporate desired reaction constraints. On the other hand, virtual screening searches for commercially available compounds, but poses challenges when scaling to ultra-large (billion-size and beyond) chemical spaces. Here, we propose a generative design framework that unifies synthesis-constrained molecular design and ultra-large-scale virtual screening through steerable and granular synthesizability control. Generated molecules satisfy arbitrary multi-parameter optimization objectives with predicted synthesis routes satisfying mix-and-match constraints: including or avoiding certain reactions, incorporating specific building blocks, and minimizing synthesis route length. In an end-to-end in-house campaign targeting BRD4, we designed molecules synthesizable with specific selected reactions and building blocks, synthesized all six selected compounds, and identified two micromolar binders. We further demonstrate that reaction control enables efficient navigation of ultra-large make-on-demand chemical spaces to identify property-optimal candidates. By applying our framework to Chemspace's Freedom 4.0 make-on-demand space (142 billion molecules), we generated ~320k molecules (0.00023% of the library) on a single consumer-grade GPU (with only 8 GB GPU memory) and identified a micromolar Wee1 binder amongst 60 synthesized candidates. The single unified framework thus enables generating novel synthesizable molecules and retrieving catalogue-ready candidates, offering a flexible solution to mitigating the synthesizability bottleneck.
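
As a quick sanity check on the coverage figure quoted in this abstract, the short calculation below recomputes the share of the library that was generated. It uses only the two numbers stated above; because "~320k" is itself approximate, the result is approximate as well. The variable names are illustrative.

```python
# Figures quoted in the abstract; "~320k" is approximate, so the result is too.
library_size = 142_000_000_000   # Freedom 4.0 make-on-demand space (142 billion)
generated = 320_000              # molecules generated by the framework (~320k)

# Share of the library that was generated, expressed as a percentage.
coverage_percent = generated / library_size * 100
print(f"{coverage_percent:.5f}%")  # prints 0.00023%
```

The rounded value matches the 0.00023% reported in the abstract.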

08.
arXiv (CS.LG) 2026-06-12

Thermodynamic assessment of machine learning models for solid-state synthesis prediction

arXiv:2602.04075v2 Announce Type: replace-cross Abstract: Machine learning models have recently emerged to predict whether hypothetical solid-state materials can be synthesized. These models aim to circumvent direct first-principles modeling of solid-state phase transformations, instead learning from large databases of successfully synthesized materials. Here, we assess the alignment of several recently introduced synthesis prediction models with material and reaction thermodynamics, quantified by the energy with respect to the convex hull and a metric accounting for thermodynamic selectivity of enumerated synthesis reactions. A dataset of successful synthesis recipes was used to determine the likely bounds on both quantities beyond which materials can be deemed unlikely to be synthesized. With these bounds as context, thermodynamic quantities were computed using the CHGNet foundation potential for thousands of new hypothetical materials generated using the Chemeleon generative model. Four recently published machine learning models for synthesizability prediction were applied to this same dataset, and the resultant predictions were considered against computed thermodynamics. We find these models generally overpredict the likelihood of synthesis, but some model scores do trend with thermodynamic heuristics, assigning lower scores to materials that are less stable or do not have an available synthesis recipe that is calculated to be thermodynamically selective. In total, this work identifies existing gaps in machine learning models for materials synthesis and introduces a new approach to assess their quality in the absence of extensive negative examples (failed syntheses).

09.
bioRxiv (Bioinfo) 2026-06-16

Accelerating String Comparison in RLZ Compressed Sequences via LCE Jumps

Relative Lempel-Ziv (RLZ) is an effective compression method for large, repetitive collections; however, the fundamental primitives required to elevate it from a passive archival format to a tractable representation for compressed construction have yet to be fully established. In this paper, we introduce an algorithmic framework for structurally comparing and lexicographically sorting sequences of RLZ factors. We characterize when direct factor comparisons are necessary and when they can be bypassed using RLZ specific shortcuts. We further introduce a method for extending truncated factors into right-maximal matches, enabling the recovery of matching statistics from the RLZ parse. Experimentally, RLZ sorting achieved speedups of up to 3.93x over character-based sorting. Together, these results advance the use of the RLZ format as a foundation for compressed construction.

10.
arXiv (CS.CL) 2026-06-16

From ASR to ASP: Evaluating Prompt Attack Vulnerabilities Against Open-Source LLMs

Recent studies demonstrate that Large Language Models (LLMs) are vulnerable to attacks that generate harmful or sensitive outputs. As open-source LLMs are increasingly adopted in high-impact applications such as finance, law, and healthcare, systematically investigating their security risks is increasingly important for a trustworthy LLM era. This paper comprehensively studies effective prompt injection attacks against 14 widely used open-source and three closed-source LLMs on five attack benchmarks. Moreover, existing evaluation metrics mostly consider only the attack success rate, overlooking uncertainty in model responses. Our proposed Attack Success Probability (ASP) additionally captures uncertain behaviors for evaluation, where the model may initially refuse a harmful request but subsequently provide harmful guidance or vice versa, reflecting inconsistency and ambiguity in attack feasibility. By systematically analyzing the effectiveness of prompt injection attacks, we propose a straightforward and effective hypnotism attack; results show that this attack causes aligned language models, including Stablelm2, Mistral, Openchat, and Vicuna, to generate objectionable behaviors, achieving around 90% ASP. They also indicate that ignore-prefix attacks can break all 14 open-source LLMs, achieving over 60% ASP on a multi-categorical dataset. We find that moderately well-known LLMs exhibit higher vulnerability to prompt injection attacks, highlighting the need to raise public awareness and prioritize efficient mitigation strategies.

11.
arXiv (CS.CV) 2026-06-19

Language-Instructed Vision Embeddings for Controllable and Generalizable Perception

Vision foundation models are typically trained as static feature extractors, placing the burden of task adaptation onto large downstream models. We propose an alternative paradigm: instead of solely feeding visual features into language models, we use language itself to dynamically guide the vision encoder. Our method, Language-Instructed Vision Embeddings (LIVE), leverages language as high-level guidance to produce task-centric embeddings at inference time, removing the need for task-specific retraining. This enables the encoder to focus on contextually relevant aspects of the input, yielding more controllable and generalizable representations. Empirically, LIVE reduces visual hallucinations (+34 points on MMVP), surpasses vision-language models with orders of magnitude more parameters on visual question answering, and generalizes to unseen instructions and tasks – offering a direct path toward adaptive, instruction-driven visual intelligence.

12.
arXiv (CS.LG) 2026-06-15

Multidimensional Bayesian Active Machine Learning of Working Memory Task Performance

arXiv:2510.00375v2 Announce Type: replace Abstract: While adaptive experimental design has outgrown one-dimensional, staircase-based adaptations, most cognitive experiments still control a single factor and summarize performance with a scalar. We show a validation of a Bayesian, two-axis, active-classification approach, carried out in an immersive virtual testing environment for a 5-by-5 working-memory reconstruction task. Two variables are controlled: spatial load L (number of occupied tiles) and feature-binding load K (number of distinct colors) of items. Stimulus acquisition is guided by posterior uncertainty of a nonparametric Gaussian Process (GP) probabilistic classifier, which outputs a surface over (L, K) rather than a single threshold or max span value. In a young adult population, we compare GP-driven Adaptive Mode (AM) with a traditional adaptive staircase Classic Mode (CM), which varies L only at K = 3. Parity between the methods is achieved for this cohort, with an intraclass coefficient of 0.755 at K = 3. Additionally, AM reveals individual differences in interactions between spatial load and feature binding. AM estimates converge more quickly than other sampling strategies, demonstrating that only about 30 samples are required for accurate fitting of the full model.

13.
arXiv (CS.CV) 2026-06-19

FlowBender: Feedback-Aware Training for Self-Correcting Conditional Flows

Conditional diffusion and flow models routinely fail to satisfy the very constraints that define their task. For instance, a depth-conditioned model often produces images whose re-extracted depth disagrees with the input, even though the forward operator–the depth predictor defining the constraint–is available during both training and inference. Existing approaches generally fall into two categories: supervised models that treat the conditioning signal as a static cue and ignore alignment information at inference, and guidance-based methods that consult it through hand-tuned linear updates, typically trading fidelity to the condition against the plausibility of the generated sample. We argue that the fundamental gap in both paradigms is that the model is never trained to utilize its own alignment error. We introduce FlowBender, a closed-loop framework that treats this error as a first-class input, training the network to learn a correction policy conditioned on inference-time feedback. At each step, an unguided look-ahead pass estimates the clean signal, a task-specific deviation is computed via the forward operator, and a refinement pass consumes this signal to produce a corrected velocity. We propose several variants of FlowBender, including a gradient-based formulation for differentiable operators and a zero-order variant for non-differentiable settings such as JPEG compression. For efficient sampling, we introduce a prior-step shortcut that enables closed-loop correction at a minimal additional computational cost. Across image-to-image translation, restoration, and 3D mesh texturing, FlowBender consistently outperforms standard supervised baselines, alignment-loss-augmented training, and state-of-the-art inference-time guidance, improving fidelity and plausibility simultaneously rather than trading them against each other. Project page: https://flow-bender.github.io/

14.
arXiv (CS.LG) 2026-06-18

Task-Restricted Symmetries in Recurrent Weight Space

arXiv:2606.18457v1 Announce Type: new Abstract: Recurrent networks can contain substantial functional redundancy in weight space: changing a recurrent matrix may leave the input-output rollout nearly unchanged on a task distribution, while similar-scale changes can destroy the same behavior. We study this redundancy in one-layer tanh RNNs using ordered real Schur coordinates. The Schur form separates spectral blocks from directed nonnormal couplings, giving a diagnostic basis for structured ablations that keep the input and readout maps fixed. In a fixed-length copy task, selected nonnormal Schur couplings can be removed with little loss in some trained solutions, whereas other couplings are necessary for accurate autonomous replay. Across flip-flop, sine generation, and context-dependent integration, the loss-preserving ablation profile varies across tasks and trained solutions. These results identify candidate approximate functional invariances, not universal symmetries of recurrent weight space. Schur-coordinate ablations provide a practical diagnostic for which structured perturbations preserve a trained recurrent solution and which ones disrupt its computation.

15.
medRxiv (Medicine) 2026-06-10

Developmental Associations Linking Childhood Trauma and Early Cannabis Use to Adolescent DNA Methylation and Psychotic-Like Experiences

Background. Psychotic-like experiences (PLEs) index early risk for psychotic disorders and are consistently associated with childhood trauma, yet underlying biological mechanisms remain poorly understood. DNA methylation (DNAm) may capture the biological embedding of early adversity, while adolescent exposures such as cannabis use may modify these processes. We examined epigenome-wide associations of childhood trauma and PLEs, tested the moderating role of early cannabis use, and evaluated DNAm as a potential mediator. Methods. We analysed data from the Avon Longitudinal Study of Parents and Children (ALSPAC), a UK population-based birth cohort. Childhood trauma was assessed prospectively and retrospectively. Epigenome-wide DNAm was measured in peripheral blood at ~17 years using the Illumina 450K array, and PLEs were assessed at 18 using a structured interview. Epigenome-wide association studies were conducted for trauma-DNAm and DNAm-PLEs associations in the final sample (n = 1,457), adjusting for demographic, biological, and technical covariates. Differentially methylated regions (DMRs) were identified using DMRff, followed by functional enrichment analyses. Cannabis use at 15.5 was modelled as a moderator with multiple imputation for missing data. Mediation was tested using the Divide-Aggregate Composite-null Test (DACT). Results. Childhood trauma was associated with widespread DNAm differences, primarily at the regional level, with enrichment in pathways related to cellular stress responses. In contrast, DNAm associated with PLEs was more limited and implicated loci involved in epigenetic regulatory processes. These signatures were largely distinct, and there was no evidence supporting mediation after multiple testing correction. Incorporating cannabis use altered the pattern and extent of DNAm associations, with stronger and more significant signals observed at both CpG and regional levels, although these did not translate into evidence of mediation. 
Conclusion. Childhood trauma and PLEs show distinct DNAm signatures in adolescence, with trauma-related DNAm reflecting broad stress-related processes and PLE-associated DNAm implicating regulatory mechanisms. We found little evidence that DNAm mediates the trauma-PLE association. Instead, adolescent exposures, particularly cannabis use, may distinctly influence trauma-related epigenetic variation with limited detectable downstream effects on PLEs. These findings support a context-dependent model of epigenetic risk and highlight the need for larger longitudinal studies to clarify causal pathways linking early adversity to psychosis.

16.
arXiv (CS.AI) 2026-06-16

QoS-Aware Token Scheduling and Private Data Valuation for Multi-Modal Agentic Networks

arXiv:2606.15573v1 Announce Type: new Abstract: In agentic systems, human-generated data records anchor the value of AI services. Yet cloud compute pipelines centralize processing on remote servers. Data centralization reduces personal data sovereignty and may potentially degrade the quality of service (QoS). Meanwhile, user contributions are diverse in quantity and quality: decentralized records can be biased, noisy, and heterogeneously distributed. To address the data challenge, we study fair token allocation and private data valuation for decentralized and resource-constrained agentic systems. Our approach embeds multi-modal representations in a shared semantic space and releases differentially private (DP) prototypes to preserve utility while reducing semantic leakage. With the DP guarantee, we design a fair token allocation scheme that rewards effective contributions and remains robust to data heterogeneity and AI resource scarcity. Extensive simulations demonstrate improved contribution-based fairness and QoS compared to standard benchmarks. The improved resistance to image reconstruction attacks indicates enhanced privacy for multi-modal personal data.

17.
arXiv (CS.CV) 2026-06-15

S$^2$COPE: Self-Supervised Concept Discovery via Preference Learning

Current representation learning paradigms force a fundamental compromise: self-supervised methods scale to massive datasets but yield opaque features, whereas interpretable models remain bottlenecked by the need for dense human annotation. We introduce Self-Supervised Concept discOvery via Preference lEarning (S$^2$COPE), a label-free framework that resolves this dilemma. Instead of treating Vision-Large-Language Models (VLLMs) as static feature extractors, S$^2$COPE leverages them as active participants in a self-supervised preference optimization loop. By autonomously hypothesizing, validating, and reinforcing candidate visual attributes directly from raw imagery, our framework discovers novel, structured concepts without a single label. Extensive experiments across natural, medical, and physics domains demonstrate that S$^2$COPE successfully extracts domain-specific concepts where standard VLLMs often fail to generate them. By amortizing concept discovery directly into the VLLM backbone through our self-supervised preference objective – rather than relying on static generation and disjoint filtering – we achieve up to a 24-point absolute improvement in downstream top-1 classification accuracy on unseen data. Our work suggests that interpretability can emerge through a model's autonomous interaction with incidental visual structures, without any human supervision.

18.
arXiv (quant-ph) 2026-06-17

Quantum Routers: A Switching-Fabric Framework for Quantum-Native Forwarding

arXiv:2606.17773v1 Announce Type: new Abstract: Forwarding in quantum networks cannot be realized by directly transposing classical switching fabrics, since the no-cloning theorem and the quantum measurement postulate constrain the direct relay of quantum information while ruling out copy-based buffering and inspection. In this paper, we propose a switching-fabric framework for quantum routers based on multipartite entanglement. Specifically, we formalize the notion of an entanglement-based switching fabric, in which a graph state acts as the forwarding resource and entanglement forwarding is realized through local Pauli measurements. We translate the classical notions of blocking and non-blocking operation into structural conditions for entanglement-based fabrics, by deriving the edge-controlled (EC) design principle for non-blocking operation. We instantiate this principle through a monolithic EC crossbar and a modular Clos-type EC fabric, for which we characterize resource scaling and identify the regime where the modular design becomes more resource-efficient than the monolithic one. Finally, a forwarding-latency analysis establishes a fundamental distinction between matching-oblivious and matching-driven forwarding: the proposed EC fabrics realize all requested input-output entanglement links with constant forwarding depth under sufficient measurement parallelism, whereas matching-driven EPR-based fabrics exhibit latency that scales with the number of requested connections. The proposed framework provides a hardware-agnostic foundation for quantum-router switching fabrics.

19.
arXiv (CS.LG) 2026-06-12

The Geometry of Phase Transitions in Generative Dynamics via Projection Caustics

arXiv:2606.13191v1 Announce Type: new Abstract: Continuous-state generative samplers, including diffusion and flow-matching models, evolve through continuous reverse-time dynamics, yet their samples often undergo abrupt qualitative changes: trajectories commit to modes, semantic alternatives collapse, and small perturbations in narrow time windows can produce large downstream effects. This paper develops a geometric account of such phase-transition-like behaviour. We view denoising as gradient descent on a free energy landscape and show that sharp transitions arise near projection caustics, where the nearest-point projection onto the data support ceases to be unique. Motivated by this perspective, we introduce the Critical Boundary Detector (CBD) as a practical diagnostic for score-direction instability. Across toy models, standard diffusion models, and latent text-to-image diffusion models, CBD localises mode commitment, predicts intervention-sensitive windows, and supports targeted control in geometrically sensitive regions. Our results connect the geometry of data with the dynamics of diffusion generation.

20.
arXiv (CS.CL) 2026-06-16

Emergent retokenization symmetry in large language models: phenomenology and applications

Tokenization introduces representational redundancy: under a fixed token vocabulary, every byte string admits many valid token encodings, or segmentations, that decode to the same surface string. However, given a prompt, most language model tokenizers break this representational symmetry by returning a canonical segmentation. Training only on canonical segmentations should influence inference behavior, and there is little reason to expect models to respect segmentation symmetry on downstream tasks. We find that this symmetry partially emerges during training. Here, we probe this emergent symmetry through experiments testing token compositional understanding, representation diversity, and task focused benchmark performance. We primarily use retokenization – replacing a prompt's canonical tokenization with an alternative segmentation while preserving its bytes exactly. Relative to other prompt perturbations, retokenization is unusually clean because it isolates segmentation effects without changing syntax, semantics or surface form. We use retokenization to study sensitivity and robustness to semantically identical input representations across pretraining and post-training. Moreover, this partial retokenization symmetry suggests a distinct inference-time sampling axis. While temperature sampling generates diverse outputs from the model using its next-token probability distribution, retokenization generates diversity from the model's internal computations through semantically equivalent input representations. We find that while this retokenization sampling strategy can hurt performance on easy problems, it can also recover solutions that conventional sampling does not find. Overall, our work presents retokenization as a simple yet powerful probe of large language models, shedding light on compositional understanding and prompt sensitivity, and offering a novel sampling strategy.
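
To make the notion of retokenization concrete, here is a minimal sketch that enumerates every segmentation of a string under a fixed vocabulary and checks that each one decodes to the same surface form. The toy vocabulary, the function name `segmentations`, and the "fewest tokens" rule used to pick a canonical segmentation are all assumptions made for illustration; real tokenizers choose their canonical encoding by other rules (for example, learned merge orders), and the paper's own procedure may differ.

```python
from typing import List, Set


def segmentations(text: str, vocab: Set[str]) -> List[List[str]]:
    """Enumerate every way to split `text` into tokens drawn from `vocab`."""
    if not text:
        return [[]]  # exactly one way to segment the empty string
    results: List[List[str]] = []
    for end in range(1, len(text) + 1):
        piece = text[:end]
        if piece in vocab:
            # Extend each segmentation of the remainder with this prefix token.
            for rest in segmentations(text[end:], vocab):
                results.append([piece] + rest)
    return results


# Hypothetical toy vocabulary, not taken from any real tokenizer.
VOCAB = {"u", "n", "d", "o", "un", "do", "undo"}

all_segs = segmentations("undo", VOCAB)

# Assumption: treat the shortest segmentation as the "canonical" one.
canonical = min(all_segs, key=len)
alternatives = [s for s in all_segs if s != canonical]

for seg in all_segs:
    # Every segmentation decodes to the same surface string.
    assert "".join(seg) == "undo"

print(canonical)          # ['undo']
print(len(alternatives))  # 4
```

Feeding one of the `alternatives` to a model instead of `canonical` changes only the token boundaries, which is what makes the perturbation clean: the bytes, and hence the syntax and semantics, are untouched.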

21.
arXiv (CS.AI) 2026-06-19

VCG: A Multimodal Retrieval Framework for E-Commerce Video Feeds under Extreme Cold-Start Conditions

arXiv:2606.19627v1 Announce Type: cross Abstract: The digital commerce landscape is shifting from static, search-driven catalogs to dynamic, immersive video feeds. This transition introduces an "extreme cold-start" problem: unlike traditional items, new short-form videos lack the dense interaction history required for collaborative filtering. Furthermore, immersive feeds introduce strong position and duration biases that distort standard engagement signals. In this paper, we demonstrate the Video Candidate Generation (VCG) system, a scalable multimodal retrieval engine designed to solve these challenges in a large-scale e-commerce environment. By leveraging a domain-adapted vision-language model (based on CLIP), we map users and videos into a shared semantic space, enabling zero-shot retrieval based on visual content rather than behavioral history. We detail the system's architecture and present a rigorous evaluation comparing generative (LLM) vs. discriminative (CLIP) embeddings. Our results show that while generative models excel at attribute prediction, they suffer from embedding space collapse in retrieval tasks. Online A/B testing demonstrates that VCG effectively mitigates engagement biases, yielding a 50% uplift in deep video completion. To showcase the system's capabilities, we present an interactive demonstration featuring three bi-directional retrieval scenarios: Product-to-Video, Video-to-Product, and Zero-Shot Semantic Search.

22.
arXiv (CS.LG) 2026-06-16

Scalable Graph Condensation with Evolving Capabilities

arXiv:2502.17614v3 Announce Type: replace Abstract: The rapid growth of graph data creates significant scalability challenges as most graph algorithms scale quadratically with size. To mitigate these issues, Graph Condensation (GC) methods have been proposed to learn a small graph from a larger one, accelerating downstream tasks. However, existing approaches critically assume a static training set, which conflicts with the inherently dynamic and evolving nature of real-world graph data. This limitation leads to inefficiencies when condensing growing training sets. This work introduces a novel framework for continual graph condensation, enabling efficient updates to the distilled graph that handle data streams without requiring costly retraining. In this paper, we introduce GECC (Graph Evolving Clustering Condensation), a scalable graph condensation method designed to handle large-scale and evolving graph data. GECC employs a traceable and efficient approach by performing class-wise clustering on aggregated features. Furthermore, it can inherit previous condensation results as clustering centroids when the condensed graph expands, thereby attaining an evolving capability. This methodology is supported by robust theoretical foundations and demonstrates superior empirical performance. Comprehensive experiments, including real-world scenarios, show that GECC achieves better performance than most state-of-the-art graph condensation methods while delivering a roughly 1000× speedup on large datasets.
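The two mechanisms named in the abstract, class-wise clustering on aggregated features and reuse of earlier results as clustering centroids, can be illustrated with a rough pure-Python sketch. This is not the GECC implementation: the aggregation rule (plain neighbor averaging), the use of k-means, and all function names (`aggregate`, `kmeans`, `condense`) are assumptions made for illustration.

```python
import random


def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def aggregate(features, neighbors, hops=1):
    """Assumed aggregation: average each node with its neighbors, `hops` times."""
    current = [list(f) for f in features]
    for _ in range(hops):
        current = [
            mean([current[i]] + [current[j] for j in neighbors[i]])
            for i in range(len(current))
        ]
    return current


def kmeans(points, k, init=None, iters=20, seed=0):
    """Plain k-means; `init` lets a caller warm-start from earlier centroids."""
    rng = random.Random(seed)
    start = init if init is not None else rng.sample(points, k)
    centroids = [list(c) for c in start]
    for _ in range(iters):
        buckets = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)), key=lambda j: sq_dist(p, centroids[j])
            )
            buckets[nearest].append(p)
        # Keep the old centroid when a cluster receives no points.
        centroids = [mean(b) if b else c for b, c in zip(buckets, centroids)]
    return centroids


def condense(agg_features, labels, per_class, previous=None):
    """Cluster each class separately; centroids act as condensed nodes."""
    condensed = {}
    for cls in sorted(set(labels)):
        members = [f for f, y in zip(agg_features, labels) if y == cls]
        init = previous.get(cls) if previous else None
        condensed[cls] = kmeans(members, per_class, init=init)
    return condensed
```

When new nodes arrive, calling `condense` again with `previous` set to the earlier result starts each class's clustering from the old centroids instead of from a random draw, which is one simple way to read the "inherit previous condensation results" step.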

23.
arXiv (CS.CL) 2026-06-19

AgentFinVQA: A Deployable Multi-Agent Pipeline for Auditable Financial Chart QA

Financial chart question answering in regulated settings demands more than accuracy: practitioners must know which answers to trust before acting on them, and many institutions cannot send client data to external model providers. Yet existing chart-QA agents are accuracy-focused and opaque, and most assume proprietary API access; to our knowledge, none combines auditability with on-premise deployability without significant accuracy compromise. We present AgentFinVQA, a multi-agent pipeline that decomposes each query into planning, OCR, legend grounding, visual inspection, and verification, recording every step in a traceable Model Evaluation Packet (MEP) per sample. On FinMME, AgentFinVQA improves $+7.68$ pp over a primary-backbone matched zero-shot baseline with a proprietary backbone (Gemini-3 Flash; 71.24% vs. 63.56%, McNemar $p \approx 1.1 \times 10^{-16}$), and $+4.84$ pp with open-weights Qwen3.6-27B-FP8 served locally. The verifier's verdict also serves as a useful confidence signal (68.2% vs. 55.6% exact accuracy on confirmed vs. revised answers), enabling human-in-the-loop review routing. Error analysis shows that question misunderstanding, legend confusion and extraction error account for nearly two-thirds of failures and are the categories least detected by the verifier, identifying clear directions for future work. Together these results show that auditable, on-premise financial chart QA is practical and that the open-weights system keeps most of the accuracy gains while enabling full data residency. We release our code to support reproducible evaluation.
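For readers unfamiliar with the McNemar test cited above, here is a minimal sketch of its exact (binomial) form for paired accuracy comparisons. The abstract does not say which variant was used and does not report the discordant-pair counts, so both the choice of the exact variant and the function name `mcnemar_exact_p` are assumptions, and no attempt is made to reproduce the reported p-value.

```python
from math import comb


def mcnemar_exact_p(only_a_correct, only_b_correct):
    """Two-sided exact McNemar p-value from the two discordant-pair counts.

    only_a_correct: samples system A answered correctly and system B did not.
    only_b_correct: samples system B answered correctly and system A did not.
    Samples where both systems agree do not enter the test.
    """
    n = only_a_correct + only_b_correct
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    smaller = min(only_a_correct, only_b_correct)
    # Under the null, each discordant pair favors either system with p = 0.5.
    tail = sum(comb(n, i) for i in range(smaller + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

The test looks only at samples where the two systems disagree, which is why it suits a comparison of a pipeline against a baseline on the same benchmark items.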

24.
arXiv (CS.CL) 2026-06-19

Quantifying Aleatoric Uncertainty of In-Context Learning for Robust Measure of LLM Prediction Confidence

In-Context Learning (ICL) allows LLMs to adapt to new tasks from a few demonstrations, but its reliability remains a concern: predictions are highly sensitive to both prompt design and the model's ability to understand the context, obscuring whether failures arise from data properties or model limitations. Uncertainty decomposition, which separates aleatoric from epistemic sources, is particularly crucial in this setting, yet existing methods, designed for standard generation tasks, fail to capture the unique dynamics of ICL. To address this, we introduce the concept of self-function vectors, built upon Bayesian views and the mechanistic interpretability of ICL. These vectors leverage internal model representations to model the latent concept learned during in-context prompting, thereby enabling a direct estimation of aleatoric uncertainty within a Bayesian framework and circumventing the reliance on brittle input or decoding manipulations. Given the lack of established benchmarks and suitable evaluation protocols, we also propose the first rigorous evaluation protocol, in which data is manipulated in controlled ways so as to quantify aleatoric uncertainty precisely and separately from epistemic uncertainty. With this new evaluation framework, initially grounded in synthetic tasks for conceptual development and subsequently extended to real-world datasets, we show that our proposed methodology can measure uncertainty of LLM predictions made under ICL more reliably than existing alternative methods. Moreover, we show it can be used as a practical tool for trustworthiness-related applications, such as hallucination detection. Our findings pave a new direction for connecting the quantitative view of uncertainty with the mechanistic understanding of model behavior.

25.
arXiv (quant-ph) 2026-06-24

High-harmonic generation driven by temporal-mode quantum states of light

arXiv:2512.06602v2 Announce Type: replace Abstract: We develop a theoretical framework for high-harmonic generation (HHG) driven by quantum states of light based on a temporal-mode expansion of the electromagnetic field. This approach extends previous single plane-wave mode treatments to realistic pulse configurations and arbitrary multi-mode states of light, resolving conceptual inconsistencies arising from non-normalizable infinite plane waves and establishing consistency between analytical and numerical methods. We derive a correction factor that quantifies deviations from the diagonal approximation (in which the yield becomes a statistical average over classical-field simulations) both for the response of a single atom and in the many-atom regime. Our results confirm that the HHG spectrum for atoms driven by any quantum state of light in free space is accurately described by averaging semi-classical calculations over the Husimi distribution, with no observable genuine quantum effects in the spectrum. We also demonstrate that in the many-atom regime, the mean-field coherent-state approximation underlying this treatment does not preserve probabilities, although unitarity is restored in the diagonal approximation. The absence of genuine quantum effects in the HHG yield is attributed to the large photon numbers ($\sim 10^{11}$) required to reach HHG intensities in free space, which render quantum fluctuations negligible. We discuss nanophotonic environments with ultrasmall mode volumes as potential platforms where few-photon strong-field processes could exhibit genuine quantum signatures.