Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (quant-ph) 2026-06-24

Crystalline Spectral Form Factors

arXiv:2512.11054v3 Announce Type: replace Abstract: We investigate crystalline-like behavior of the spectral form factor in unitary quantum systems with extremely strong eigenvalue repulsion. Using a low-temperature Coulomb gas as a model of repulsive eigenvalues, we derive the Debye-Waller factor suppressing periodic oscillations of the spectral form factor and estimate the order of its singularities at multiples of the Heisenberg time. We also reproduce this crystalline-like behavior using perturbed permutation circuits and random matrix ensembles associated with Lax matrices. Our results lay a foundation for future studies of quantum systems that exhibit intermediate level statistics between standard random matrix ensembles and permutation circuits.

02.
arXiv (CS.CL) 2026-06-12

KCSAT-ML: Probing Reasoning Models with Nationwide-Cohort Human Difficulty

Math reasoning benchmarks have proliferated, yet most lack a per-item difficulty signal grounded in actual human performance. We introduce KCSAT-ML, a decade (2014-2025) of Korean College Scholastic Ability Test (KCSAT; Suneung) mathematics: 664 problems with a 339-item core set carrying official per-item error rates from nationwide cohorts of hundreds of thousands of examinees. We pair the benchmark with Difficulty-aligned Reasoning Gain (DRG): a score-orthogonal metric that asks whether a model's mistakes concentrate on the items humans found hard, or on items humans found easy. Together they expose, across a wide range of VLMs (and LLMs via OCR), three patterns: (i) low-budget accuracy collapses on the high-human-error tail at every model size; (ii) test-time scaling (TTS) raises token use roughly linearly with cohort error rate, while accuracy gains follow a non-monotonic curve; (iii) within a single family, TTS flips between anti-scaling on the hardest items and overthinking on easier ones – two faces of the same alignment failure. On DRG, models with near-identical accuracy can sit at near-opposite values: one model gets wrong what humans also find hard, while another solves the hardest items yet fails on items humans find easy – a contrast that aggregate accuracy hides. Our code and dataset builder will be open-sourced at https://github.com/naver-ai/KCSAT-ML.

03.
arXiv (CS.CL) 2026-06-16

Scaling Human and G2P Supervision for Robust Phonetic Transcription

Expert phonetic annotation is costly, especially for non-standard dialects and atypical speech. A common alternative is using Grapheme-to-Phoneme (G2P) models to auto-generate phonetic labels from text transcripts at scale. We study how automatic phonetic transcription performance scales with human and G2P supervision in English. Using a curated 80-hour benchmark spanning native, non-native and post-stroke speech, we identify a supervision quality threshold: G2P supervision helps only when fewer than 20-30 hours of human annotation are available. Beyond this threshold, it provides no significant benefit and can reduce cross-dialect robustness. What is effective after this threshold is ASR pretraining which we use to achieve a 2.3x reduction in weighted phone feature error rate over prior systems, with strong gains on non-native and aphasic speech. These results suggest that quantity-driven G2P scaling may yield diminishing returns for robust generalization.

04.
arXiv (CS.CL) 2026-06-11

Vector Quantized Latent Concepts: A Scalable Alternative to Clustering-Based Concept Discovery

Large language models (LLMs) encode rich semantic information in their hidden states, yet it remains difficult to understand what information these internal representations capture. Latent concepts extracted from hidden states offer a promising direction for interpreting LLMs, but existing clustering-based methods face a trade-off: hierarchical clustering produces coherent concepts but is limited to small datasets due to its quadratic memory cost, while K-Means scales efficiently but may yield less semantically coherent concepts. We propose Vector Quantized Latent Concept (VQLC), a discrete concept learning framework that learns a codebook of latent concepts on frozen hidden states. Across 12 dataset-model settings, VQLC stays close to K-Means in computational cost, scales better than hierarchical clustering, and remains competitive in faithfulness, with the clearest gains on decoder-only models. LLMs-based evaluation, qualitative analysis, and a Sparse Autoencoder (SAE) comparison demonstrate that the learned concepts are interpretable and task-relevant.

05.
arXiv (CS.AI) 2026-06-16

LaWAM: Latent World Action Models for Efficient Dynamics-Aware Robot Policies

arXiv:2606.15768v1 Announce Type: cross Abstract: Vision-Language-Action models (VLAs) leverage large-scale vision-language pretraining for semantic robot control, but often lack explicit foresight into how robot actions change the scene. World-Action Models (WAMs) address this limitation by conditioning policies on predicted futures, yet existing approaches typically rely on computationally expensive video generation with substantial pixel-level redundancy. We present LaWAM, a Latent World Action Model that exposes predictive dynamics to robot policies through compact latent visual subgoals instead of reconstructed future video. At the core of LaWAM is a latent-action-conditioned Latent World Model (LaWM). We obtain LaWM by training a latent action model in the latent space of a pretrained vision foundation model and repurposing its forward decoder to predict future observation features for scene evolution. LaWAM then conditions action generation on these predicted latent visual subgoals to enable dynamics-aware robot control. LaWAM achieves state-of-the-art or competitive success rates (SRs) across LIBERO (98.6% SR), RoboTwin (91.22% SR), and real-world manipulation tasks while retaining low-latency inference. LaWAM runs in 187 ms per action-chunk prediction and achieves up to 24x lower wall-clock latency than pixel-space WAMs.

06.
bioRxiv (Bioinfo) 2026-06-11

Combinatorial docking and molecular generation to navigate over 100-billion molecules for prospective ligand discovery

Commercially available make-on-demand libraries now exceed 100 billion compounds, requiring over 50 years to screen on 2,000 CPU cores using conventional docking. We present two complementary approaches to address this challenge. CombiDOCK, a combinatorial docking framework, enables exhaustive screening at the 100-billion scale within 40 days. MINT-Dock, a generative framework, accelerates navigation of this space by integrating CombiDOCK with Monte Carlo Tree Search. Benchmarked on 46 diverse targets, CombiDOCK matched full-molecule docking accuracy, and MINT-Dock achieved a 4,800-fold enrichment over random selection. Compared with prior billion-scale brute-force campaigns against {sigma}2, VMAT2, and VAChT, prospective CombiDOCK screens of the 100-billion-molecule library yielded higher hit rates and more potent ligands, while MINT-Dock achieved comparable outcomes across single- and multi-target objectives with >20-fold computational cost reductions. Docking-predicted poses of the best VAChT-binding compounds were confirmed by cryo-EM structures. These methods provide exhaustive and generative paths for navigating the trillion-molecule frontier of drug discovery.

07.
arXiv (CS.LG) 2026-06-18

Contextualizing Biological Language Models across Modalities via Logit-Space Contrastive Alignment

arXiv:2606.18703v1 Announce Type: new Abstract: Pretrained biological language models expose per-token probability distributions through masked-token prediction, providing the likelihood interface central to sequence design, variant scoring, and mechanistic interpretation. Yet these distributions are learned from broad unlabeled corpora and are not naturally conditioned on task-specific biological contexts such as interaction partners, cellular environments, or therapeutic interventions. Existing contextual matching methods often distort this interface through pooled embeddings, contrastive latent spaces, or task-specific prediction heads. We introduce LOGICA (Logit-space Contrastive Alignment), a framework for context-conditioned prediction that performs contrastive learning directly in output-logit space. Using gated cross-modal adapters compatible with each model's native token head, LOGICA preserves the pretrained likelihood interface and converts contextualized token log-likelihoods into matching scores. Alignment is defined through context-sensitive token probabilities rather than proximity in a shared embedding space, enabling learning from sparse paired data across models with distinct vocabularies, without a shared tokenizer or decoder. LOGICA is particularly effective for mutation-local variant ranking, where comparisons reduce to context-conditioned likelihoods of mutant tokens at perturbed sites. Across protein–ligand binding, TCR–peptide activity, and drug-conditioned resistance prediction, LOGICA improves over prior state-of-the-art methods, including matched latent-contrastive and conditional MLM baselines, while retaining a token-level interface for interpretation and generation. On held-out-gene single-mutation drug-resistance prediction, LOGICA improves AUC from near-random latent-space baselines of $\sim$0.55 to $\sim$0.65.

08.
arXiv (CS.CV) 2026-06-15

Rethinking Global Average Pooling: Your Classifier Is Secretly a Multi-Instance Learner

作者:

Modern image classifiers widely adopt global average pooling (GAP) followed by a linear classification head. This linearity ensures that the image-level logits equal the average of logits obtained by applying the classification head pointwise to the feature grid prior to GAP. Consequently, standard classifiers may inherently retain spatial class evidence that remains recoverable even when the image-level prediction is incorrect. This structure naturally suggests a multiple-instance learning (MIL) interpretation, where an image is viewed as a bag of spatial instances. Within this formulation, we demonstrate that standard classifiers trained with a single label per image can still learn the intended classification task in multi-object scenes. We further exploit this property to decompose image-level logits into a prediction grid, providing a post-hoc diagnostic to extract spatial class evidence that GAP otherwise obscures. Our systematic evaluation reveals that off-the-shelf models consistently recover the ground-truth class within foreground regions. The MIL interpretation further suggests that common classifier failures reflect known limitations of mean aggregation.

09.
arXiv (CS.CL) 2026-06-19

What sentiment analysis can't see: Measuring whether customers were helped, and what went wrong, across 70,000 support conversations

Most companies read their customer support data at scale using sentiment analysis, which measures how customers sound rather than whether they were satisfied with the result. We tested a richer alternative on 70,450 support conversations from a leading online fundraising platform: alongside tone, we used GPT-5.4 to estimate each customer's satisfaction and to flag whether they reported a concrete problem, then validated all three readings against the 1-to-5 ratings customers left on the conversations they rated. The satisfaction estimate tracked those ratings far better than sentiment did, correlating at 0.47 against 0.36 and flagging unhappy customers with far fewer false alarms. The structured read also sees what sentiment cannot: tone and satisfaction disagree in 44% of conversations, a single "Neutral" label hides everything from quietly satisfied customers to ones who quietly gave up, and the largest group of all is "tolerated friction," customers who are satisfied but still reporting a fixable problem, a standing issue that no sentiment-based dashboard can surface. The broader finding is that LLM-based annotation can capture far more than the tonality of a customer's language, offering strong potential for new business metrics grounded instead in the customer's state (whether they were satisfied) and the cause of their problem extracted directly from the raw textual data of interactions and feedback.

10.
arXiv (CS.AI) 2026-06-16

Unifying Post-hoc Explanations of Knowledge Graph Completions

arXiv:2507.22951v2 Announce Type: replace Abstract: Knowledge Graphs organize information as entity-relation-entity triples, enabling machine learning models to predict plausible missing triples in a task known as Knowledge Graph Completion (KGC). Post-hoc explainability for KGC addresses the problem of identifying which triples most influence the predictions of machine learning models. Currently, the field lacks formalization and consistent evaluations, hindering reproducibility and cross-study comparisons. This paper argues for a unified taxonomy for post-hoc explainability in KGC. First, we propose a characterization of post-hoc explanations via multi-objective optimization that unifies existing post-hoc explainability algorithms in KGC and the explanations they produce, balancing explanation effectiveness and conciseness. Next, we examine improved evaluation protocols based on popular metrics, such as Mean Reciprocal Rank and Hits@k, through illustrative experiments. Finally, we stress the importance of interpretability as the ability of explanations to address queries meaningful to end users. By unifying methods and discussing evaluation standards, this work puts forward a case for more reproducible and impactful research in KGC explainability.

11.
arXiv (CS.CL) 2026-06-24

Towards Version-aware Operations and Transaction Memories for Multi-layer MeMo

作者:

MeMo proposes language models with explicit multi-layer correlation matrix memories (CMMs), where memorization, retrieval, and forgetting are architectural operations. This paper asks how such memories can reduce the need for retraining when knowledge changes. For changes expressible as MeMo memory associations, the model's accessible knowledge can be updated by editing explicit memories rather than retraining the whole model. We propose a version-aware operation layer in which high-level operations such as replace, obsolete, keep-history, rollback, and trace are compiled into MeMo-native primitive calls over sequences and tokens. The key observation is that a version-aware operation is rarely a single MeMo association. It is an ordered transaction of primitive edits, for example forgetting one sequence-token chain, memorizing another, preserving a historical chain, and recording an inverse program. The framework introduces two auxiliary CMMs: a Version CMM (V-CMM) for mapping version transitions to transaction handles, and a Transaction CMM (T-CMM) for storing reusable change contents and inverse programs. It supports both direct sequence-level edits and structured diff-level inputs, and outlines an evaluation route for update success, rollback, traceability, locality, and transaction reuse.

12.
arXiv (CS.CL) 2026-06-16

A Unified Definition of Hallucination: It's The World Model, Stupid!

Despite numerous attempts at mitigation since the inception of language models, hallucinations remain a persistent problem even in today's frontier LLMs. Why is this? We review existing definitions of hallucination and fold them into a single, unified definition wherein prior definitions are subsumed. We argue that hallucination can be unified by defining it as simply inaccurate (internal) world modeling, in a form where it is observable to the user. For example, stating a fact which contradicts a knowledge base OR producing a summary which contradicts the source. By varying the reference world model and conflict policy, our framework unifies prior definitions. We argue that this unified view is useful because it forces evaluations to clarify their assumed reference "world", distinguishes true hallucinations from planning or reward errors, and provides a common language for comparison across benchmarks and discussion of mitigation strategies. Building on this definition, we also connect our framework to HalluWorld, a complementary benchmark that instantiates fully specified reference world models for stress-testing model hallucinations.

13.
arXiv (CS.CV) 2026-06-24

A Dual Edge Spatial Jacobian Image Graph for Interpretable Diabetic Retinopathy Grading

Automated diabetic retinopathy (DR) grading from colour fundus photographs can achieve strong predictive performance, but clinical interpretation requires more than an image-level label. It requires understanding how lesion evidence is distributed around retinal vessels and how this evidence relates to quantitative vascular biomarkers. We present a dual-edge spatial-Jacobian image graph for interpretable DR grading. Each fundus image is represented as a graph node with four aligned evidence streams: AutoMorph vessel information ($X_1$), DR-XAI-style lesion evidence maps ($X_2$), a 128-dimensional lesion-based contrastive image embedding ($X_3$), and AutoMorph morphometric biomarkers ($X_4$). The spatial edge branch ($X_{12}$) encodes vessel-lesion geometry, while the Jacobian branch ($X_{34}$) models embedding-biomarker sensitivity. Lightweight two-token attention fuses both edge families into a final image graph. On 2,910 matched non-augmented APTOS images, the full graph achieves 0.8076 accuracy, 0.8312 quadratic weighted kappa, 0.5915 macro-F1, and 0.9330 adjacent-grade accuracy; referable DR reaches 0.9055 accuracy and 0.9711 AUROC. The framework is positioned as an explainable representation-learning tool for lesion-biomarker hypothesis generation, rather than as a deployment-ready clinical classifier. The code is available at https://github.com/Inamullah-Colab/dual-edge-dr-graph-xai.

14.
arXiv (CS.AI) 2026-06-15

When Sample Selection Bias Precipitates Model Collapse

arXiv:2606.13732v1 Announce Type: new Abstract: The proliferation of recursive training on synthetic data can alleviate data scarcity but risks model collapse, where repeated training erodes distributional tails and homogenizes outputs. Data selection is widely viewed as a remedy, yet its reliability depends critically on the reference distribution used by the verifier. We show that in low-resource verification regimes, where each verifier observes only a small, fragmented, and biased slice of the target manifold, selection itself becomes biased. This situation naturally arises in low-resource data silos such as healthcare consortia or proprietary financial institutions, where raw data cannot be pooled and local references are inherently incomplete. As a result, selection preferentially retains samples aligned with the local manifold while pruning globally relevant tail modes, turning from a safeguard against collapse into a mechanism that precipitates it. We theoretically prove that such siloed selection accelerates collapse and induces power-law diversity decay. As an initial mitigation, we construct Wasserstein proxy references from multiple silos without sharing raw data. Empirical results confirm that local-reference selection fails on skewed distributions, whereas collaborative proxy references mitigate diversity degradation, suggesting that recursive synthetic-data pipelines require particular caution when real-data coverage is fragmented or scarce.

15.
arXiv (CS.AI) 2026-06-15

Listening with Attention: Entropy-Guided Explainability for Transformer-Based Audio Models

arXiv:2606.14647v1 Announce Type: cross Abstract: Transformer-based automatic speech recognition (ASR) models such as Whisper are highly accurate, but their predictions remain difficult to interpret. Existing explainable AI (XAI) methods often lack faithfulness and precise temporal grounding. We propose Listening with Entropy-guided Attention for Faithful explainability (LEAF-X), a model-intrinsic XAI framework for transformer-based ASR. LEAF-X combines entropy-guided attention weighting, multi-layer attention rollout, and optional causal ablations to identify low-entropy, high-impact heads and layers, producing sparse token-to-frame attributions. Unlike perturbation-based explainers or raw attention maps, LEAF-X exploits the internal structure of encoder-decoder and speech-augmented decoder-only models to generate explanations that better reflect model computation. Results show 32% improved faithfulness, 35-39% stronger locality/sparsity, and the most stable attributions, supporting more transparent and auditable ASR.

16.
arXiv (math.PR) 2026-06-17

A Tanaka-Type Formula for Compact Sets and Equilibrium Measures of L\'{e}vy Processes

arXiv:2606.17472v1 Announce Type: new Abstract: Tanaka's formula is a classical identity for Brownian motion, and Tsukada (2018) extended it to L\'{e}vy processes not necessarily symmetric. From a potential-theoretic point of view, this formula shows that the invariant function for the process killed upon hitting a singleton can be decomposed into the sum of a martingale part and a local time. In this paper, we generalize this singleton setting and derive a Tanaka-type formula for a compact set $B$. To this end, we introduce the equilibrium measure, defined as the rescaled limit of the $q$-capacity measures, and show that the invariant function for the process killed upon hitting $B$ can be represented as the integral, with respect to the equilibrium measure, of the invariant functions associated with processes killed upon hitting singletons, up to an additive constant called the Robin constant. Moreover, when $B$ is an interval, we obtain explicit representations of the equilibrium measure, the Robin constant, and the martingale part for recurrent stable processes as well as for recurrent spectrally negative L\'{e}vy processes. Finally, we discuss how an analogous Tanaka-type formula can also be established for transient L\'{e}vy processes.

17.
Nature (Science) 2026-06-09

Scientists have a bad case of AI FOMO, <i>Nature</i> poll reveals

作者:

Almost half of the scientists who responded said that they feel broadly negative towards artificial intelligence, but they think that some tools are better than others. Almost half of the scientists who responded said that they feel broadly negative towards artificial intelligence, but they think that some tools are better than others.

18.
arXiv (quant-ph) 2026-06-19

Near-Optimal Learning of Local Lindbladians

arXiv:2606.20535v1 Announce Type: new Abstract: We study the problem of learning local Lindbladians from black-box access to the physical evolution, and the goal is to estimate all Hamiltonian and dissipative coefficients. We give an algorithm built directly from finite-time channel probes, which runs the unknown evolution for short times, estimates the corresponding Pauli transfer matrices from classical shadows, and converts these estimates into Lindbladian coefficients by stable local Fourier inversions. For fixed locality and bounded dissipative site degree, the uses of the dynamical evolution and total evolution time scale as $\widetilde{O}(\Lambda^2/\varepsilon^2)$ and $\widetilde{O}(\Lambda/\varepsilon^2)$ respectively, in the local dynamical strength bound $\Lambda$ and target accuracy $\varepsilon$, with only logarithmic dependence on the number of qubits. The algorithm is non-adaptive, uses no ancillas, and uses only random product states as inputs followed by random Pauli measurements. The method does not require knowing the support of the Lindbladian in advance. We complement the algorithm with matching lower bounds, showing that the learning algorithm is near-optimal both in physical dynamics accesses and in total evolution time. We construct a single-qubit dephasing Lindbladian family that already requires $\Omega(\Lambda^2/\varepsilon^2)$ channel uses and $\Omega(\Lambda/\varepsilon^2)$ total evolution time, even for adaptive algorithms with arbitrary ancillas and measurements. In particular, the lower bounds imply that the Heisenberg-limited scaling achievable for Hamiltonian learning is information-theoretically impossible once dissipative coefficients must be estimated.

19.
bioRxiv (Bioinfo) 2026-06-17

In silico characterization of lysis and host-recognition modules in Staphylococcus aureus bacteriophage genomes

Background/aim: Antimicrobial resistance in methicillin-resistant Staphylococcus aureus (MRSA) requires precision non-antibiotic therapeutics, yet phage lytic efficacy is poorly predicted by phenotypic assays, as shown by paradoxical biofilm responses. This study characterized the genomic architecture of lytic S. aureus bacteriophages, focusing on the conservation of the lysis module and the variability of host-recognition modules, to provide a rational basis for phage candidate selection. Materials and methods: Twenty-two complete S. aureus phage genomes were retrieved from NCBI GenBank. Genomic features were extracted with custom Biopython scripts. Lysis (endolysin, holin) and host-recognition (tail fiber/receptor-binding protein) modules were annotated and validated by InterPro domain analysis, with disrupted endolysins resolved by tBLASTn. Phylogeny was reconstructed from large terminase subunit (TerL) sequences using maximum likelihood. Results: Genome size spanned three classes, from 17.5 to 148.6 kb. The LysK-type endolysin (CHAP, Amidase, SH3b) was highly conserved, whereas tail fiber/RBP genes were detected in only 14 of 22 phages. Domain analysis reclassified two proteins annotated as endolysins as virion-associated peptidoglycan hydrolases, and identified two independent mechanisms, HNH endonuclease insertion and intron splitting, that interrupt lysis-module genes and confound automated annotation. Maximum likelihood analysis recovered a strongly supported, highly conserved core clade with EW and SA13 as divergent lineages. Conclusion: Lysis modules are conserved whereas host-recognition modules are variable, indicating that host recognition rather than the lytic enzyme is the principal determinant of host range and the more rational target for phage selection and engineering.

20.
arXiv (math.PR) 2026-06-18

Kemeny's constant minimization for reversible Markov chains via structure-preserving perturbations

arXiv:2510.24679v4 Announce Type: replace-cross Abstract: Kemeny's constant measures the efficiency of a Markov chain in traversing its states. We investigate whether structure-preserving perturbations to the transition probabilities of a reversible Markov chain can improve its connectivity while maintaining a fixed stationary distribution. Although the minimum achievable value for Kemeny's constant can be estimated, the required perturbations may be infeasible. We reformulate the problem as an optimization task, focusing on solution existence and efficient algorithms, with an emphasis on the problem of minimizing Kemeny's constant under sparsity constraints.

21.
arXiv (CS.CV) 2026-06-15

Vanishing Depth: Training Generalized Depth Adapters with Sinusoidal Depth Preprocessing for Pretrained RGB Encoders

Generalized metric depth understanding is critical for precise vision-guided robotics, which current state-of-the-art (SOTA) vision-encoders do not support. To address this, we propose a self-supervised training approach that extends pretrained RGB encoders with a depth adapter to incorporate and align metric depth into a combined latent space without interfering with the pretrained RGB feature extraction. In combination with our sinusoidal depth encoding, the depth adapter enables generalized and robust depth density and distribution invariant feature extraction. Our depth adapters improve a wide set of generalized RGB baselines across a spectrum of relevant RGBD downstream tasks in segmentation, pose estimation, and depth completion – without the necessity of finetuning. Most importantly, we achieve 56.05 mIoU in the SUN-RGBD segmentation, while outperforming SOTA depth-aware and multi-modal encoders in our experiments. When no depth is present, one can activate our depth adapter with an empty map, use single pixel depth clues, or monocular depth estimation to include the depth aware feature extraction into subsequent downstream tasks.

22.
arXiv (CS.AI) 2026-06-12

Versioned Late Materialization for Ultra-Long Sequence Training in Recommendation Systems at Scale

arXiv:2604.24806v2 Announce Type: replace-cross Abstract: Modern Deep Learning Recommendation Models (DLRMs) follow scaling laws with sequence length, driving the frontier toward ultra-long User Interaction History (UIH). However, the industry-standard "Fat Row" paradigm, which pre-materializes these sequences into every training example, creates a storage and I/O wall where data infrastructure usage exceeds GPU training capacity due to data redundancy that is amplified in multi-tenant environments where models with vastly different sequence length requirements share a union dataset. We present a versioned late materialization paradigm that eliminates this redundancy by storing UIH once in a normalized, immutable tier and reconstructing sequences just-in-time during training via lightweight versioned pointers. The system ensures Online-to-Offline (O2O) consistency through a bifurcated protocol that prevents future leakage across both streaming and batch training, while a read-optimized immutable storage layer provides multi-dimensional projection pushdown for heterogeneous model tenants. Disaggregated data preprocessing with pipelined I/O prefetching and data-affinity optimizations masks the latency of training-time sequence reconstruction, keeping training throughput compute-bound by GPUs. Deployed on production DLRMs, the system reduces training data infrastructure resource usage while enabling aggressive sequence length scaling that delivers significant model quality gains, serving as the foundational data infrastructure for modern recommendation model architectures, including HSTU and ULTRA-HSTU.

23.
arXiv (quant-ph) 2026-06-11

Nonlocal continuous-variable gates by amplified optical connections

arXiv:2603.12866v2 Announce Type: replace Abstract: Nonlocal quantum gates, coupling quantum systems located at a distance, are crucial for distributed quantum computing. To this aim, high-capacity optical noiseless connections between different processing units are essential for transmitting large amounts of information per mode. Simultaneously, optical quantum computing offers future high-speed multimode quantum processors. We propose a library of feasible protocols to implement a necessary nonlocal continuous-variable (CV) quantum nondemolition (QND) gate between two distant users sharing a quantum channel and exploiting classical communication. The users are endowed with a newly achieved high-fidelity and large-bandwith element - single-pass phase-sensitive optical parametric amplifier (OPA), that allows for both online squeezing and channel-loss compensation. The use of OPAs enhances quality of the resulting gate in terms of both excess noise and entangling capability. The proposed schemes are also applicable to CV cluster state fusion, providing a first step towards development of distributed CV measurement-based quantum computation.

24.
arXiv (CS.CV) 2026-06-11

AGE-MIL: Anchor-Guided Evidence Learning for Patient-Level Prediction

Existing computational pathology methods predominantly operate within whole-slide image (WSI)-level multiple instance learning (MIL) paradigms, while patient-level modeling remains underexplored. In routine pathological practice, however, pathologists derive diagnostic and prognostic conclusions by integrating evidence across multiple WSIs rather than relying on any single slide. This discrepancy creates a fundamental misalignment when patient-level supervision is directly imposed on conventional MIL frameworks, often leading to unstable optimization and degraded predictive reliability. To address this issue, we propose Anchor-Guided Evidence MIL (AGE-MIL), a weakly supervised framework for patient-level prediction. AGE-MIL constructs a patient-level anchor from slide representations to capture global pathological context and guide the retrieval and integration of diagnostically relevant local patches, enabling robust patient-level modeling. Patient-level risk is further modeled as an evidence accumulation process, promoting stable optimization under weak supervision. AGE-MIL is evaluated on six clinically relevant patient-level prediction tasks from two independent cohorts. Experimental results show that the proposed framework consistently outperforms eight state-of-the-art MIL methods. Code is available at https://github.com/wodeniua/AGE-MIL.

25.
arXiv (CS.LG) 2026-06-19

Prior-Informed Flow Matching for Graph Reconstruction

arXiv:2601.22107v2 Announce Type: replace Abstract: We introduce Prior-Informed Flow Matching (PIFM), a conditional flow model for graph reconstruction. Reconstructing graphs from partial observations remains a key challenge; classical embedding methods often lack global consistency, while modern generative models struggle to incorporate structural priors. PIFM bridges this gap by integrating embedding-based priors with continuous-time flow matching. Grounded in a permutation equivariant version of the distortion-perception theory, our method first uses a prior, such as GraphSAGE or node2vec, to form an informed initial estimate of the adjacency matrix based on local information. It then applies rectified flow matching to refine this estimate, transporting it toward the true distribution of clean graphs and learning a global coupling. Experiments on different datasets demonstrate that PIFM consistently enhances classical embeddings, outperforming them and state-of-the-art generative baselines in reconstruction accuracy.