Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.LG) 2026-06-25

Speculative Decoding at Temperature Zero: A Scoped Safety-Invariance Screen with a 48,072-Sample Expansion

arXiv:2606.25097v1 Announce Type: new Abstract: Speculative decoding accelerates inference by letting a draft model propose tokens for a target model to verify, raising a concrete safety question: at temperature zero, can draft-side behavior leak into safety-scored outputs? We answer with Typical-Acceptance Invariance Screen (TAIS), a behavioral-equivalence screen that pairs target-only and speculative outputs on the same safety battery and requires byte-identity evidence, TOST equivalence at +/-3pp, and per-task Cohen's h below a calibrated null cutoff of |h| < 0.1. Applied to a 16,783-sample confirmatory core plus 44,066 matched expansion samples (fp16/bf16 execution, canonical and DPO-adversarial drafts, GPTQ-4bit drafts, two seeds, and four safety benchmarks), the tested temperature-zero vLLM stacks show no detectable safety divergence under TAIS. The largest absolute Cohen's h on matched target-only versus speculative refusal is 0.024, roughly an order of magnitude below the conventional trivial-effect floor; 25 of 27 per-task TOST contrasts pass at the +/-3pp margin (the two non-pass contrasts are capability-domain Wald-CI edge cases at identical ceiling rates, not genuine non-equivalence); the DPO-adversarial draft produces byte-identical output to the canonical draft across 4,006 samples; and bf16 changes 36%-53% of output bytes without moving any per-task safety rate outside equivalence. A separate 4,006-sample 70B production-scale probe, which lacks a matched 70B target-only arm and is therefore not counted as a TAIS pass, produces AdvBench refusal 0.839 over 700 AdvBench completions with 95% Wilson CI [0.809, 0.864]. We make no claim about sampling temperatures, untested frameworks, untested model families, or tree-speculation variants such as EAGLE and Medusa.

02.
arXiv (CS.LG) 2026-06-11

SpAArSIST: Sparsified AASIST for Efficient and Reliable Anti-Spoofing

arXiv:2606.11674v1 Announce Type: cross Abstract: We present SpAArSIST, a deployment-oriented refinement of the widely used AASIST graph pooling backend for self-supervised learning (SSL) based anti-spoofing. Motivated by redundant operations in public implementations, we replace learned pooling and stack-node attention with explicit, lightweight choices: separate train and inference graph pooling ratios $(k_{\mathrm{tr}},k_{\mathrm{inf}})$, magnitude-based node scoring, and mean aggregation of graph nodes. The best overall configuration (rank 1) cuts backend compute by 20.7% (195.045M $\rightarrow$ 154.706M MACs) and model size by 4.1% (611.8k $\rightarrow$ 586.4k params), while improving out-of-domain robustness on In-the-Wild to 2.82% EER and 0.078 minDCF (from 4.64% and 0.133) and remaining competitive on ASVspoof5. We further provide a composite selection score that summarizes accuracy, calibration, and compute to support balanced deployment-oriented model choice.

03.
arXiv (CS.CL) 2026-06-16

When the Same Musical Knowledge Forgets Differently: A Clean Probe of Pathway-Dependent Forgetting

A model can learn that the piano piece Für Elise is calm and reflective by listening to the audio or by reading a text description, but does it matter which route that knowledge took when it is later at risk of being forgotten? Forgetting research in multimodal models measures what knowledge is lost under adaptation, yet has not asked whether acquisition route affects how easily that knowledge is forgotten. We call this untested premise the Pathway-Invariant Assumption. Music understanding enables a clean test because a music clip and a canonical text description can be aligned to the same perceptual content, allowing the same knowledge unit to enter a model through listening or reading while the target remains fixed. Across multiple architecturally distinct audio-language models, we observe a consistent asymmetry: text-pathway knowledge is forgotten more than matched audio-pathway knowledge under identical adaptation pressure. To attribute this effect to route rather than confounds, we introduce the Paired Pathway Controlled Protocol (PPCP), a three-phase design that establishes matched pathway baselines, activates both pathways under symmetric supervision on the same knowledge pool, and applies identical forgetting pressure to both pathways. The gap is stable across models and gain-controlled analyses, persists when contradictory overwrite is replaced by correct-label cross-domain learning, remains under single-modality pressure, and is not removed by lightweight replay. Two independent routing-depth controls confirm that the effect is not explained by architectural depth, pointing to input representation as the dominant factor. Under PPCP, our results demonstrate that forgetting is highly route-dependent, establishing acquisition route as a new analytical dimension for forgetting research and multimodal system design.

04.
arXiv (CS.LG) 2026-06-11

Physically Constrained Ensemble Gaussian Process Modelling for Expensive Quantum Systems with Heteroskedastic Noise

arXiv:2606.11240v1 Announce Type: cross Abstract: Accurate modeling of quantum many-body systems often requires computationally expensive simulations such as Density Matrix Renormalization Group (DMRG) or Quantum Monte Carlo (QMC) calculations. These methods, while precise, impose significant time and resource constraints, limiting their use in exhaustive parameter exploration. Moreover, these expensive simulations can contain variable errors over the large unknown parameter space, which needs to be quantified and propagated. Thus, predictive modelling is required to estimate the functional space accurately over scarcely sampled data with heteroskedastic noise, while preserving the physical relevance of the estimation. Therefore, we present a Physically Constrained Ensemble Gaussian Process (pc-EGP) framework designed to efficiently model complex and noisy quantum systems under physical consistency constraints. The proposed method first enforces physical constraints as a user controlled weighted penalty to the data-driven loss function of the Gaussian Process (GP) surrogates. Then an ensemble of such GP models is trained with variable noisy simulations via numerical quadrature method where these multiple GP(s) at different nodes is integrated as a quadrature weighted average. We first demonstrate the framework on synthetically generated data before applying to quantum systems. In the first case study, we leverage DMRG simulations of the Bose-Hubbard Model to predict the critical interaction parameter Uc governing the superfluid-to-Mott-insulator transition. In the second case study, we demonstrate our method on QMC simulations, of a quantum liquid confined inside a nanoporous silicate with the goal of optimizing a chemical environment to realize a one-dimensional superfluid. Compared to conventional GP, pc-EGP achieves a better balance of accuracy and physically meaningful predictions.

05.
arXiv (quant-ph) 2026-06-11

Fun with Graph States: Nonlocal Bell Pairs and the Arf Invariant

arXiv:2606.06582v2 Announce Type: replace Abstract: We study inner products and partial amplitudes of graph states–a commonly employed class of quantum states, which are specified by graphs. We find that the magnitudes of these quantities are simply related to the rank of the adjacency matrix of the graph over F_2 while the phase is determined by the Arf invariant of its quadratic refinement. These facts motivate a nonlocal tensor factorization of the Hilbert space, with respect to which all graph states are products of Bell pairs with unentangled ancillae. These results may illuminate the quantum advantage in the framework of Measurement-Based Quantum Computation and suggest that graph states can be usefully visualized in the language of algebraic topology. In addition, we develop a specialized technique for computing expectation values of qubit-wise permutations in graph states, which is useful for calculating multi-invariants.

06.
arXiv (CS.LG) 2026-06-18

Optimal scenario design for climate emulation

arXiv:2606.19302v1 Announce Type: cross Abstract: As deep learning for physical systems continues to grow in popularity, efforts to improve generalizability have primarily focused on designing architectures that embed physical constraints. However, for machine-learning surrogate climate models (emulators), we show that the low structural diversity in existing scenarios commonly used to generate training data places a ceiling on predictive skill. Here, we examine whether training datasets themselves can be optimized to improve generalization. We introduce a method to create datasets that produce emulators capable of generalizing to new, structurally different scenarios absent from the training data. We use a differentiable Simple Climate Model (SCM) to calculate the sensitivity of emulator loss to perturbations in the training data, iteratively updating the training data to maximize emulator skill. For an SCM, training on one scenario optimized in this fashion outperforms an emulator trained on six standard ScenarioMIP pathways. We achieve this higher predictive skill despite training on a smaller dataset, finding that our emulator successfully isolates distinct physical behaviors of different climate forcing agents (e.g., greenhouse gases vs. aerosols) without single-forcing runs. We then demonstrate that scenarios optimized using an SCM, when used to drive an intermediate-complexity climate model, produce a training dataset that yields a more skillful emulator than training on ScenarioMIP outputs. Our results suggest that, in the compute-constrained environment of running full-scale climate models, generating a small number of dynamically rich scenarios provides greater marginal value for emulation and characterizing system responses than expanding the suite of traditional emissions pathways.

07.
arXiv (CS.AI) 2026-06-16

AIRMap: AI-Generated Radio Maps for Wireless Digital Twins

arXiv:2511.05522v4 Announce Type: replace-cross Abstract: Accurate, low-latency channel modeling is essential for real-time wireless network simulation and digital-twin applications. Traditional modeling methods like ray tracing are however computationally demanding and unsuited to model dynamic conditions. In this paper, we propose AIRMap, a deep-learning framework for ultra-fast radio-map estimation, along with an automated pipeline for creating the largest radio-map dataset to date. AIRMap uses a single-input U-Net autoencoder that processes only a 2D elevation map of terrain and building heights. Trained on 1.2M Boston-area samples and validated across four distinct urban and rural environments with varying terrain and building density, AIRMap predicts path gain with under 4 dB RMSE in 4 ms per inference on an NVIDIA L40S-over 100x faster than GPU-accelerated ray tracing based radio maps. A lightweight calibration using just 20% of field measurements reduces the median error to approximately 5%, significantly outperforming traditional simulators, which exceed 50% error. Integration into the Colosseum emulator and the Sionna SYS platform demonstrate near-zero error in spectral efficiency and block-error rate compared to measurement-based channels. These findings validate AIRMap's potential for scalable, accurate, and real-time radio map estimation in wireless digital twins.

08.
arXiv (math.PR) 2026-06-16

An Algebraic Matrix Spencer Theorem

arXiv:2606.16005v1 Announce Type: new Abstract: We develop an algebraic approach to matrix discrepancy based on the representation theory of finite-dimensional C$^*$-algebras. As an application, we resolve a substantial structured special case of the Matrix Spencer conjecture. In particular, we show that for every family of contractions $A_1,\ldots,A_n$ that are contained in a finite-dimensional $C^*$-algebra $\mathcal A$ with $dim_{\mathbb C} (\mathcal A) \lesssim n$, there exists signs $x\in\{\pm1\}^n$ such that $\|\sum_{i=1}^n x_i A_i\| \le O(\sqrt n)$. As a noteworthy special case, our main result also resolves the Group Spencer conjecture of (Bandeira'24). We furthermore prove that Matrix Spencer continues to hold for low-rank perturbations of matrix families coming from an $C^*$-algebra of small dimension.

09.
arXiv (CS.LG) 2026-06-19

Neural Architectures as Functional Priors in Physics-Informed Control Problems

arXiv:2606.19368v1 Announce Type: cross Abstract: In this work we investigate the role of neural architectures as implicit functional priors in control problems governed by ordinary differential equations. Rather than focusing on highly complex problems, our objective is to investigate architecture-dependent effects in controlled dynamical systems within the simplest physically interpretable settings possible. In particular, we study a controlled linear RLC electrical circuit and a nonlinear Duffing-type dynamical system. Both systems are analyzed first through classical optimal-control formulations and later through PINN-based approaches. We compare different combinations of multilayer perceptrons (MLPs) and Fourier-based KAN-like architectures, and analyze their influence on the resulting controls. The numerical experiments suggest that different architectural choices systematically generate qualitatively distinct controls, even under identical governing equations, loss functionals, initial and target states, training parameters and physical constraints. Significant differences appear in the spectral structure, smoothness, energy distribution, and phase-space behavior of the learned solutions. A central observation of this work is the emergence of a functional specialization phenomenon when the neural architectures are allowed sufficient freedom to shape the structure of the learned controls. More specifically, in the systems considered here, Fourier-based architectures tend to produce trajectories with richer oscillatory content, whereas smoother low-frequency-biased architectures tend to generate more regular and energetically efficient controls. This suggests that different functional components of the control problem may be handled more efficiently by different neural architectures, leading to an implicit specialization between state representation and control generation.

10.
arXiv (CS.AI) 2026-06-25

PVF:Understanding AI Vulnerability Against SDCs

arXiv:2405.01741v4 Announce Type: replace-cross Abstract: Reliability of AI systems is a fundamental concern for the successful deployment and widespread adoption of AI technologies. Unfortunately, the escalating complexity and heterogeneity of AI hardware systems make them increasingly susceptible to hardware faults, e.g., silent data corruptions (SDC), that can potentially corrupt model parameters. When this occurs during AI inference/servicing, it can potentially lead to incorrect or degraded model output for users, ultimately affecting the quality and reliability of AI services. In light of the escalating threat, it is crucial to address key questions: How vulnerable are AI models to parameter corruptions, and how do different components (such as modules, layers) of the models exhibit varying vulnerabilities to parameter corruptions? To systematically address this question, we propose a novel quantitative metric, Parameter Vulnerability Factor (PVF), inspired by architectural vulnerability factor (AVF) in computer architecture community, aiming to standardize the quantification of AI model vulnerability against parameter corruptions. We define a model parameter's PVF as the probability that a corruption in that particular model parameter will result in an incorrect output. In this paper, we present several use cases on applying PVF to three types of tasks/models during inference – recommendation (DLRM), vision classification (CNN), and text classification (BERT), while presenting an in-depth vulnerability analysis on DLRM. PVF has been a critical metric used for making key error management design decisions in productionizing Meta's in-house AI chip - MTIA.

11.
arXiv (CS.AI) 2026-06-24

Large-Language-Model Discovery of Quantum LDPC Codes through Structured Concept Evolution

arXiv:2606.24808v1 Announce Type: cross Abstract: Quantum computers could outperform classical machines on important problems, but only if the errors that pervade quantum hardware can be corrected at scale. Quantum low-density parity-check (qLDPC) codes offer a promising route to this goal by combining sparse parity checks with finite encoding rate and growing distance, but their construction remains a challenging discrete design problem. Here we introduce structured concept evolution (SCE), a search framework that pairs a large language model with a structured algebraic mutation grammar to discover lifted-product code families, a class of CSS qLDPC codes. Instead of asking the LLM to design codes from first principles, SCE evolves structured concepts consisting of algebraic specifications paired with executable programs that realize them, using hierarchical mutations that modify the group algebra, protograph geometry, or base space. Running SCE, we discover a diverse set of competitive code families, ranging from abelian constructions to families over non-abelian groups beyond those underlying standard designs such as bivariate-bicycle codes, and characterize them under code-capacity depolarizing noise with BP+OSD decoding. These results are obtained with lightweight models (GPT-5.4-mini and GPT-5.4-nano).

12.
arXiv (quant-ph) 2026-06-25

Weak decay of the positronium ion

arXiv:2606.25433v1 Announce Type: cross Abstract: The positronium ion ($\mathrm{Ps}^-$), a coulombic three-body bound state of two electrons and a positron, predominantly decays via electron-positron annihilation into electromagnetic final states. While its radiative decay channels have been extensively studied, much less attention has been given to weak processes in this system. In this work, we investigate the rare decay $\mathrm{Ps}^- \to e^- \nu_\mu \bar{\nu}_\mu$, obtained by replacing the photon in $\mathrm{Ps}^- \to e^- \gamma$ with a virtual $Z$ boson. Treating the three-body process as an effective two-body transition, $\mathrm{Ps}^- \to e^- Z^*\left(\to \nu_\mu\bar{\nu}_\mu\right)$, we compute the decay rate by explicitly evaluating all spin configurations of the initial bound state and final particles. The result agrees with that obtained using the standard spin-summation formalism of quantum field theory. We find that the branching ratio is comparable to that of the weak decay of ortho-positronium, $\mathrm{o-Ps} \to \gamma \nu \bar{\nu}$.

13.
arXiv (CS.CV) 2026-06-15

Generation of Maximal Snake Polyominoes Using a Deep Neural Network

Maximal snake polyominoes are difficult to study numerically in large rectangles, as computing them requires the complete enumeration of all snakes for a specific rectangle size, which corresponds to a brute force algorithm. This hinders the study of maximal snakes in larger rectangles. Moreover, most enumerable snakes lie in small rectangles, obscuring large-scale patterns. In this paper, we investigate the contribution of a deep neural network to the generation of maximal snake polyominoes from a data-driven training, where the maximality and adjacency constraints are not encoded explicitly, but learned. To this extent, we experiment with a denoising diffusion model, which we referred as Structured Pixel Space Diffusion (SPS Diffusion). We find that SPS Diffusion generalizes from small rectangles to larger ones, generating valid snakes up to 28x28 squares and producing maximal snake candidates on squares close to the current computational limit. The model is, however, prone to errors such as branching, cycles, or multiple snake components. Overall, the diffusion model is promising and suggests that complex combinatorial objects can be understood by deep neural networks, which is useful in their investigation.

14.
arXiv (CS.AI) 2026-06-17

Breaking the Code: Security Assessment of AI Code Agents Through Systematic Jailbreaking Attacks

arXiv:2510.01359v2 Announce Type: replace-cross Abstract: Code-capable large language model (LLM) agents are embedded in software engineering workflows where they can read, write, and execute code, raising "jailbreak" stakes beyond text-only settings. Prior evaluations emphasize refusal or harmful-text detection, leaving open whether agents compile and run malicious programs. We present JAWS-Bench (Jailbreaks Across WorkSpaces), a benchmark spanning three escalating workspace regimes mirroring attacker capability: empty (JAWS-0), single-file (JAWS-1), and multi-file (JAWS-M). We pair this with a hierarchical, executable-aware Judge Framework that tests (i) compliance, (ii) attack success, (iii) syntactic correctness, and (iv) runtime executability, to measure deployable harm. Across seven LLM backends from five families, prompt-only attacks in JAWS-0 achieve 61% compliance; 58% are harmful, 52% parse, and 27% run end-to-end. In JAWS-1, compliance reaches ~100% for stronger models with a mean ASR (Attack Success Rate) ~71%; JAWS-M raises mean ASR to ~75%, with 32% runnable attack code. Wrapping an LLM in an agent increases ASR by 1.6$\times$, by overturning initial refusals during planning and tool use. Similar trends hold for OpenHands, SWE-Agent, and OpenAI Codex, suggesting our JAWS-Bench is agent-agnostic. Category analyses identify which attack classes are most vulnerable and deployable, motivating execution-aware defenses and refusal-preserving agent designs.

15.
arXiv (CS.AI) 2026-06-25

TRUSTMEM: Learning Trustworthy Memory Consolidation for LLM Agents with Long-Term Memory

arXiv:2606.25161v1 Announce Type: new Abstract: Large language model (LLM) agents rely on long-term memory to support extended interactions and personalized assistance beyond finite context windows. Existing memory agents actively update external memory through generated write, revise, and delete operations, but these updates may omit important information, corrupt existing memory, or introduce unsupported hallucinated content. Once stored, such errors become persistent system-state failures that can affect future reasoning and generation. In this paper, we propose TrustMem, a framework designed to improve the trustworthiness of memory consolidation. TrustMem relies on a Memory Transition Verifier to evaluate the transition process of memory updates in terms of coverage, preservation, and faithfulness. It further constructs preference pairs among candidate updates under the same memory state, enabling preference-guided reinforcement learning to directly optimize memory updating behaviors. Extensive experiments demonstrate that TrustMem improves both memory utility and reliability: it achieves state-of-the-art results across MemoryAgentBench, HaluMem, and the Mem-alpha validation set, improves HaluMem memory extraction by 12.14 F1 points, and reduces transition-level omission, corruption, and hallucination by 40.1\%, 79.1\%, and 50.0\%, respectively, compared with the strongest baseline for each error type.

16.
arXiv (CS.CV) 2026-06-25

EchoStyle: Unlocking High-Fidelity Video Stylization with Reverse Data Synthesis

While image stylization has been studied extensively, video stylization remains a critical and largely unsolved challenge in the field of intelligent content creation. Existing methods, usually utilizing a reference image as the style prior, suffer from content leakage, data scarcity and limited adaptability to long videos, leading to suboptimal results with severe style drift and motion distortion. For these issues, we present EchoStyle, a scalable text-driven framework to achieve high-quality stylization of videos with arbitrary lengths. To start with, we construct a video-to-video architecture to appropriately re-fuse the video content and the text style. To address data scarcity, we pioneer an automatic reverse-synthesis pipeline to establish V-Style20k, a large-scale stylization dataset of 20k high-quality video pairs. To facilitate long video stylization, we devise an init-follow-mode mechanism along with a sliding-window inference strategy. Extensive experiments demonstrate EchoStyle's excellent performance across a wide range of artistic styles, even comparable to leading closed-source solutions.

17.
arXiv (CS.LG) 2026-06-24

Separating Oblivious and Adaptive Models of Variable Selection

arXiv:2602.16568v2 Announce Type: replace-cross Abstract: Sparse recovery is among the most well-studied problems in learning theory and high-dimensional statistics. In this work, we investigate the statistical and computational landscapes of sparse recovery with $\ell_\infty$ error guarantees. This variant of the problem is motivated by variable selection tasks, where the goal is to estimate the support of a $k$-sparse signal in $\mathbb{R}^d$. Our main contribution is a provable separation between the oblivious (``for each'') and adaptive (``for all'') models of $\ell_\infty$ sparse recovery. We show that under an oblivious model, the optimal $\ell_\infty$ error is attainable in near-linear time with $\approx k\log d$ samples, whereas in an adaptive model, $\gtrsim k^2$ samples are necessary for any algorithm to achieve this bound. This establishes a surprising contrast with the standard $\ell_2$ setting, where $\approx k \log d$ samples suffice even for adaptive sparse recovery. We conclude with a preliminary examination of a partially-adaptive model, where we show nontrivial variable selection guarantees are possible with $\approx k\log d$ measurements.

18.
arXiv (quant-ph) 2026-06-17

Breaking the bicycle frame: Coset-based quantum LDPC codes

arXiv:2606.17268v1 Announce Type: new Abstract: Generalizing the construction of two-block group algebra (2BGA) codes, we introduce a family of two-block quantum LDPC codes constructed using the action of a group on the cosets of its subgroup. This replaces the regular group actions of the earlier two-block constructions and significantly expands the search space, yielding new quantum LDPC codes outside the 2BGA family. Through a computer search, we identify several new quantum LDPC codes, including weight-6 codes with parameters $[[48,8,6]]$, $[[96,8,10]]$, and $[[224,12,16]]$, as well as weight-8 codes with parameters $[[84,16,8]]$, $[[112,16,10]]$, $[[128,16,12]]$, and $[[168,16,15]]$. Furthermore, we introduce a maximally packed syndrome extraction schedule of depth $w+2$, including initialization and measurement steps, for any code with a maximum stabilizer weight of $w$ from our family. Under a standard circuit-level noise model, our codes, when decoded using BP-OSD, perform competitively with BB codes, achieving thresholds of $\approx0.65\%$ for the weight-6 family and $\approx0.35\%$ for the weight-8 family. Finally, we introduce a group-theoretic framework to generate sequences of graph-based covers of 2BGA codes, recovering and extending recent results on code constructions of this type.

19.
arXiv (CS.AI) 2026-06-15

The Curse and Blessing of Mean Bias in FP4-Quantized LLM Training

arXiv:2603.10444v2 Announce Type: replace-cross Abstract: FP4 training promises substantial memory and compute savings for large language models, but remains fragile because blockwise quantization is dictated by extreme activation magnitudes, which inflate dynamic range and compress long-tail signals. We identify a counterintuitive source of this failure: dominant activation outliers are not merely arbitrary sparse events, but are largely induced by a coherent rank-one mean bias, whose direction aligns with the leading anisotropic spectral component. This mean component strengthens during training, is amplified and reshaped by attention and FFN operators, and increasingly dominates top activation magnitudes. Crucially, this discovery reveals that a seemingly complex outlier-suppression problem admits a truly simple solution: isolate the coherent mean before quantization. We therefore propose Averis, a mean-residual splitting quantization method that separates the mean component using only reductions and elementwise subtractions before FP4 quantization. Across Qwen3 0.6B Dense trained on 100B tokens and Qwen3 7B A1.5B MoE trained on 50B tokens, Averis enables robust W4A4G4 FP4 training, reducing BF16 loss gaps to 1.19%/0.81% versus 2.05%/1.10% for NVIDIA's recently released Hadamard-based outlier-smoothing method, while limiting downstream gaps to 0.89/0.71 points. With only 2.20% end-to-end overhead over vanilla NVFP4, about 30% of NVIDIA's Hadamard-based design, Averis provides a hardware-efficient path to stable low-bit LLM training. Complementary to Hadamard, Averis further reduces the Qwen3-0.6B loss and downstream gaps to 0.94% and 0.73 points when combined. Code is available at: https://anonymous.4open.science/r/averis-504D.

20.
arXiv (quant-ph) 2026-06-16

Optimizing resource bounds in direct fidelity estimation

arXiv:2606.16336v1 Announce Type: new Abstract: Direct fidelity estimation provides a way to estimate the fidelity between an experimentally prepared state and a desired pure target state without performing full tomography. Two influential formulations were introduced in 2011 by Flammia and Liu and by da Silva, Landon-Cardinal, and Poulin. In these protocols, the total estimation error is controlled through two distinct probabilistic steps: first, the fidelity is approximated using randomly sampled Pauli observables; second, each sampled expectation value is estimated from finitely many measurement outcomes. In this work we show that additional structural information about the noise can substantially sharpen the corresponding resource bounds. In particular, for some canonical channels the effective number of sampled Pauli settings can be reduced, leading to lower measurement cost both in the general pure-state setting and in the case of a stabilizer state. These results illustrate a broader point: worst-case confidence bounds in direct fidelity estimation can be significantly conservative when experimentally relevant structure is ignored. As a technical ingredient, we also revisit the allocation of the total accuracy and confidence budgets between the two probabilistic steps. Reformulating the analysis in terms of separate error parameters yields a constrained optimization problem whose solution lowers the average number of measurements in the general pure-state setting. Numerical simulations based on quantum circuits implemented in Qiskit illustrate both the improvement obtained under structured-noise assumptions and the conservativeness of the original worst-case bounds.

21.
arXiv (CS.CL) 2026-06-24

VieSpeaker: A Large-Scale Vietnamese Speaker Recognition Dataset Beyond Visual Dependency

Speaker recognition has advanced rapidly with large-scale training datasets, yet Vietnamese remains under-resourced, with existing corpora limited in scale and acoustic diversity. Most large-scale datasets rely on facial cues to link speech with speaker identities, restricting data collection to recordings where speakers appear on camera. We propose a face-independent dataset construction pipeline and introduce VieSpeaker, a large-scale Vietnamese speaker recognition dataset. Our approach leverages textual metadata and large language model reasoning to infer speaker identities from transcripts and contextual information. VieSpeaker contains approximately 902 hours of speech from 4,715 speakers. Experiments show that models trained on VieSpeaker achieve improved robustness and generalization compared to existing Vietnamese datasets. This work demonstrates the feasibility of face-independent dataset construction and provides a new direction for building large-scale speech resources.

22.
arXiv (CS.CV) 2026-06-15

Temporal Backtracking Search for Test-time Generative Video Reasoning

While test-time scaling has revolutionized reasoning in large language models, generative video reasoning remains bottlenecked by a single-shot paradigm. We demonstrate that searching over denoising steps cannot rescue logically flawed rollouts because spatial trajectories commit early in the diffusion process. Root-level Best-of-N (BoN) sampling is similarly inefficient: reasoning errors cluster early in the temporal axis, and resampling blindly discards verified upstream progress. To unlock effective test-time scaling for video models, we introduce Temporal Backtracking Search (TBS), which shifts the search space to the temporal axis. TBS transforms video generation into an iterative generate-verify-restart loop via three core mechanisms: (1) variable-K conditioning to resume generation from arbitrary clean prefixes; (2) temporal process verification to localize failures and extract valid restart anchors; and (3) prefix-based search to reallocate compute toward extending correct trajectories rather than root resampling. Across algorithmic, navigation, and robotics domains, TBS Pareto-dominates matched-budget BoN. In a strict out-of-distribution setting where one-shot generation collapses (0.7% for BoN), TBS achieves 22.7%, with every solved episode stemming from a restarted branch. Ultimately, TBS reveals that the local reasoning competence of video models far exceeds what single-shot rollouts indicate, providing a scalable test-time framework to unlock it.

23.
arXiv (math.PR) 2026-06-17

The Loss of Tension in an Infinite Membrane with Holes of Decaying Spatial Density

arXiv:2606.17792v1 Announce Type: new Abstract: What is the effect of randomly removing material from an infinite stretched membrane? Under what conditions can the membrane still sustain tension? This problem was introduced by Robert Connelly in connection with applications of rigidity theory in the natural sciences, and was later studied in M. V. Menshikov, K. A. Rybnikov, and S. E. Volkov, "The loss of tension in an infinite membrane with holes distributed according to a Poisson law" (2002); a discrete version was also considered in Robert Connelly, Konstantin Rybnikov, and Stanislav Volkov, "Percolation and the Loss of Tension in an Infinite Triangular Lattice" (2001). We study a mathematical framework based on a non-homogeneous Poisson point process whose intensity $\lambda$ tends to zero at infinity. The hole shapes are i.i.d.\ and independent of their locations. We show that if the intensity does not decay too quickly, then tension is still lost throughout the whole plane, as in the homogeneous model studied in 2002. Conversely, we give sufficient conditions under which complete loss of tension does not occur. Thus, both destruction and non-destruction regimes are possible even when the intensity tends to zero, indicating a phase transition in the model. The processes studied here are closely related to bootstrap percolation.

24.
arXiv (quant-ph) 2026-06-16

Classical Explanations in (and of) General Probabilistic Theories

arXiv:2603.05627v2 Announce Type: replace Abstract: We introduce a notion of the ``explanation" of one (generalized) probabilistic model by another as particular kind of span in the category $\Prob$ of probabilistic models and morphisms. We show that explanations compose under a standard pullback construction (notwithstanding that $\Prob$ does not support arbitrary pullbacks). We then show that every locally-finite probabilistic model has a canonical, sharp classical explanation. The construction is functorial, so every locally-finite probabilistic theory has a canonical, sharp classical (though of course, usually non-local) representation.

25.
arXiv (CS.CV) 2026-06-17

EmbodiTTA: Resource-Efficient Test-Time Adaptation for Embodied Visual Systems

Continual Test-time adaptation (CTTA) continuously adapts the deployed model on every incoming batch of data. While achieving optimal accuracy, existing CTTA approaches present poor real-world applicability on resource-constrained edge devices, due to the substantial memory overhead and energy consumption. In this work, we first introduce a novel paradigm – on-demand TTA – which triggers adaptation only when a significant domain shift is detected. Then, we present OD-TTA, an on-demand TTA framework for accurate and efficient adaptation on edge devices. OD-TTA comprises three innovative techniques: 1) a lightweight domain shift detection mechanism to activate TTA only when it is needed, drastically reducing the overall computation overhead, 2) a source domain selection module that chooses an appropriate source model for adaptation, ensuring high and robust accuracy, 3) a decoupled Batch Normalization (BN) update scheme to enable memory-efficient adaptation with small batch sizes. Extensive experiments show that OD-TTA achieves comparable and even better performance while reducing the energy and computation overhead remarkably, making TTA a practical reality.