论文广场 - AcademicHub

01.

arXiv (quant-ph) 2026-06-17 DOI: arXiv:2606.17709

Optimizing bias-tailored quantum error correction beyond code-capacity noise

作者:

C\'esar Benito ↗I. Jes\'an Vel\'azquez-Res\'endiz ↗Alejandro Bermudez ↗

arXiv:2606.17709v1 Announce Type: new Abstract: We find that the substantial advantages predicted for bias-tailored quantum error correction (QEC) under code-capacity noise are strongly reduced once realistic syndrome extraction and circuit-level noise models are considered. We start by comparing XZZX codes to rectangular surface codes with a bias-dependent optimised anisotropy. Although code-capacity simulations predict an advantage of rectangular surface codes in the limit of high noise bias, this actually disappears under circuit-level noise, making the XZZX codes the preferred and simplest choice even for platforms that allow for a flexible variation of the code layout adapted to changes in noise calibration. Our results identify bias degradation during syndrome extraction under circuit-level noise as the central limitation of biased-tailored QEC. To partially mitigate this effect, we introduce a bias-filtering CNOT gadget that temporarily encodes the ancillary target qubit during syndrome extraction in a repetition code and, upon measurement and feed forward, manages to reduce the bias degradation. In a regime of high-bias and low-idle errors, this bias-filtering gadget yields a few-percent relative improvement of the XZZX code error threshold, demonstrating that lightweight bias-filtering strategies can recover part of the lost bias-tailoring advantage for realistic circuit-level noise.

阅读与讨论 → 访问原文 →

02.

arXiv (quant-ph) 2026-06-12 DOI: arXiv:2605.15233

Measuring Control-Plane Openness in Near-Term Quantum Computing: A Rubric, Its Validation, and an Application to Thirteen Vendor Stacks

作者:

Rylan Malarchick ↗

arXiv:2605.15233v2 Announce Type: replace Abstract: Public access to pulse-level and control-electronics interfaces in commercial quantum computing has bifurcated. This paper proposes a six-axis rubric for measuring control-plane openness, the layer between gate-level circuit specification and physical control electronics, defined operationally so that the same evidence produces the same grade across vendors. The rubric is validated three ways: a blinded re-grading pass, thirty-nine days after the evidence cutoff, that tests whether the cited evidence and the level definitions alone reproduce the recorded grades; a boundary-case methodology that fixes where each level begins and ends; and a published grading protocol that lets others reproduce and contest any cell. We establish that the rubric measures change rather than describing a snapshot by comparing the catalog against the documented control plane before the February 2025 removal of pulse-level access from IBM hardware, and reporting the cells that moved. The rubric is applied to thirteen commercial vendors across superconducting, trapped-ion, neutral-atom, and photonic modalities as of May 1, 2026, as its first application, and one of the three harms the rubric is designed to detect is demonstrated through a reproduction-access audit of five pre-2025 IBM Qiskit Pulse experiments against the access available on current hardware, carried through to a client-side structural port of the audit's selected target to Rigetti Quil-T. The catalog ships as a separate machine-readable artifact under CC-BY-4.0 with per-cell source URLs (https://doi.org/10.5281/zenodo.20163276). The catalog readings will change as vendor policies shift; the rubric is the contribution that survives them.

阅读与讨论 → 访问原文 →

03.

bioRxiv (Bioinfo) 2026-06-21 DOI: HASH:af9e3a89120a56cc1089298d78685ab4

ReSeT: a taxonomy-aware reference genome selection tool

作者:

van Bemmelen ↗Baaijens ↗J. A ↗

Motivation: Reference genome composition determines which taxa a profiling pipeline can detect and distinguish, and becomes of critical importance for high-resolution profiling where taxonomic boundaries begin to blur. Existing selection tools optimize within-taxon representativeness but disregard discrimination across taxa, leaving open whether explicitly accounting for inter-taxon discrimination during selection improves profiling. Results: Here we present ReSeT, a facility-location-based reference genome selection tool that operates on arbitrary pairwise distance matrices, extended with a tunable inter-taxon discrimination term and per-genome selection cost, and solved by local search. We benchmark ReSeT against established selection methods on three viral datasets spanning varying degrees of taxonomic ambiguity. On the high-ambiguity SARS-CoV-2 datasets, appropriately tuned ReSeT selections matched or exceeded the strongest alternatives in terms of profiling accuracy, whereas on the low ambiguity IAV dataset VSEARCH remained dominant. Interestingly, we find that the novel inter-taxon discrimination term contributed weakly, indicating that ReSeT's facility-location formulation and selection cost drives ReSeT's performance. We further propose a novel taxonomic ambiguity index, computable from ReSeT's inputs, that summarizes the taxonomic ambiguity of reference genomes and aligns with where ReSeT improves over existing selection methods. Availability and implementation: ReSeT is implemented in Python ([≥]3.10) and is freely available under the MIT license. The source code is available on GitHub at https://github.com/JaspervB-tud/ReSeT and ReSeT can also be installed directly from the Python Package Index (PyPI) via pip install reset-bio.

阅读与讨论 → 访问原文 →

04.

PLOS Computational Biology 2026-06-10 DOI: HASH:b37c175e15f1aa7ae4c9512e0c803096

A mean-field model of neural networks with PV and SOM interneurons reveals connectivity-based mechanisms of gamma oscillations

作者:

Farzin Tahvili ↗

by Farzin Tahvili, Martin Vinck, Matteo Di Volo Classic theoretical models of cortical oscillations are based on the interactions between two populations of excitatory and inhibitory neurons. Nevertheless, experimental studies and network simulations suggest that interneuron subclasses such as parvalbumin (PV) and somatostatin (SOM) exert distinct control over oscillatory dynamics. Yet, we lack a theoretical understanding of the mechanisms underlying oscillations in E-PV-SOM circuits and of the differences with respect to the classical mechanisms for oscillations in simpler E–I networks. Here, we derive a biologically realistic mean-field model of a canonical three-population E-PV-SOM circuit. This model robustly generates oscillations whose features are consistent with experimental observations, including the relative timing of PV and SOM activity and the effects of optogenetic perturbations. By reducing the model to a linear analytical form, we demonstrate that gamma oscillations emerge directly from the cell-specific connectivity of the three-population circuit. This connectivity motif alone accounts for experimentally observed phase relationships, with PV activity consistently leading that of SOM neurons. Together, this mean field model identifies a distinct structural mechanism giving rise to oscillations in canonical E–PV–SOM circuits and provides theoretical primitives for constructing large-scale, cell-type-specific models of cortical dynamics.

阅读与讨论 → 访问原文 →

05.

arXiv (CS.AI) 2026-06-17 DOI: arXiv:2606.17450

A Machine-Learned Comorbidity Index

作者:

Suleman Baloch ↗Kishlay Jha ↗Alberto M. Segre ↗Philip M. Polgreen ↗Bijaya Adhikari ↗

arXiv:2606.17450v1 Announce Type: new Abstract: Traditional comorbidity scores (e.g., Charlson and Elixhauser) are widely used for risk adjustment and patient stratification, but they have two key limitations: (i) they are largely mortality-centric and do not align well with other clinical outcomes, and (ii) their linear, rule-based structure cannot capture nonlinear, outcome-specific risk relationships. We propose a Machine-Learned Comorbidity Index (MLCI) that maps diagnosis codes to a single scalar by maximizing the normalized Hilbert-Schmidt Independence Criterion (nHSIC) between the learned score and multiple clinical outcomes. MLCI captures nonlinear risk-outcome dependence and is supported by a theory that characterizes when a unified, informative admission-level ordering can be achieved across outcomes. Empirical results on multiple benchmark electronic health record (EHR) datasets show that MLCI outperforms strong baselines across multiple evaluation metrics.

阅读与讨论 → 访问原文 →

06.

arXiv (CS.LG) 2026-06-19 DOI: arXiv:2606.20400

The Significance of Style Diversity in Annotation-Free Synthetic Data Generation

作者:

Zahra Abbasiantaeb ↗Zeno Belligoli ↗Omar Essam ↗Mohammad Aliannejadi ↗

arXiv:2606.20400v1 Announce Type: new Abstract: Generating high-utility synthetic data for intent classification typically requires human-annotated seed data, which is often unavailable in fast-paced industrial settings. In this paper, we propose a framework for synthetic dialogue generation that works entirely without human-annotated data, relying solely on intent definitions. Our proposed dialogue generation framework utilizes two different types of topic and style attributes to improve data diversity. Also, we propose two novel post-hoc stylization models called Univ and Exam to transform synthetic LLM-generated utterances into more varied, human-like linguistic styles. To enhance data quality, we utilize an LLM-as-a-judge filtering process. Experimental results on both industrial and public datasets demonstrate that the proposed approach achieves up to 93.3% of the performance obtained using human-annotated training data. Crucially, the findings reveal that style diversity is more critical than topic diversity for synthetic data utility, as it prevents models from learning spurious stylistic correlations. Furthermore, the study shows that incorporating style attributes during the generation process is more effective than post-hoc style adaptation.

阅读与讨论 → 访问原文 →

07.

arXiv (quant-ph) 2026-06-12 DOI: arXiv:2606.13499

Observation of Non-Gaussian Magnon Dynamics in a Two-Dimensional Long-Range XY Model

作者:

S. -A. Guo ↗J. -Y. Tan ↗J. Ye ↗Y. Jiang ↗L. Zhang ↗Y. -X. Chen ↗H. -J. Chen ↗H. -Y. Hu ↗W. -X. Guo ↗B. -X. Qi ↗L. He ↗Z. -C. Zhou ↗…

arXiv:2606.13499v1 Announce Type: new Abstract: Non-Gaussian evolution of high-order spin correlations characterizes important properties of quantum many-body systems. In practice, decoherence, statistical fluctuation and miscalibration of experimental parameters all hinder the witness of non-Gaussian dynamics. Here we demonstrate the crossover between Gaussian and non-Gaussian dynamics on a two-dimensional XY model with long-range and spatially structured interaction using a trapped ion quantum simulator. We prepare different initial densities of magnon excitations and verify the dynamics of single-spin observables for the engineered Hamiltonian. Then we compare the high-order spin correlations with the mean-field solution and the Holstein-Primakoff approximation, and demonstrate the non-Gaussian behavior in a way independent of the calibration errors. Our work provides a verifiable path from classically simulatable dynamics to regimes where quantum advantage may emerge.

阅读与讨论 → 访问原文 →

08.

bioRxiv (Bioinfo) 2026-06-21 DOI: HASH:1c360dd3539c1155235790fd20aa8a49

Expanding the GUSome: Structure-guided identification and characterization of gut microbial β-glucuronidases

作者:

Singhal ↗Badgujar ↗C. V ↗Bihani ↗S. C ↗

The gut microbiome-encoded {beta}-glucuronidase (GUS) enzymes have a significant effect on human physiology through their deglucuronidation activity on endogenous and exogenous glucuronides. GUS activity also significantly influences the pharmacokinetics, efficacy and toxicity of various drugs including chemotherapeutic drugs. Given their crucial role in drug metabolism, GUS enzymes have emerged as promising targets for therapeutic intervention. Here, we have identified and characterized 79 unique GUS enzymes through a structure-guided approach. Structural modelling of these GUS enzymes revealed a conserved core and active-site residues with significant variations in the number and nature of the C-terminal domains. A new classification system based on the number and type of additional C-terminal domains is presented for the GUS proteins. Further, GUS enzymes have been categorized into different loop categories linked to their substrate preferences. The relationship between domain architecture and loop-type is explored by sequence similarity network analysis. We could successfully express, purify and validate GUS processing capability of a panel of identified GUS proteins. The nature of oligomer organization has been deciphered by SEC and DLS studies. Further, we have identified additional GUS enzymes capable of processing SN-38G, glucuronidated form of anticancer drug, irinotecan. These newly identified GUS enzymes will offer valuable insights into gut microbial GUS diversity and their role in understanding the population-specific drug-induced adverse effects on human health.

阅读与讨论 → 访问原文 →

09.

arXiv (CS.AI) 2026-06-11 DOI: arXiv:2606.11836

Towards Data-free and Training-free Compression for Speech Foundation Models Using Parameter Clustering

作者:

Haoning Xu ↗Zhaoqing Li ↗Huimeng Wang ↗Youjun Chen ↗Chengxi Deng ↗Mengzhe Geng ↗Xunying Liu ↗

arXiv:2606.11836v1 Announce Type: cross Abstract: This paper presents a novel data-free and training-free compression approach for speech foundation models using channelwise clustering via k-means. More fine-grained, mixed sparsity pruning by layer-level varying number of parameter clusters is also explored. Experiments conducted on the LibriSpeech dataset suggest that when operating with pruning sparsity of 50% on HuBERT-large, consistent WER reductions of 27.73%/18.61% absolute (34.37%/21.91% relative) over the magnitude-based pruning were obtained on the test-clean and test-other subsets before fine-tuning and 0.19%/0.79% absolute (3.36%/4.62% relative) after fine-tuning with only 3 epochs. Similar WER reductions of 2.86%/5.02% absolute (59.21%/55.29% relative) were observed against magnitudebased pruning on Whisper-large-v3 at 10% sparsity, all with no significant WER increase relative to the uncompressed baseline.

阅读与讨论 → 访问原文 →

10.

arXiv (math.PR) 2026-06-16 DOI: arXiv:2606.16237

The Backward Stochastic Partial Differential Integral Equations: Solvability and Comparison Principle

作者:

Qingxin Meng ↗Qi Zhang ↗

arXiv:2606.16237v1 Announce Type: new Abstract: The paper is concerned with the well-posedness of backward stochastic partial differential equations with jumps, also called backward stochastic partial differential integral equations. We start from the proof for the existence and uniqueness of solution to backward stochastic evolution equation with jump in the Gelfand triple framework. Then the well-posedness of both weak solution and strong solution to backward stochastic partial differential integral equation is obtained with the Gelfand triple replaced by specific Sobolev spaces. Finally, the comparison principle for backward stochastic partial differential integral equation is proved, which has potential applications in financial mathematics.

阅读与讨论 → 访问原文 →

11.

arXiv (CS.CV) 2026-06-17 DOI: arXiv:2601.06862

Learning QoE from Packet-Level Measurements in Encrypted Video Conferencing Traffic

作者:

Michael Sidorov ↗Ofer Hadar ↗

The quality of the user experience has become one of the most important aspects in todays world, as it directly influences individuals willingness to continue using or abandon a product or service. In this context, video conferencing applications (VCAs), which experienced widespread adoption following the COVID-19 pandemic, must deliver excellent performance to remain competitive in an increasingly crowded market. Although content providers (CPs) such as Zoom, WhatsApp, Telegram, and Google Meet can assess conversation quality by comparing transmitted and received data. The widespread use of end-to-end encryption in VCAs makes quality-of-experience (QoE) evaluation by internet service providers (ISPs) far more challenging. Since ISPs do not have access to the encrypted content, they must rely on passive measurements of unencrypted traffic characteristics on the data path. In this work, we present a simple yet effective QoE prediction framework based on an almost stock convolutional neural network (CNN) architecture that uses only the packet sizes extracted from the communication between two participants in a video conferencing (VC) call to predict two QoE metrics: BRISQUE and MOS. The proposed framework is simple, easy to implement, and does not require high-end computational resources, yet it provides superior prediction performance, as shown in our experiments on two custom datasets collected from WhatsApp and Zoom, which achieve substantial improvements over previous models for the QoE prediction task.

阅读与讨论 → 访问原文 →

12.

Nature Medicine 2026-06-15 DOI: HASH:005d7a9dffc472da8918c78aa285d6e6

Activity-dependent adaptive deep brain stimulation improves gait in Parkinson’s disease

作者:

Stefano Scafa ↗

Parkinson’s disease leads to a spectrum of locomotor deficits that vary in severity with the nature of daily activities and the fluctuating physiology of patients. Many of these deficits remain inadequately addressed by existing deep brain stimulation therapies that rely on activity-agnostic parameters optimized for cardinal motor symptoms. By contrast, therapies embedding activity-specific parameters have the potential to better address the entire range of symptoms. Here we expose physiological principles that enable real-time decoding of ongoing locomotor activities across motor fluctuations from the neural dynamics of the subthalamic nucleus. This decoding steered activity-dependent adaptations of deep brain stimulation therapies that improved locomotor deficits while preserving efficacy for cardinal motor symptoms across activities of daily living. Our activity-dependent framework provides a blueprint for next-generation neuromodulation therapies that continuously select parameters optimized to the behavioral context and fluctuating physiology of each patient. ClinicalTrials.gov registration NCT06791902 . Neural decoding algorithms that leverage physiological principles of locomotor encoding support activity-dependent deep brain stimulation therapies that improve locomotor deficits in people with Parkinson’s disease.

阅读与讨论 → 访问原文 →

13.

arXiv (quant-ph) 2026-06-16 DOI: arXiv:2605.20930

Symmetry-Induced Relaxation Comb and Strong Quantum Mpemba Effect in Long-Range XXZ Spin Chains

作者:

Zijun Wei ↗Mingdi Xu ↗Yefeng Song ↗Yangqian Yan ↗Lei Pan ↗

arXiv:2605.20930v3 Announce Type: replace Abstract: Understanding how symmetry constrains dissipative relaxation in open quantum many-body systems remains a central challenge in nonequilibrium physics. Here we uncover a symmetry-filtered Liouvillian mechanism for fast relaxation in a long-range XXZ spin chain subject to dephasing noise. At the isotropic point, the Hamiltonian has global $SU(2)$ symmetry, whereas the full Liouvillian retains only the $U(1)$ symmetry associated with total magnetization. This interplay selects a family of spatially uniform $U(1)$-neutral eigenoperators with exact eigenvalues $\lambda=-2q$. Highly symmetric initial states have spectral weight only on this family, so higher-order components decay rapidly and the $\lambda=-2$ mode governs the long-time dynamics, producing universal $D(t)\sim e^{-2t}$ relaxation independent of system size and interaction range. Breaking the Hamiltonian symmetry restores overlap with slow Liouvillian modes and strongly suppresses relaxation. This symmetry-filtered accessibility gives rise to a strong quantum Mpemba effect, where a state farther from the steady state relaxes faster than closer thermal states. Our results establish symmetry-filtered Liouvillian mode accessibility as a route to controlling nonequilibrium relaxation in open quantum systems.

阅读与讨论 → 访问原文 →

14.

arXiv (quant-ph) 2026-06-16 DOI: arXiv:2603.26039

Achieving double-logarithmic precision dependence in optimization-based quantum unstructured search

作者:

Zhijian Lai ↗Dong An ↗Jiang Hu ↗Zaiwen Wen ↗

arXiv:2603.26039v3 Announce Type: replace Abstract: Grover's algorithm is a fundamental quantum algorithm that achieves a quadratic speedup for unstructured search problems of size $N$. Recent studies have reformulated this task as a maximization problem on the unitary manifold and solved it via linearly convergent Riemannian gradient ascent (RGA) methods, resulting in a complexity of $O(\sqrt{N/M}\log (1/\varepsilon))$, where $M$ denotes the number of target items and $\varepsilon$ denotes the success probability error. In this work, we adopt the Riemannian modified Newton (RMN) method to solve the quantum search problem, under the assumption that the ratio $ M/N$ is known. We show that, in this setting, the Riemannian Newton direction is collinear with the Riemannian gradient in the sense that the Riemannian gradient is always an eigenvector of the corresponding Riemannian Hessian. This structure removes the overhead of Hessian inversion and allows the proposed RMN method to retain the local quadratic convergence in terms of the error $\varepsilon$. More precisely, we rigorously prove an overall complexity of $O(\sqrt{N/M}+\log\log(1/\varepsilon))$. Furthermore, our approach remains Grover-compatible, namely, it relies exclusively on the standard Grover diffusion and oracle operators to ensure algorithmic implementability, and its parameter update process can be efficiently precomputed on classical computers.

阅读与讨论 → 访问原文 →

15.

arXiv (CS.CV) 2026-06-16 DOI: arXiv:2606.14886

Improved Knowledge Distillation for Land-Use Image Classification

作者:

Arundhuti Sur ↗Abhiroop Chatterjee ↗Susmita Ghosh ↗Emmett Ientilucci ↗

In the present article, an improved Knowledge Distillation (KD) framework has been proposed for efficient compression of deep convolutional neural networks for land-use image classification task. Motivated by the need to achieve competitive classification accuracy while reducing computational complexity, a teacher-student learning paradigm is adopted in which a VGG16 network transfers knowledge to a lightweight MobileNetV2 model. The proposed framework integrates hard supervision from ground truth labels with a soft supervision strategy that combines Kullback-Leibler divergence and Cosine Similarity losses. Experiments conducted on three land-use datasets show that the proposed KD-based method yields improved performance, and achieves an accuracy of 99.04%, outperforming both baseline student training and single-loss distillation approaches, while retaining substantial model compression.

阅读与讨论 → 访问原文 →

16.

arXiv (CS.LG) 2026-06-11 DOI: arXiv:2606.12360

Anatomy of Post-Training: Using Interpretability to Characterize Data and Shape the Learning Signal

作者:

Leon Bergen ↗Usha Bhalla ↗Sidharth Baskaran ↗Max Loeffler ↗Raphael Sarfati ↗Dhruvil Gala ↗Ryan Panwar ↗Santiago Aranguri ↗Thomas Fel ↗Atticus Geiger ↗Matthew Kowal ↗Siddharth Boppana ↗…

arXiv:2606.12360v1 Announce Type: new Abstract: Language-model post-training is the main stage at which model behavior is shaped, yet it still largely involves optimization of scalar rewards that summarize diverse desiderata. This abstraction gives practitioners little visibility into what their data actually teaches models, allowing spurious correlations to be learned by a model and inducing undesirable behaviors such as over-stylization and sycophancy. To address this problem, we ask: can we inspect a preference dataset before optimization and decide, at the level of concepts, which behaviors a model should be allowed to learn? Motivated by this, we introduce a data-centric post-training pipeline that uses interpretability protocols to develop statistical hypotheses for the latent concepts separating preferred from dispreferred generations, making them explicit for fine-grained user feedback. Building on this view, we unify several interpretability-based training protocols as ways of shaping rewards via feature or data interventions. Empirically, we show that our pipeline diagnoses undesirable signals in existing preference data, mitigates off-target learning, and can also help amplify or shape desired properties such as safeguards and model personality. More broadly, our results suggest that interpretability can turn post-training from optimizing opaque proxy rewards into a process of auditing and sculpting the learning signal itself.

阅读与讨论 → 访问原文 →

17.

arXiv (CS.CL) 2026-06-12 DOI: arXiv:2606.12433

Marginal Alignment Does Not Guarantee Joint-Distribution Fidelity: An Official-Reference Audit of Nemotron-Personas-Korea with Cross-Locale Replication

作者:

Joonhyung Bae ↗

Synthetic persona datasets cite alignment with official demographics as a basis for trust, yet downstream users consume them as joint structures across age, sex, region, occupation, education, name, and institutional status. Marginal alignment does not imply that these joints are preserved. We propose the Independence-Assumption Footprint (IAF), an audit primitive that operates on the attribute combinations a dataset card itself documents as treated independently. For each such combination, IAF compares the synthetic joint against an external official or institutional reference, using direct joint tables where available and rule-implied checks otherwise. Applied to NVIDIA Nemotron-Personas-Korea (one million Korean synthetic personas), IAF finds that NPK aligns with KOSIS marginals while three joints fail. The major-by-occupation distribution against the KEIS graduate universe carries a large conditional mismatch. The age profile of military service is institutionally inconsistent. Female representation in male-dominated occupations is substantially over-flattened toward parity, with the strict screening verdict mapping-dependent and age-robust under direct standardisation. A transferability demonstration across six further NPK locales finds locale-dependent rather than universal diagnostics, with reference-taxonomy cardinality confounding cross-locale flag counts. For synthetic personas used as silicon samples, marginal claims must therefore be paired with disclosure-anchored joint audits before reuse. The released audit artefacts (reference manifests, occupational crosswalks, derived metrics, reproducibility scripts) instantiate this protocol on the NPK family and are released for retargeting at other synthetic persona resources.

阅读与讨论 → 访问原文 →

18.

arXiv (CS.CV) 2026-06-18 DOI: arXiv:2606.18702

UniTemp: Unlocking Video Generation in Any Temporal Order via Bidirectional Distillation

作者:

Lin Zhang ↗Sicheng Mo ↗Zefan Cai ↗Jinhong Lin ↗Zihao Lin ↗Jiuxiang Gu ↗Krishna Kumar Singh ↗Yuheng Li ↗Yin Li ↗

Autoregressive video diffusion models have emerged as a promising approach for long video generation, achieving strong performance in streaming settings. However, existing methods are restricted to forward temporal generation, whereas practical video creation often requires flexible generation order, e.g., conditioning on future context to extend backward, or on both past and future context for inbetween generation. We bridge this gap by training an autoregressive model that supports generation in arbitrary temporal directions. A key technical challenge arises from the Causal 3D VAE widely used in video diffusion models, which encodes latents strictly conditioned on past context. While suited for forward generation, this causal structure causes inter-block discontinuities when generation proceeds backward. To address this, we introduce blockwise anchor latents, a set of auxiliary latents that restore the missing past context at block boundaries during backward generation. Built on this design, we propose UniTemp, a bidirectional distillation framework that trains a single autoregressive student model for any-direction video generation. At inference time, UniTemp conditions on arbitrary past and/or future frames, improving controllability for both bidirectional and inbetween generation. Experiments show that UniTemp maintains competitive performance on short and long video generation compared to forward-only methods, while enabling diverse workflows such as bidirectional video extension, inbetween generation, looping video generation, scene transition, and visual story generation. Project website: https://lzhangbj.github.io/projects/unitemp/

阅读与讨论 → 访问原文 →

19.

arXiv (CS.CV) 2026-06-16 DOI: arXiv:2606.15782

Mitigating Visual Hallucinations in Multimodal Systems through Retrieval-Augmented Reliability-Aware Inference

作者:

Pratheswaran Hariharan ↗Haiping Xu ↗Donghui Yan ↗

Multimodal large language models (MLLMs) have demonstrated strong capabilities in vision-language understanding and natural-language response generation. However, these systems can still produce overconfident predictions and hallucination-like outputs, particularly when the visual evidence is weak, ambiguous, or semantically inconsistent. Most existing approaches focus on improving multimodal representation alignment or retrieval-augmented generation, while providing limited mechanisms to quantify instance-level prediction reliability or identify incorrect visual outputs. This work proposes a retrieval-augmented reliability-aware inference framework for trustworthy multimodal visual understanding. The proposed framework constructs an external visual evidence database using pretrained visual embeddings and nearest-neighbor retrieval over normalized feature representations. Retrieved evidence is used to estimate prediction trustworthiness through multiple reliability indicators, including similarity strength, class-support agreement, evidence margin, entropy-based uncertainty, and an aggregate reliability score. Based on these signals, a decision gate determines whether the system should accept the prediction, answer with caution, or abstain/fallback when evidence is insufficient. A multimodal response-generation layer then produces a final user-facing response conditioned on the reliability decision. Experiments on ImageNet-100 demonstrate that the proposed reliability-aware framework improves accepted prediction accuracy from 85.84\% to 88.88\% at 89.04\% coverage. The hallucination-like accepted wrong-answer rate is reduced from 14.16\% to 11.12\%. These results show that integrating retrieval evidence, reliability estimation, and selective decision gating can improve calibration and reduce overconfident visual errors without retraining large multimodal models.

阅读与讨论 → 访问原文 →

20.

bioRxiv (Bioinfo) 2026-06-10 DOI: HASH:6f1438dcdd72480c3e727ece341186ec

Pseudoperplexity Probes Memorization in Protein Language Models

作者:

Plaikner ↗Ploner ↗Sewald ↗Senoner ↗Franz ↗Brenner ↗Heinzinger ↗Rost ↗

Protein Language Models (pLMs) have significantly advanced computational biology. Yet their scale and reliance on redundant training data raise a fundamental question: do pLMs generalize the statistical grammar of proteins, or do they simply memorize their training data? To investigate this, we used pseudoperplexity as a probe for sequence-level memorization, comparing ProtT5's pseudoperplexity on a pre-training proxy dataset against a post-training holdout of genuinely novel sequences. To ensure a valid comparison, we matched the datasets by sequence length, cluster size, and taxonomic family. As a statistical baseline, we trained n-gram language models; analysis of higher-order n-gram composition and a statistically significant divergence in perplexity confirmed that the post-training sequences were genuinely novel at the local sequence level. ProtT5 showed a statistically significant difference in pseudoperplexity between seen and unseen sequences, though further analysis revealed this memorization signal to be modest. These findings suggest that ProtT5 exhibits detectable but limited memorization of its training data as measured by a pseudoperplexity-based probe.

阅读与讨论 → 访问原文 →

21.

arXiv (math.PR) 2026-06-18 DOI: arXiv:2506.08782

First to reach $n$ game

作者:

Stanislav Volkov ↗Magnus Wiktorsson ↗

arXiv:2506.08782v4 Announce Type: replace Abstract: We consider a game with two players, consisting of a number of rounds, where the first player to win $n$ rounds becomes the overall winner. Who wins each individual round is governed by a certain urn having two types of balls (type 1 and type 2). At each round, we randomly pick a ball from the urn, and its type determines which of the two players wins. We study the game under three regimes. In the first and the third regimes, a ball is taken without replacement, whilst in the second regime, it is returned to the urn with one more ball of the same colour. We study the properties of the random variables equal to the properly defined overall net profits of the players, and the results are drastically different in all three regimes.

阅读与讨论 → 访问原文 →

22.

arXiv (math.PR) 2026-06-17 DOI: arXiv:2606.17792

The Loss of Tension in an Infinite Membrane with Holes of Decaying Spatial Density

作者:

Lukas Early ↗Stanislav Volkov ↗

arXiv:2606.17792v1 Announce Type: new Abstract: What is the effect of randomly removing material from an infinite stretched membrane? Under what conditions can the membrane still sustain tension? This problem was introduced by Robert Connelly in connection with applications of rigidity theory in the natural sciences, and was later studied in M. V. Menshikov, K. A. Rybnikov, and S. E. Volkov, "The loss of tension in an infinite membrane with holes distributed according to a Poisson law" (2002); a discrete version was also considered in Robert Connelly, Konstantin Rybnikov, and Stanislav Volkov, "Percolation and the Loss of Tension in an Infinite Triangular Lattice" (2001). We study a mathematical framework based on a non-homogeneous Poisson point process whose intensity $\lambda$ tends to zero at infinity. The hole shapes are i.i.d.\ and independent of their locations. We show that if the intensity does not decay too quickly, then tension is still lost throughout the whole plane, as in the homogeneous model studied in 2002. Conversely, we give sufficient conditions under which complete loss of tension does not occur. Thus, both destruction and non-destruction regimes are possible even when the intensity tends to zero, indicating a phase transition in the model. The processes studied here are closely related to bootstrap percolation.

阅读与讨论 → 访问原文 →

23.

arXiv (CS.CV) 2026-06-16 DOI: arXiv:2601.16093

SAMTok: Representing Any Mask with Two Words

作者:

Yikang Zhou ↗Tao Zhang ↗Dengxian Gong ↗Yuanzheng Wu ↗Ye Tian ↗Haochen Wang ↗Haobo Yuan ↗Jiacong Wang ↗Lu Qi ↗Hao Fei ↗Anran Wang ↗Zhuochen Wang ↗…

Pixel-wise capabilities are essential for building interactive intelligent systems. However, pixel-wise multi-modal LLMs (MLLMs) remain difficult to scale due to complex region-level encoders, specialized segmentation decoders, and incompatible training objectives. To address these challenges, we present SAMTok, a discrete mask tokenizer that converts any region mask into two special tokens and reconstructs the mask using these tokens with high fidelity. By treating masks as new language tokens, SAMTok enables base MLLMs (such as the QwenVL series) to learn pixel-wise capabilities through standard next-token prediction and simple reinforcement learning, without architectural modifications and specialized loss design. SAMTok builds on SAM2 and is trained on 209M diverse masks using a mask encoder and residual vector quantizer to produce discrete, compact, and information-rich tokens. With 5M SAMTok-formatted mask understanding and generation data samples, QwenVL-SAMTok attains state-of-the-art or comparable results on region captioning, region VQA, grounded conversation, referring segmentation, scene graph parsing, and multi-round interactive segmentation. We further introduce a textual answer-matching reward that enables efficient reinforcement learning for mask generation, delivering substantial improvements on GRES and GCG benchmarks. Our results demonstrate a scalable and straightforward paradigm for equipping MLLMs with strong pixel-wise capabilities. Our code and models are available.

阅读与讨论 → 访问原文 →

24.

Nature (Science) 2026-06-10 DOI: HASH:f8a664017587ebba32576c5bffcc957a

Artificial intelligence shines a light on hidden global migration flows

作者:

Francisco Lara-García ↗

Training a neural network to collate data from several sources provides a high-resolution view of how people are moving around the world. Training a neural network to collate data from several sources provides a high-resolution view of how people are moving around the world.

阅读与讨论 → 访问原文 →

25.

arXiv (CS.CV) 2026-06-16 DOI: arXiv:2603.12514

CT-VDETR: Semi-supervised 3D Trauma Detection in Computed Tomography (CT) scans using Dense Vertex Relative Position Encoding

作者:

Shivam Chaudhary ↗Sheethal Bhat ↗Andreas Maier ↗

Accurate detection and localization of traumatic injuries in abdominal CT remain challenging because voxel-level annotations are limited and expensive to obtain. We present a label-efficient framework for 3D abdominal trauma detection that combines self-supervised pretraining with semi-supervised transformer-based detection. First, we use Masked Image Modeling (MIM) on 1098 CT volumes to pretrain a 3D U-Net encoder for anatomical representation learning. Next, we adapt V-DETR to dense volumetric CT through a feature adapter that converts the encoder feature grid into a compact token sequence for transformer decoding. The pretrained encoder is then integrated with V-DETR and 3D Vertex Relative Position Encoding (3D V-RPE) to improve the localization of irregularly shaped injuries. Finally, semi-supervised teacher-student consistency regularization leverages 2,000 additional unlabeled volumes during detector training. To the best of our knowledge, this is the first application of a 3D DETR-style detector to the RSNA abdominal trauma detection task. On this benchmark, the proposed method achieves 31.33% test mAP@0.50 using only 78 labeled training volumes, corresponding to a 1.53x improvement over supervised-only training. These results show that combining medical-domain pretraining with semi-supervised learning is an effective strategy for label-scarce 3D medical detection.

阅读与讨论 → 访问原文 →

探索全球前沿学术脉络