论文广场 - AcademicHub

01.

arXiv (CS.CV) 2026-06-11 DOI: arXiv:2606.11710

ERN-Net : Evolving Reason Node-Net for Document Binarization

作者:

Hsin-Jui Pan ↗Sheng-Wei Chan ↗Jen-Shiung Chiang ↗

This paper presents ERN-Net, an Evolving Reason Node-Net for efficient document image binarization. ERN-Net enhances degradation-sensitive regions, such as faint strokes, broken characters, and noisy backgrounds, through evolving reason nodes and multi-scale reasoning. We further compare ResNet-101, ConvNeXt-Tiny, and ConvNeXt-Base, and find that ConvNeXt-Tiny provides the best practical trade-off between accuracy and memory usage. In addition, DIBCO-based pretraining improves binarization performance without increasing model memory consumption, requiring only about 1.5 additional training hours. Experiments on DIBCO-style benchmarks show that ERN-Net is effective under low-data and low-memory settings.

阅读与讨论 → 访问原文 →

02.

arXiv (quant-ph) 2026-06-19 DOI: arXiv:2511.03204

Application and quantum properties of superpositions of oppositely squeezed states

作者:

Hiroo Azuma ↗William J. Munro ↗Kae Nemoto ↗

arXiv:2511.03204v2 Announce Type: replace Abstract: We show that superpositions of oppositely squeezed states – non-Gaussian Schr{\"{o}}dinger-cat-like states – exhibit enhanced nonclassical features and provide an entanglement advantage in the small-squeezing regime. These states possess photon-number structures distinct from conventional coherent-state cat states, and we analyze their Wigner functions and the entanglement generated when they are injected into a 50-50 beam splitter. As a practical application, we demonstrate that they enable a high-quality heralded single-photon source whose second-order intensity correlation function is smaller than that obtained from a pure two-mode squeezed vacuum state. We further propose a linear-optical heralding scheme that approximates these superpositions without requiring strong Kerr nonlinearities. Our results indicate that the superposition of oppositely squeezed states is a promising non-Gaussian resource for quantum information processing, particularly for single-photon generation.

阅读与讨论 → 访问原文 →

03.

arXiv (CS.CL) 2026-06-19 DOI: arXiv:2606.01338

Benchmarking Local LLMs for Natural-Language-to-SQL Querying in Biopharmaceutical Manufacturing: An Empirical Benchmark on Consumer-Grade Hardware

作者:

Sagar Bhetwal ↗Rajan Bastakoti ↗Nirajan Acharya ↗Gaurav Kumar Gupta ↗Ambika Baniya Bhandari ↗

Biopharmaceutical manufacturing organizations operate under regulatory frameworks such as FDA guidance, EU Good Manufacturing Practice (GMP), and the EU AI Act, which can restrict the use of cloud-based artificial intelligence systems. Locally deployed large language models (LLMs) offer a privacy-preserving alternative, but their suitability for pharmaceutical manufacturing tasks remains underexplored. This study evaluates four open-source LLMs (Qwen 2.5 Coder 7B, Llama 3.1 8B, Mistral 7B, and Meditron 7B) deployed locally via Ollama for natural-language-to-SQL generation over a pharmaceutical manufacturing database. A FastAPI-based evaluation platform, PharmaBatchDB AI, was developed using a synthetic Microsoft SQL Server database containing approximately 63,000 records across Batch, Manufacturing Execution System (MES), and Clean-In-Place (CIP) modules. Models were benchmarked on 60 domain-specific natural-language questions using metrics including SQL extraction rate, SQL compliance, factual consistency, ROUGE-L, hallucination rate, throughput, and latency. Qwen 2.5 Coder 7B, Llama 3.1 8B, and Mistral 7B generated SQL for all evaluation tasks, while Meditron 7B failed on nearly all tasks due to context-window limitations and poor SQL generation capability. Llama 3.1 8B achieved the highest SQL compliance, whereas Qwen 2.5 Coder 7B achieved the strongest overall text similarity and factual consistency. Performance differences between the two leading models were not statistically significant. The results show that code-tuned general-purpose LLMs outperform a domain-specific biomedical model on structured query generation for pharmaceutical manufacturing data. Although fully local, GxP-aligned NLQ systems are feasible on consumer hardware, current performance levels still require human oversight and downstream validation for regulated use.

阅读与讨论 → 访问原文 →

04.

arXiv (CS.CV) 2026-06-16 DOI: arXiv:2606.15167

Variational Network with Wavelet-based UNET in Accelerated MRI Reconstruction from Under Sampled K-space Data

作者:

Yasir Arafat Prodhan ↗Shaikh Anowarul Fattah ↗

Fully sampled MRI requires dense k-space acquisition, leading to long scan times, reduced clinical throughput, and increased sensitivity to patient motion. Accelerated MRI addresses this by acquiring undersampled k-space data and reconstructing the missing information computationally. However, reconstruction from undersampled measurements is highly ill-posed and can introduce aliasing artifacts, noise amplification, and loss of anatomical detail. Although conventional parallel imaging and compressed sensing methods mitigate these issues, and deep learning methods have further improved reconstruction quality, preserving high-frequency structures under aggressive undersampling remains challenging. In this work, we propose a Variational Network with a Wavelet-based U-Net (W-UNet) for accelerated MRI reconstruction. The framework combines physics-guided iterative reconstruction with learnable multi-scale frequency representations. Standard pooling operations are replaced with Discrete Wavelet Transform and Inverse Wavelet Transform modules, enabling lossless downsampling while preserving low-frequency structure and high-frequency edge details. Integrated into the refinement and sensitivity map estimation stages, the proposed design improves artifact suppression, feature preservation, and reconstruction fidelity in both single-coil and multi-coil settings. Experiments on fastMRI knee and M4Raw brain datasets show state-of-the-art performance. Ablation studies further confirm the effectiveness of wavelet-based feature decomposition for accelerated MRI reconstruction.

阅读与讨论 → 访问原文 →

05.

arXiv (quant-ph) 2026-06-15 DOI: arXiv:2606.13975

Implementation of two-qubit Rydberg operations on neutral Rb-87 atoms in systems with different intermediate states

作者:

I. V. Iukhnovets ↗O. V. Bychkova ↗I. B. Bobrov ↗A. P. Gordeev ↗M. Y. Goloshchapov ↗G. I. Struchalin ↗S. S. Straupe ↗

arXiv:2606.13975v1 Announce Type: new Abstract: This work presents an experimental setup for implementing two-qubit operations on neutral atoms ($^{87}$Rb) with the possibility of using two different Rydberg excitation schemes. One of them uses 5P$_{1/2}$ as the intermediate level and applies the second-stage beam locally to the addressed atoms. The second scheme uses the 6P$_{3/2}$ level; in this scheme, the particles to be entangled are moved to a separate zone through which both Rydberg beams pass. The advantages and limitations of both schemes are analyzed. Based on numerical modeling performed with a Julia package developed by the authors, it is demonstrated that the spatial configuration has a greater effect on quantum-operation fidelity than the choice of intermediate level. An experimental implementation of the scheme using the 6P$_{3/2}$ level is demonstrated, making it possible to achieve a two-qubit operation fidelity of 94%.

阅读与讨论 → 访问原文 →

06.

arXiv (CS.CV) 2026-06-17 DOI: arXiv:2509.09631

DiFlow-TTS: Compact and Low-Latency Zero-Shot Text-to-Speech with Discrete Flow Matching

作者:

Ngoc-Son Nguyen ↗Thanh V. T. Tran ↗Hieu-Nghia Huynh-Nguyen ↗Truong-Son Hy ↗Van Nguyen ↗

Zero-shot text-to-speech (TTS) has made significant progress in replicating unseen voices, yet balancing generation quality and inference efficiency remains challenging. Autoregressive models suffer from high latency, while diffusion-based approaches are constrained by training-time configurations. Moreover, most flow-based methods operate in continuous space, which introduces optimization challenges because continuous token spaces are inherently more complex than discrete ones. To address these limitations, we propose DiFlow-TTS, a novel zero-shot TTS framework based on discrete flow matching. The model consists of a deterministic Phoneme-Content Mapper for linguistic modeling and a Factorized Discrete Flow Denoiser that simultaneously generates prosody and acoustic token streams. Experimental results demonstrate the effectiveness of our approach across multiple evaluation metrics.

阅读与讨论 → 访问原文 →

07.

bioRxiv (Bioinfo) 2026-06-12 DOI: HASH:b906a927053888a037ad818250e13dd0

PeptiDIA: A Machine Learning Framework for Enhanced Peptide Identification in Fast-Gradient Data-Independent Acquisition Proteomics

作者:

Ortona ↗Leclercq ↗Roux-Dalvai ↗Routy ↗Bonnet ↗Droit ↗

Data-independent acquisition (DIA) mass spectrometry has become increasingly prevalent in proteomics as advances in instrumentation, chromatography, and computational analysis have enabled robust proteome identification across complex biological samples. However, analytical depth achieved with fast chromatographic gradients remains lower than that obtained using long-gradients, reflecting a throughput-depth trade-off. Here, we present PeptiDIA, a machine learning framework that enhances peptide identification in fast-gradient DIA data by leveraging paired fast and long-gradient acquisitions from identical samples. PeptiDIA processes DIA-NN outputs generated at relaxed false discovery rate thresholds to obtain expanded candidate peptide pools and trains gradient-boosted decision tree models using long-gradient identifications as reference labels. The model integrates DIA-NN features with engineered peptide descriptors and applies isotonic regression to calibrate probabilities, enabling controlled peptide recovery relative to the long-gradient reference. Applied to human and murine datasets spanning six tissues acquired on an Orbitrap Exploris 480, PeptiDIA increased peptide identifications by 25-34% at 1% target reference-discordance rate (RDR) and increased the number of protein groups containing at least one rescued peptide by 15-17%. Overall, PeptiDIA improves the identification depth of fast-gradient DIA-NN workflows without altering acquisition strategies. The framework is available as a web application and command-line tool at https://github.com/Jordano700/PeptiDIA.

阅读与讨论 → 访问原文 →

08.

arXiv (CS.AI) 2026-06-17 DOI: arXiv:2606.17312

Quantifying Consistency in LLM Logical Reasoning via Structural Uncertainty

作者:

Baishali Chaudhury ↗Mengdie Flora Wang ↗Hyunji Hayley Park ↗Rahul Ghosh ↗Sungmin Hong ↗Jae Oh Woo ↗

arXiv:2606.17312v1 Announce Type: new Abstract: Large language models can arrive at the same answer through reasoning paths that are unstable, contradictory, or difficult to rank consistently – a failure mode especially prevalent in multi-step deductive reasoning. Existing methods assess reliability primarily through output dispersion – measuring how much sampled answers differ – but this discards a complementary signal: whether the model can consistently rank competing reasoning candidates. We propose structural uncertainty, a consistency-aware framework derived from the stability of self-preference-induced rankings over sampled reasoning solutions. Given a query, we generate multiple candidate solutions and ask the model to judge pairwise preferences among its own outputs. We aggregate self-preferences into ranking distributions via Bradley-Terry modeling with PageRank, and decompose the signal into two entropy-based components: across-trial ranking instability and within-trial candidate ambiguity. Across five LLMs and eight benchmarks, structural signals provide information complementary to answer dispersion: on logical and mathematical reasoning tasks, the combination improves identification of unreliable instances, while on factual retrieval the structural signal collapses toward uniformity, diagnosing a regime boundary where reasoning-level consistency evaluation is uninformative. The two components relate differently to accuracy: within-trial ambiguity correlates positively with correctness – consistent with settings where multiple plausible solution paths remain competitive – while across-trial instability correlates negatively, signaling unreliable reasoning. Structural uncertainty is best understood not as a universal confidence estimator, but as a regime-sensitive evaluator of logical reasoning consistency.

阅读与讨论 → 访问原文 →

09.

Nature (Science) 2026-06-17 DOI: HASH:eb1e77b17bb154e22eedac2fc5f0537e

Spatial distribution of the proteome in the human body and in cancers

作者:

Liang Yue ↗

A detailed, spatially resolved quantitative map of the human proteome is essential for a deeper understanding of human biology and disease1–4. Here we present a comprehensive human proteomic landscape, generated by profiling more than 13,000 proteins across 2,856 samples using data-independent acquisition mass spectrometry. The dataset spans 58 major tissue types, 251 specific tissue subtypes and 25 distinct carcinomas. This resource enables the depiction of spatially resolved proteome trajectories across tissue types and physiological states, including fetal, tumour, adjacent non-tumour and healthy adult tissue, thereby providing insight into both developmental processes and oncogenic progression. Furthermore, quantitative proteomics comparisons across diverse tissue types and states facilitate the indication of organ-specific toxicity, the identification of repurposable anticancer drug candidates and the prioritization of therapeutic targets for cancers. This study establishes a quantitative resource for navigating the proteome in the human body and in common cancers. A spatially resolved map of the human proteome across a variety of healthy tissues and cancers provides wide-ranging insights in developmental biology and oncology, and could aid the identification of therapeutic targets and development of treatments for cancer.

阅读与讨论 → 访问原文 →

10.

arXiv (quant-ph) 2026-06-17 DOI: arXiv:2606.17622

Quantum mechanics in configuration space in context

作者:

Arwa Bukhari ↗Margherita Moro ↗Max Davies ↗Alastair Wilson ↗Almut Beige ↗

arXiv:2606.17622v1 Announce Type: new Abstract: To enhance the way in which wave-particle duality is implemented in the modelling of quantum mechanical systems, Bukhari et al. [New J. Phys. 27, 084501 (2025)] recently introduced an alternative approach to quantum mechanics, namely quantum mechanics in configuration space. This formalism is based on a physically motivated quantisation of Newtonian mechanics and promotes the classical position-velocity states (x,v) to pairwise distinguishable quantum states. The resulting |x,v> states form the basis of the Hilbert space of individual quantum mechanical particles and evolve along classical trajectories. In this paper, we consider the modelling of a mechanical particle in free space and put quantum mechanics in configuration space into context. It is shown that this formalism increases the continuity between quantum and classical mechanics by avoiding a conceptual inconsistency associated with the definition of momentum in canonical quantisation. In addition, we emphasise that standard quantum mechanics and quantum mechanics in configuration space are based on two distinct formulations of classical mechanics.

阅读与讨论 → 访问原文 →

11.

arXiv (quant-ph) 2026-06-17 DOI: arXiv:2606.17548

Kinematic properties of the Pauli equation

作者:

E. E. Perepelkin ↗B. I. Sadovnikov ↗N. G. Inozemtseva ↗V. A. Svetovidov ↗

arXiv:2606.17548v1 Announce Type: new Abstract: Based on the Wigner-Vlasov formalism, this paper investigates the kinematic properties of the Pauli equation. It is shown that the probability current associated with the Pauli equation can be represented as a superposition of two currents with certain expansion coefficients. Each of these currents corresponds to a particular component of the spinor. The expansion coefficients effectively serve as weighting functions that determine the probability contribution of the corresponding spinor component. Therefore, each spin projection corresponds to its own probability flux. A new system of the Hamilton-Jacobi equations and also a system of motion equations in electromagnetic fields are obtained, taking into account the interaction between the spin and the magnetic field. To illustrate how these equations can be applied we have investigated the quantum system kinematics in detail using an exact solution of the Pauli equation in the presence of a uniform magnetic field and an asymmetric quadratic potential.

阅读与讨论 → 访问原文 →

12.

arXiv (quant-ph) 2026-06-16 DOI: arXiv:2508.09134

Instrument-based quantum resources: quantification, hierarchies and towards constructing resource theories

作者:

Jatin Ghai ↗Arindam Mitra ↗

arXiv:2508.09134v3 Announce Type: replace Abstract: Quantum resources are certain features of the quantum world that provide advantages in certain information-theoretic, thermodynamic, or other useful operational tasks that are outside the realm of what classical theories can achieve. Quantum resource theories provide us with an elegant framework for studying these resources quantitatively and rigorously. While numerous state-based quantum resource theories have already been investigated, and to some extent, measurement-based resource theories have also been explored, instrument-based resource theories remain largely unexplored, with only a few notable exceptions. As quantum instruments are devices that provide both the classical outcomes of induced measurements and the post-measurement quantum states, they are quite important, especially for scenarios where multiple parties sequentially act on a quantum system. In this work, we study several instrument-based resource theories, namely (1) the resource theory of information preservability, (2) the resource theory of (strong) entanglement preservability, (3) the resource theory of (strong) incompatibility preservability, (4) the resource theory of traditional incompatibility, and (5) the resource theory of parallel incompatibility. Furthermore, we outline the hierarchies of these instrument-based resources and provide measures to quantify them. We then also established a relationship between our resource measure and the advantage in an information-theoretic task. In short, we provide a detailed framework for a wide variety of instrument-based quantum resource theories.

阅读与讨论 → 访问原文 →

13.

arXiv (CS.CL) 2026-06-15 DOI: arXiv:2606.14127

CoRe: A Continuously Reward-Finetuned LLM Query Rewriter for Multi-Stage Context-Aware Relevance in Web-Scale Video Search

作者:

Yilin Wen ↗Rong Yang ↗Xiaojia Chang ↗Hong Sun ↗Gefu Tang ↗Chunhui Liu ↗Jeffrey Chen ↗Zeyu Ma ↗Lisong Qiu ↗Xiaochuan Fan ↗Congjia Yu ↗Quan Zhou ↗…

LLM-based query rewriters in production face a tension: the training reward must reflect how the rewrite is consumed by the production ranker, yet the training procedure must be cheap enough to support continuous redeployment as data drifts. We present CoRe (Context Relevance), such a system, redeployed weekly for over five months in a major short-video search engine. Our reward uses the deployed multimodal relevance model as its source and a multiplicative ratio form mirroring the production fusion algebra, closing the simulation-production gap that offline reward proxies leave open. A semi-online Mixed Preference Optimization loop makes this reward affordable at multi-million-instance weekly scale: a DPO-style pairwise objective restricts the gradient pass to a small top-k/bottom-k subset of sampled trajectories, and a phase structure reduces trainer/inference-server parameter syncs from per-step to per-phase. An automated promotion gate over reward-like and stability metrics detected and recovered from a real reward-hacking incident in production. Rewriter output is consumed as parallel relevance signals at recall, rawrank, and finerank without displacing the original signals, bounding rewriter-failure blast radius. Online A/B from two sequential production launches, first deploying the rewriter at finerank, then extending consumption to recall and rawrank, delivers statistically significant reductions in change-query rate on rewrite-impacted queries, with all headline relevance and engagement metrics moving in the expected direction.

阅读与讨论 → 访问原文 →

14.

arXiv (CS.AI) 2026-06-15 DOI: arXiv:2606.13704

Position: AI Must Become Planet-Centered, Not Just Human-Centered

作者:

Maria Perez-Ortiz ↗

arXiv:2606.13704v1 Announce Type: cross Abstract: This position paper argues that contemporary AI paradigms are insufficient for supporting complex global goals and introduces Planet-Centered AI (PCAI) as a design philosophy and research agenda that reorients AI toward planetary-scale socio-ecological systems and their long-term trajectories. A planet-centered approach is grounded in systems thinking, treating Earth as an interconnected whole of which humans are part. We diagnose recurring limitations across AI frameworks, many of which remain human-centered, and show why these become especially consequential under current planetary conditions characterized by systemic risk, non-stationarity, and deep uncertainty. We then articulate how PCAI reshapes the AI lifecycle, from problem formulation and model design to evaluation and deployment, by emphasizing alignment with global agendas, developing system-aware AI foundations, trajectory-oriented evaluation, and monitorability. Finally, we advance a falsifiable claim: AI systems optimized without explicit consideration of systemic consequences are more likely to exacerbate systemic instability than to mitigate it.

阅读与讨论 → 访问原文 →

15.

arXiv (quant-ph) 2026-06-11 DOI: arXiv:2601.19093

Residual-Squeezing Mechanism of Mismatch in Inverse-Squeezing Kennedy Receivers

作者:

Enhao Bai ↗Fengkai Sun ↗Tianyi Wu ↗Yang Ran ↗Zichao Zhou ↗Huankai Zhang ↗Jian Peng ↗Chen Dong ↗Laiyuan Tong ↗Zhenrong Zhang ↗Yaping Li ↗

arXiv:2601.19093v4 Announce Type: replace Abstract: The discrimination of quantum states is fundamental to quantum information processing. Inverse-squeezing Kennedy (IS-Kennedy) receivers can outperform the coherent-state BPSK Helstrom benchmark at the same energy by converting transmitter-side squeezing into an effective coherent-state separation gain, without violating the Helstrom bound for the squeezed-state alphabet. This work investigates how squeezing mismatch degrades this mechanism. We show that imperfect inverse squeezing transforms the ideally nulled output into a residually squeezed state, thereby altering the photon-number statistics before detection. This residual-squeezing picture reveals a strong physical asymmetry between squeezing-magnitude and squeezing-phase mismatches. Magnitude mismatch produces an energy-independent error floor in the high-signal-energy regime, whereas phase mismatch generates a residual squeezing term that grows with signal energy. In the small-residual-squeezing regime, this leads to a polynomial growth of the leading error contribution and a rapid collapse of the SQL advantage. We also identify a parity-step effect in photon-number-resolving detection: because the nulled residual squeezed vacuum contains only even photon numbers, increasing detector resolution improves the high-energy robustness only when the effective saturation threshold crosses the next even photon number. These results identify phase locking as the dominant bottleneck for IS-Kennedy-type non-Gaussian receivers under unitary squeezing mismatch and provide design guidelines for robust squeezed-state quantum receivers.

阅读与讨论 → 访问原文 →

16.

arXiv (CS.AI) 2026-06-16 DOI: arXiv:2605.10574

LLM Jaggedness Unlocks Scientific Creativity

作者:

Shray Mathur ↗J. Anibal Boscoboinik ↗Esther H. R. Tsai ↗Kevin G. Yager ↗

arXiv:2605.10574v3 Announce Type: replace Abstract: As artificial intelligence advances, models are not improving uniformly. Instead, progress unfolds in a jagged fashion, with capabilities growing unevenly across tasks, domains, and model scales. In this work, we examine this dynamic jaggedness through the lens of scientific idea generation. We introduce SciAidanBench, a benchmark of open-ended scientific questions designed to measure the scientific creativity of large language models (LLMs). Given a scientific question, models are asked to generate as many unique and coherent ideas as possible, with the total number of valid responses serving as a proxy for creative potential. Evaluating 19 base models across 8 providers (30 total variants including reasoning versions), we find that jaggedness manifests both across models and within models. First, in a cross-task comparison between general and scientific creativity, improvements in general creativity do not translate uniformly to scientific creativity, revealing divergent capability profiles across models. Second, at the prompt level, stronger models do not improve uniformly; instead, they exhibit high variability, with bursts of creativity on some questions and limited performance on others. Third, at the domain level, individual models display uneven strengths across scientific subfields, reflecting fragmented internal capability profiles. Finally, we show that this jaggedness can be harnessed. We explore mechanisms of inference-time compute, knowledge pooling, and brainstorming to combine models effectively and construct meta-model ensembles that outperform any single model. Our results position jaggedness not as a limitation, but as a resource, a structural feature of AI progress that, when understood and leveraged, can amplify LLM-driven scientific creativity.

阅读与讨论 → 访问原文 →

17.

arXiv (CS.LG) 2026-06-19 DOI: arXiv:2606.19549

Predicting Mergeability of Parameter-Efficient Fine-Tuning Updates

作者:

Lin Tang ↗Wei Zhang ↗Jing Li ↗Hongyu Chen ↗Ming Zhao ↗Yuxuan Wang ↗

arXiv:2606.19549v1 Announce Type: new Abstract: Low-rank adaptation (LoRA) makes it cheap to train many domain- and task-specific language model adapters, but whether two adapters can be merged is usually discovered only after both have been fully trained and evaluated. This late feedback is costly: adapters that are strong in isolation can interfere destructively once their updates are combined. We ask whether this outcome can be anticipated. We formalize adapter mergeability as the degree to which an adapter preserves its single-task utility after merging, and show that it can be forecast from signals measured in the first few percent of training – chiefly how the low-rank updates and their gradients align across tasks and how much they disturb shared representations. We package these signals into MergeProbe, a lightweight predictor that estimates pairwise and set-level retention and turns the estimate into a concrete decision: merge directly, reweight, prune, or route. On MERGE-PEFT, a five-domain benchmark spanning math, code, science, instruction following, and safety, MergeProbe attains the best average and worst-case retention among strong interference-aware merge baselines while adding far less deployment overhead than full task routing. This turns LoRA merging from a post-hoc engineering step into an anticipatory measurement problem.

阅读与讨论 → 访问原文 →

18.

arXiv (CS.CV) 2026-06-15 DOI: arXiv:2606.14504

Scratched Lenses, Shifted Depth: Passive Camera-Side Optical Attacks

作者:

Qinlin He ↗Zeming Zhuang ↗Yongji Wu ↗Lan Zhang ↗Xiaoyong ↗Yuan ↗

Physical adversarial attacks on vision systems are typically studied through scene manipulation, such as adversarial patches or projections, where the adversary controls what the camera observes. Camera-side attacks using stickers or auxiliary optics have also been explored, but they treat attacks as image-space perturbations from designed patterns. This misses how physical imperfections interact with scene-dependent lighting and optics. We identify a threat: passive lens-side damage that is persistent yet trigger-conditioned, producing optical artifacts that bias geometric inference under particular visual conditions. We instantiate this threat through Scratch-induced Lens Adversarial Streak Hijacking SLASH, a physical-world attack caused by small scratches on a camera lens or protective cover. Scratches interact with bright light sources and specular reflections to create structured streak artifacts that distort depth cues. Since the perturbation is fixed in the optical path but triggered by the scene, it is both persistent and selective. We formulate the attack in optical space, model the scratch pattern as a trigger-conditioned optical channel, and optimize one fixed configuration across diverse viewing conditions. We evaluate SLASH on monocular depth estimation and monocular 3D object detection in digital and real-world settings. Under the fixed-scratch constraint, directional depth shifts reach up to 32% relative error for monocular depth estimation, with consistent effects on monocular 3D object detection. Physical experiments confirm transfer to real camera recordings, inducing depth shifts above the model's natural prediction baseline. These findings reveal an attack surface where benign-looking hardware imperfections act as latent, scene-triggered adversarial mechanisms, challenging assumptions about physical robustness and motivating defenses for secure vision systems.

阅读与讨论 → 访问原文 →

19.

arXiv (math.PR) 2026-06-18 DOI: arXiv:2606.08304

Functions of Bounded Variation and Point Processes

作者:

J. Antezana ↗M. Levi ↗J. Marzo ↗J. Ortega-Cerd\`a ↗

arXiv:2606.08304v2 Announce Type: replace-cross Abstract: We investigate the relationship between the analytical properties of functions of bounded variation and the statistical behavior of hyperuniform point processes. We establish several characterization formulas for the jump part of the gradient of a bounded variation function, extending and unifying previous results by Beretti–Gennaioli and Dávila. In particular, we provide new expressions for the $L^2$-jump of the gradient using both difference quotients and Fourier transform methods. Furthermore, we connect these analytic structures to the theory of hyperuniform point processes. By analyzing the variance of linear statistics associated with bounded variation functions, we provide asymptotic estimates that depend on the specific classification of the hyperuniformity of the point process. The results show how the regularity and jump discontinuities of a function dictate the growth rate of fluctuations in point processes. Finally, we introduce an averaged quadratic BMO-type oscillation functional over translated and rotated cube partitions, similar to the one recently studied by Ambrosio et al., and prove, using results from point process, that it converges to an explicit dimensional constant times the $L^2-$jump, giving in particular a further new characterization of the perimeter of a set.

阅读与讨论 → 访问原文 →

20.

arXiv (CS.AI) 2026-06-16 DOI: arXiv:2606.15436

Beyond Classification: A Cough Regression Benchmark for Respiratory Acoustic Foundation Models

作者:

Mayur Sanap ↗Prasanna Desikan ↗Edgar Lobaton ↗

arXiv:2606.15436v1 Announce Type: cross Abstract: Respiratory acoustic foundation models (FMs) excel at cough classification, yet their ability to predict continuous health quantities from cough audio remains largely unexplored, despite the clinical value of passive age, BMI, and disease probability estimation in settings where physical measurements are unavailable. We introduce the multi-model, multi-target cough regression benchmark evaluating five FMs (OPERA-CT, OPERA-CE, OPERA-GT, HeAR, M2D+Resp) across six targets on three datasets under subject-disjoint protocols, comparing linear, MLP-small, and full MLP regression heads. MLP-small beats the mean-predictor baseline on all tasks and linear probing in 23 of 30 model x task cases, with full MLP overfitting on small clinical data but recovering on larger sets, revealing a dataset size x head-capacity trade-off. HeAR leads within-dataset age regression on Coswara (9.12 yr MAE); its CIDRZ result is excluded from headline claims owing to possible HeAR-CIDRZ pretraining overlap. OPERA-GT is favored over OPERA-CT on age in all three datasets, with the CIDRZ margin within seed variance, extending a generative-pretraining advantage from breath to cough. HeAR and M2D+Resp reach near-full performance at N = 50 samples while OPERA models require N = 400. Cross-dataset transfer is strongly asymmetric as large diverse data generalises to small clinical populations (CoughVID to CIDRZ: -0.17 yr) but not vice versa (CIDRZ to Coswara: +2.43 yr, +26.6%).

阅读与讨论 → 访问原文 →

21.

bioRxiv (Bioinfo) 2026-06-11 DOI: HASH:5cc1e1ff3e50e8374c8a5094f1c1b7ef

DModE: An end-to-end framework for Differential Modification and Expression Analysis of Nanopore direct RNA sequencing data

作者:

Miedema ↗Pastore ↗Drescher ↗Haehnel ↗Wierczeiko ↗Alagna ↗Lehmann ↗Helm ↗Butto ↗Gerber ↗

Summary: Nanopore direct RNA sequencing (DRS) enables simultaneous quantification of transcript abundance and RNA modifications from native RNA molecules, providing a unique opportunity to study transcriptional and epitranscriptomic regulation within a single experiment. However, comprehensive analysis of DRS data remains challenging, as existing workflows typically focus on individual processing steps and often require manual integration of multiple software packages for expression analysis, modification detection, statistical testing, and visualization. Furthermore, integrated differential expression and differential RNA modification analysis at both gene and isoform resolution remains poorly supported by current workflows. Here, we present DModE (Differential Modification and Expression Analysis), an end-to-end framework for integrated analysis of Nanopore DRS data. DModE combines an Epi2ME-compatible Nextflow preprocessing workflow with a dedicated Python package for downstream statistical analysis, visualization, and reporting. The framework supports differential gene and isoform expression analysis, differential RNA modification analysis at genome and transcript level, metagene profiling, exploratory epitranscriptomic analyses, and integrated assessment of relationships between expression and modification dynamics. Results are automatically summarized in interactive HTML reports, facilitating reproducible and accessible data interpretation. By integrating transcriptomic and epitranscriptomic analyses within a single framework, DModE substantially simplifies comprehensive DRS data analysis and lowers the barrier for studying RNA modification biology using Nanopore sequencing.

阅读与讨论 → 访问原文 →

22.

arXiv (CS.CL) 2026-06-11 DOI: arXiv:2606.08011

Rewrite to Translate, Translate to Reward: Reinforcement Learning for Source Rewriting in Machine Translation

作者:

Boxuan Lyu ↗Haiyue Song ↗Zhi Qu ↗Hidetaka Kamigaito ↗Kotaro Funakoshi ↗Manabu Okumura ↗

Rewriting source text with large language models (LLMs) before translation has been shown to improve machine translation (MT) quality. However, we find that prompt-based rewriting can degrade translation quality rather than improve it, particularly when smaller LLMs, such as 4B-parameter models, are used. We argue that this limitation stems from the difficulty of controlling rewriting behavior through natural-language prompts alone: a rewrite is useful only if it improves downstream translation, yet existing prompt-based methods do not explicitly optimize for this signal. To address this issue, we propose RLSR (Reinforcement Learning for Source Rewriting), a reinforcement learning framework that trains the rewriting model with a reward based on the downstream translation-quality improvement produced by each rewrite. Experiments across six MT systems and 16 language pairs show that our 4B RLSR-trained rewriting models significantly outperform both the no-rewriting baseline and prompt-based rewriting baselines at the same model scale, while remaining competitive with baselines that use a 235B LLM.

阅读与讨论 → 访问原文 →

23.

arXiv (CS.AI) 2026-06-19 DOI: arXiv:2606.19627

VCG: A Multimodal Retrieval Framework for E-Commerce Video Feeds under Extreme Cold-Start Conditions

作者:

arXiv:2606.19627v1 Announce Type: cross Abstract: The digital commerce landscape is shifting from static, search-driven catalogs to dynamic, immersive video feeds. This transition introduces an ``extreme cold-start'' problem: unlike traditional items, new short-form videos lack the dense interaction history required for collaborative filtering. Furthermore, immersive feeds introduce strong position and duration biases that distort standard engagement signals. In this paper, we demonstrate the Video Candidate Generation (VCG) system, a scalable multimodal retrieval engine designed to solve these challenges in a large-scale e-commerce environment. By leveraging a domain-adapted vision-language model (based on CLIP), we map users and videos into a shared semantic space, enabling zero-shot retrieval based on visual content rather than behavioral history. We detail the system's architecture and present a rigorous evaluation comparing generative (LLM) vs. discriminative (CLIP) embeddings. Our results show that while generative models excel at attribute prediction, they suffer from embedding space collapse in retrieval tasks. Online A/B testing demonstrates that VCG effectively mitigates engagement biases, yielding a 50\% uplift in deep video completion. To showcase the system's capabilities, we present an interactive demonstration featuring three bi-directional retrieval scenarios: Product-to-Video, Video-to-Product, and Zero-Shot Semantic Search.

阅读与讨论 → 访问原文 →

24.

arXiv (quant-ph) 2026-06-19 DOI: arXiv:2601.03885

Efficient upsampling for tensor-network and quantum-state encoded functions

作者:

Siddhartha E. Guzman ↗Egor Tiunov ↗Leandro Aolita ↗

arXiv:2601.03885v2 Announce Type: cross Abstract: Both tensor trains (TTs) and quantum states provide compressed representations of grid-structured data with potentially exponential compression power. We present a unified framework for upsampling data encoded in vector amplitudes, with efficient realizations in both classical TT and quantum settings. Starting from an $n$-core TT or an $n$-qubit state on a coarse grid with $2^n$ points, the construction produces an $(n+m)$-core TT or $(n+m)$-qubit state on a finer grid with $2^{n+m}$ points. In the TT setting, it supports interpolation, quasi-interpolation, augmentation, and synthesis through efficient low-rank contractions, with the added $m$ cores retaining constant rank. For function-value encodings, the resulting interpolation satisfies an $\ell^2$-error bound independent of the number of added grid points, achieves exponential compression at fixed accuracy, and has a logarithmic complexity in the number of grid points. In the quantum setting, the refined state is prepared by a $\mathrm{poly}(n,m)$-size circuit using $\log(p+1)$ ancillas, where $p$ controls the smoothness of the quasi-interpolant; the corresponding error scales quadratically with the initial grid spacing. We validate our framework for tensor networks in one-, two-, and three-dimensional examples, including functions, derivatives, airfoil masks, and synthetic random fields such as three-dimensional turbulence. In particular, fractal fields can be generated directly in TT format with logarithmic memory and runtime. These results open a practical route to multiscale solvers, generative models, and geometry-aware algorithms on tensor-network and quantum platforms, with potential applications in scientific simulation, imaging, and real-time graphics.

阅读与讨论 → 访问原文 →

25.

arXiv (CS.AI) 2026-06-19 DOI: arXiv:2606.19597

PrefSQA: Pairwise Preference Prediction for Speech Quality Assessment and the Critical Role of High Quality Datasets

作者:

Junyi Fan ↗Donald S. Williamson ↗

arXiv:2606.19597v1 Announce Type: cross Abstract: Mean opinion scores (MOS) are widely used for speech quality assessment, yet scalar labels are sensitive to rater variability and listening test differences. This introduces labeling noise, which limits the reliability of MOS prediction. Preference prediction reduces this variability as listeners compare signals directly, producing cleaner labels. We study MOS-free preference prediction and propose PrefSQA, which incorporates uncertainty-aware logits, an impairment attention head, and a module based on non-matching-reference comparisons. We use and refine five datasets, including MOS-derived and low-noise simulated sets with matching and non-matching content, experiment with human preference sets, and test on unseen data. Experiments show small improvements on MOS-derived data, while other sets reveal clear improvement over the baselines, highlighting the value of high-quality preference data and demonstrating the effectiveness of the proposed method.

阅读与讨论 → 访问原文 →

探索全球前沿学术脉络