Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CL) 2026-06-17

Security and Privacy Prompts in the Wild: What Users Ask LLMs and How LLMs Respond

Large language models (LLMs) are widely used to fulfill users' information needs; users ask LLMs about the weather, pose educational questions, and consult them for legal assistance. One particularly understudied area is digital security and privacy (S&P), where users may seek LLMs' help on how to secure their online accounts or protect their computers from cyber attacks. To the best of our knowledge, no prior study has collected or analyzed the S&P questions users ask LLMs; prior research on LLM response quality relied on expert-authored S&P misconceptions or FAQs rather than user queries. Drawing from WildChat, a dataset of 3.2M user-LLM conversations collected in the wild, our study identifies 14,727 S&P prompts and categorizes them into nine categories covering a wide range of S&P topics. From the S&P prompts, we sampled 450 and performed a thematic analysis to characterize the S&P questions users ask LLMs. Separate from the thematic analysis, we curated 270 advice-seeking S&P prompts, where users ask for recommendations, guidance, or specific S&P information. We measured LLM response quality and consistency when posing the prompt to LLMs 10 times. We found that commercial LLMs outperform open-weight models (GPT 5.5 provided "good enough" responses on 98% of prompts; Llama 4 on 47%). However, among prompts that received high-quality responses on average, commercial models sometimes produce contradictory responses across runs, risking confusing or misleading users.

03.
arXiv (CS.CL) 2026-06-12

An End-to-End Hybrid Framework for Rumour Detection in Low-Resources Algerian Dialect

The rapid growth of social media has intensified the spread of rumours. This issue is more challenging in the Algerian context due to the informal and code-switched nature of dialectal content, the scarcity of annotated resources, and the limited effectiveness of standard Arabic NLP tools on dialect text. This paper presents an end-to-end rumour detection hybrid framework for Algerian dialect social media content. We build a domain-specific annotated dataset by combining real social media posts, synthetic data, and the FASSILA corpus, with automatic labeling based on a similarity-based annotation process. A transliteration pipeline is also introduced to generate parallel datasets in Arabic script and Arabizi. We evaluate multiple approaches, including classical machine learning, deep learning, transformers, and hybrid models. Experimental results show that a hybrid approach combining transformer embeddings with a classical classifier achieves the best performance, reaching an F1-score of 0.84. We also find that domain-specific pre-training is more important than model size, with social media-trained models outperforming larger models trained on formal Arabic corpora. These results demonstrate the feasibility of rumour detection in low-resource Algerian dialect settings.

04.
arXiv (CS.CV) 2026-06-16

RefGC-SR$^2$: Reference-guided Generated Content Super-Resolution and Refinement

Reference-guided generation (e.g., object compositing, customization) has progressed rapidly, yet current pipelines share a fundamental limitation: the object-centric high-resolution reference image (HRRI) provided by users is downsampled to a fixed low-resolution (LR) before being fed into the model, so the fine-grained details are discarded before the output is even produced. In addition, the generation step then introduces its own artifacts (e.g., identity distortion) on top of this loss. Existing reference-guided generated content refinement (RefGCR) methods can correct some of these artifacts but still operate in the LR domain; reference-guided super-resolution (RefSR) methods recover resolution but assume natural-image degradations and ignore the artifact distribution of generative pipelines. To address both gaps in a single formulation, we introduce a new task: reference-guided generated content super-resolution-refinement (RefGC-SR$^2$), where the original HRRI is reused at the post-processing stage to recover lost details, refine generative artifacts, and upscale the output simultaneously. We construct the first real-world triplet data generation pipeline for this RefGC-SR$^2$ task, training a diptych-conditioned generator to synthesize paired low-quality anchors that public pretrained models cannot provide. We further present a frequency-aware diffusion transformer model for RefGC-SR$^2$ that selectively injects fine details from the HRRI while removing generative artifacts. Extensive experiments demonstrate that our RefGC-SR$^2$ model successfully (i) refines the object identity faithfully with respect to the reference, and (ii) recovers high-resolution details, so that the final result is significantly higher quality and practically more usable compared to existing RefGCR and RefSR baselines.

05.
arXiv (CS.CL) 2026-06-12

MDForge: Agentic Molecular Dynamics Pipeline Design under Sparse Simulator Feedback

Molecular dynamics (MD) is the canonical in-silico method for atomistic molecular science, simulating molecular behavior from first-principle physics. Designing an MD pipeline for a new system requires substantial expert knowledge: running it on even one molecule is expensive, ruling out trial-and-error. We automate this expert pipeline-design process with an LLM agent. Unlike existing MD agents that orchestrate a predefined tool set, we treat pipeline design as open-ended code generation in which the agent's behavior is reshaped online by verbal reward. Specifically, we build MDForge, an LLM agent whose in-context update rule densifies the sparse reward via a multi-agent debate among physics experts. On three SAMPL host-guest binding free-energy benchmarks, MDForge automatically designs MD pipelines competitive with human experts. Deployed on a library of unseen candidate guests, its CB[7] pipeline discovers a novel binder that wet-lab competition NMR confirms is a high-affinity, picomolar CB[7] binder. Our data and code are available at https://github.com/Zehong-Wang/MDForge.

06.
arXiv (CS.LG) 2026-06-15

Trust but Verify: Mitigating Medical Hallucinations via Post-Hoc Adversarial Auditing and Multi-Agent Feedback Loops

arXiv:2606.14149v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly deployed in healthcare settings, yet their tendency to hallucinate poses risks when clinical decisions are involved. This study examine whether LLMs recommend recently banned or withdrawn pharmaceuticals when answering clinical questions and tests an agent-based method for reducing such errors. We developed a five-agent "Trust but Verify" system using a single LLM backbone. To measure regulatory knowledge obsolescence, we created an adversarial dataset of 103 clinical MCQs where historically correct answers now refer to banned substances. This scale ensures statistical significance across various therapeutic classes. We evaluated three open-access model families (GPT-OSS, Llama-3, Falcon-3) under vanilla and agentic conditions. Performance was measured via pointwise score, label accuracy, Hallucination Error Rate (HER), and Component Fidelity (CF) score. We also observed clinical safety regression in proprietary models. In default configurations, all models showed high hallucination rates, consistently selecting banned drugs that matched training data patterns. Our proposed agentic architecture reduced HER by approximately 53% across models. Pointwise scores shifted from -0.25 (unsafe recommendation) toward 0.0 (appropriate refusal). The safety audit intercepted dangerous outputs even when models' parametric knowledge favored the banned substance. The proposed multi-agent framework offers a model-agnostic method for enforcing regulatory compliance that prioritizes patient safety over fluent text generation. Our work demonstrates a practical approach for deploying autonomous AI systems in safety-critical healthcare settings. It shows how real-time regulatory data can be integrated into LLM pipelines to support clinical decision-making.

07.
arXiv (CS.CL) 2026-06-12

SENTINEL: Failure-Driven Reinforcement Learning for Training Tool-Using Language Model Agents

Language model agents are increasingly effective in solving realistic tasks through multi-turn tool use. However, training reliable tool-using agents remains challenging in practice. While reinforcement learning provides an on-policy paradigm for improving agents from their own environment interactions, its effectiveness depends heavily on the training task distribution. When tasks are fixed before training, the task distribution can become increasingly mismatched with the policy's evolving capabilities, causing many rollouts to be spent on uninformative tasks. We propose SENTINEL, a failure-driven reinforcement learning framework that turns the Solver's rollout failures into targeted training tasks. SENTINEL follows a Controller–Proposer–Solver loop: the Controller analyzes failed trajectories and summarizes recurring error patterns, the Proposer generates executable tasks that stress these weaknesses, and the Solver is trained on the targeted tasks. On Tau2-Bench Retail with Qwen3-4B-Thinking-2507, SENTINEL improves Pass\^{}1 from 66.4 to 74.9 and outperforms RL on general synthetic tasks across Pass\^{}k metrics. These results demonstrate that model failures provide an effective and scalable source of targeted training signal for improving tool-using language model agents.

08.
arXiv (CS.CV) 2026-06-12

SeamEdit: A Black-Box VLM-Agnostic Pipeline for Large-Image Semantic Editing

Semantic region editing for large images must satisfy two requirements at the same time: high generative quality and natural integration with surrounding content. Some related methods rely on white-box models and leave the strong generation capability of closed-source models underexplored. Directly applying closed-source models to tiled editing, however, introduces several failure modes: semantic deformation, canvas-level alignment drift, and visible seam artifacts. This paper presents SeamEdit, a training-free and model-agnostic pipeline that treats any VLM with inpainting capability as a black-box oracle. SeamEdit mitigates these issues through a five-stage post-hoc pipeline: overlay-based tile decomposition, black-box VLM inpainting, geometric and color-consistency correction, seam-risk-based multi-candidate ranking, and dynamic-programming curved seam fusion. The pipeline reduces seam visibility and supports semantic modification of arbitrary tile regions.

09.
arXiv (math.PR) 2026-06-16

A tree-free approach to 3D Yang-Mills Langevin dynamic. Analytic estimates and the existence of a model for a regularity structure

arXiv:2605.14616v2 Announce Type: replace Abstract: Using the multi-index approach to regularity structures due to F. Otto et al., we construct a regularity structure and a model for it associated to the stochastic Langevin equation for the 3D Euclidean Yang-Mills functional. For the model we also obtain global stochastic and global pointwise weighted Besov type estimates which hold almost surely. The model is defined as a limit of a sequence of smooth models introduced with the help of a mollified noise. When the mollification is removed the sequence converges in a certain topology defined with the help of the stochastic estimates. To obtain these results we develop the multi-index approach for systems of equations with vector-valued white noises. This project is motivated by the problem for constructing 3D Euclidean Yang-Mills measure and by the earlier results of the author on the related problem of canonical quantization of the Yang-Mills field on the Minkowski space.

10.
arXiv (CS.CV) 2026-06-12

An Extensible and Lightweight Unified Architecture for Demosaicing Pixel-bin Image Sensors

Pixel-bin image sensors are becoming the default choice for smartphone cameras due to their resolution vs light-gathering trade-off. However, their larger inter-color separation compared to the Bayer color filter array (CFA) makes them challenging to demosaic. Furthermore, existing deep learning-based demosaicing methods are CFA-specific, requiring multiple individual models that take up precious onboard resources and demand larger development and maintenance efforts. In this work, we propose a modular unified architecture for demosaicing various pixel-bin sensors that provides higher image quality while being extensible and lightweight. Additionally, to enable plug-and-play operation, we introduce a learning-free CFA-identification module to detect the CFA type of raw data accurately.

11.
arXiv (CS.LG) 2026-06-17

Learning from Biased and Costly Data Sources: Minimax-optimal Data Collection under a Budget

arXiv:2602.17894v2 Announce Type: replace-cross Abstract: Data collection is a critical component of modern statistical and machine learning pipelines, particularly when data must be gathered from multiple heterogeneous sources to study a target population of interest. In many use cases, such as medical studies or political polling, different sources incur different sampling costs. Observations often have associated group identities - for example, health markers, demographics, or political affiliations - and the relative composition of these groups may differ substantially, both among the source populations and between sources and target population. In this work, we study multi-source data collection under a fixed budget, focusing on the estimation of population means and group-conditional means. We show that naive data collection strategies (e.g. attempting to "match" the target distribution) or relying on standard estimators (e.g. sample mean) can be highly suboptimal. Instead, we develop a sampling plan which maximizes the effective sample size - the total sample size divided by $D_{\chi^2}(q\mid\mid\overline{p}) + 1$, where $q$ is the target distribution, $\overline{p}$ is the aggregated source distribution, and $D_{\chi^2}$ is the $\chi^2$-divergence. We pair this sampling plan with a classical post-stratification estimator and upper bound its risk. We provide matching lower bounds, establishing that our approach achieves the budgeted minimax optimal risk. Our techniques also extend to prediction problems when minimizing the excess risk, providing a principled approach to multi-source learning with costly and heterogeneous data sources.

12.
medRxiv (Medicine) 2026-06-18

Personalizing Suicide Risk Assessment: Machine Learning Extraction of Cross-Modal Interactions Between Psychosocial and Demographic Factors in Veterans

Background: Veterans face an elevated risk of suicide compared to the general population, motivating national efforts to develop predictive models that can guide proactive care. Current models used by the U.S. Department of Veterans Affairs (VA) rely primarily on structured electronic health record (EHR) data, though clinical notes contain rich contextual information that can be quantified using natural language processing (NLP) to derive psychosocial variables that may improve risk detection. Machine learning methods, particularly classification and regression trees (CART), can also uncover interactions between clinical and psychosocial variables, enabling identification of patient characteristics that modify suicide risk factors. However, integrating structured and unstructured data presents challenges because NLP features often greatly outnumber traditional clinical variables, potentially biasing interaction discovery. In prior work, we addressed this imbalance by introducing a weighted CART framework that balances structured variables with NLP-derived psychosocial features from semantic lexicons (SEANCE). While effective, semantic approaches summarize language into predefined constructs and may overlook important lexical variation present in clinical narratives. Methods: In this study, we extend that framework by replacing semantic features with a high-dimensional bag-of-words (BoW) representation of clinical notes and by evaluating models across cohorts defined by structured suicide risk stratification (low, medium, high) and varying temporal lookback windows. Using a cohort of 27,241 veterans, we analyzed clinical documentation collected up to 30, 90, or 270 days prior to death (or a matched index date for controls), enabling temporally flexible risk modeling. XGBoost models were trained to balance structured and unstructured features and identify cross-modal interactions between textual and clinical variables. Results: When incorporated into generalized linear models, these interactions improved predictive performance, particularly among low- and medium-risk patients, and substantially reduced the performance gap between interpretable and more complex models. Notably, the BoW representation outperformed our prior semantic index-based approach. Discussion and Conclusions: Together, these findings demonstrate the utility of interpretable NLP methods for uncovering clinically meaningful interactions between psychosocial and demographic factors in suicide risk and establish a strong benchmark for future deep learning approaches aimed at capturing richer contextual and temporal information from clinical narratives.

13.
arXiv (CS.CL) 2026-06-15

Fragile Knowledge, Robust Instruction-Following: The Width Pruning Dichotomy in Llama-3.2

作者:

Structured width pruning of GLU-MLP layers in Llama-3.2 models, guided by the Peak-to-Peak Magnitude (PPM) criterion, reveals a systematic dichotomy in how reducing the expansion ratio affects different model capabilities. While performance on tasks relying on parametric knowledge (e.g., MMLU, GSM8K) and perplexity metrics degrades predictably with decreasing expansion ratios, instruction-following capabilities improve at the 2.4x equilibrium ratio (IFEval: +4.8 points / +46% in Llama-3.2-1B and +3.7 points / +39% in Llama-3.2-3B), and multi-step reasoning remains robust (MUSR). This pattern, observed consistently across both evaluated model sizes, challenges the prevailing assumption in compression research that pruning induces uniform degradation. To investigate this, we evaluated seven expansion ratio configurations using comprehensive benchmark suites that assess factual knowledge, mathematical reasoning, language comprehension, instruction-following, and truthfulness. Our analysis identifies the expansion ratio as a critical architectural parameter that selectively reshapes the model's task performance profile, rather than merely serving as a compression metric.

14.
arXiv (CS.CV) 2026-06-19

Geometry-Preserving in 3D Gaussian Splatting for LiDAR-Camera Extrinsic Calibration

Accurate LiDAR-camera calibration is essential for robust multi-modal perception. Targetless approaches avoid manual setup but remain limited by the scarcity of discriminative cross-modal features. Recent methods address this by reconstructing the scene within a differentiable model, enabling extrinsic optimization through dense photometric supervision. Among these, 3D Gaussian Splatting (3DGS) has been widely adopted as a geometric proxy that bridges LiDAR and camera within a single differentiable framework. However, since 3DGS was originally designed for novel view synthesis, existing methods tend to prioritize rendering quality, causing the proxy geometry to drift from the true LiDAR structure. We propose a framework that preserves the metric geometry of the Gaussian proxy by aggregating multi-view LiDAR observations for dense depth supervision and blocking photometric gradients from updating the Gaussian spatial parameters. We validate our method on public driving datasets, where it consistently outperforms existing targetless methods in calibration accuracy.

15.
Nature (Science) 2026-06-10

Hybrid refinery process turns plant material into industrially important chemical

An ingredient of nylon has been made in high yields from lignin — revealing a fresh strategy for turning this complex plant biopolymer into industrial chemicals. An ingredient of nylon has been made in high yields from lignin — revealing a fresh strategy for turning this complex plant biopolymer into industrial chemicals.

16.
arXiv (CS.CL) 2026-06-11

ClawEnvKit: Automatic Environment Generation for Claw-Like Agents

Constructing environments for training and evaluating claw-like agents remains a manual, human-intensive process that does not scale. We argue that what is needed is not just a dataset, but an automated pipeline capable of generating diverse, verified environments on demand. To this end, we introduce ClawEnvKit, an autonomous generation pipeline that instantiates this formalism from natural language descriptions. The pipeline comprises three modules: (1) a parser that extracts structured generation parameters from natural language input; (2) a generator that produces the task specification, tool interface, and scoring configuration; and (3) a validator that enforces feasibility, diversity, structural validity, and internal consistency across the generated environments. Using ClawEnvKit, we construct Auto-ClawEval, the first large-scale benchmark for claw-like agents, comprising 1,040 environments across 24 categories. Empirically, Auto-ClawEval matches or exceeds human-curated environments on coherence and clarity at 13,800x lower cost. Evaluated across 4 model families and 8 agent harness frameworks, we find that harness engineering boosts performance by up to 15.7 percentage points over a bare ReAct baseline, completion remains the primary axis of variation with no model saturating the benchmark, and automated generation enables evaluation at a scale previously infeasible. Beyond static benchmarking, ClawEnvKit enables live evaluation: users describe a desired capability in natural language and obtain a verified environment on demand, turning evaluation into a continuous, user-driven process. The same mechanism serves as an on-demand training environment generator, producing task distributions that adapt to an agent's current weaknesses rather than being bounded by existing user logs.

17.
arXiv (CS.CL) 2026-06-16

Context Compression Is Not One Thing: Readable Symbolic Re-expression vs. Coherent Summary at Matched Budget

We study context compression for multi-hop question answering with small language models. We propose Telegraph English, a readable symbolic format that rewrites retrieved passages into structured entity-relation statements, preserving reasoning evidence at lower token cost. In controlled experiments on MuSiQue, TwoWiki, and HotpotQA, Telegraph English outperforms three matched-budget compression baselines (character-level deletion, truncation, and random sub-sampling) on every dataset, with gains of 13 to 20 F1 percentage point. It also outperforms a coherent prose summary produced by the same encoder on the hardest dataset. A pre-registered depth-interaction hypothesis is null: the advantage does not grow with reasoning depth within datasets. We interpret these results as evidence that readable symbolic re-expression preserves entity content more densely than either natural language or coherent summarization at matched token budget.

18.
arXiv (CS.CV) 2026-06-12

CD-RCM: Generalizable Continuous-Depth Novel View Synthesis for Reflectance Confocal Microscopy

Reflectance confocal microscopy (RCM) provides noninvasive, cellular-resolution "optical biopsies" of human skin in vivo by acquiring en-face images at successive depths, forming a sparse z-stack. Due to optical limitations, these stacks are anisotropic 3D volumes with lateral resolution (0.5 $\mu$m) $\sim$6 times higher compared to axial resolution, which is defined by the optical sectioning (3 $\mu$m), limiting the interpretation of tissue. Our goal is to provide continuous-depth visualization by interpolating intermediate sections and making the 3D volume isotropic. Such a representation permits arbitrary-direction sectioning, including histopathology-like cross-sectional examination, without requiring per-patient optimization. To that end, we introduce the first RCM-specific novel-view synthesis (NVS) approach, CD-RCM, a feedforward model that predicts realistic, unseen depths from sparsely sampled RCM stacks. Classical neural rendering methods focus on reconstruction from surface-level multi-view observations. In contrast to surface-level camera views, RCM can acquire optically sectioned en-face images of tissue beyond the surface up to 200 $\mu$m. However, during visualization of the RCM stacks, observations of the shallower sections (towards the surface) obscure the deeper ones. This unique axial imaging geometry and layer-dependent anatomical organization motivated our development of a tailored architectural and training framework that explicitly accounts for RCM's depth-resolved, occlusive imaging physics. Experiments demonstrate that CD-RCM achieves high-fidelity novel-view synthesis with sub-second inference time.

19.
arXiv (quant-ph) 2026-06-15

Extensible Fluxonium Architecture Using Tunable Couplers with Low Shunt Capacitance

arXiv:2606.01647v2 Announce Type: replace Abstract: Fluxonium qubits have demonstrated high-fidelity operations and long coherence times in small-scale systems, highlighting their promise for quantum computing. However, large-scale integration into a high-performance two-dimensional (2D) qubit array remains the central challenge for practical applications. In this work, we introduce an extensible architecture for scaling up fluxonium qubits in 2D grids. To address the key challenges, namely achieving controllable strong interaction and high connectivity for qubits featuring small shunting capacitors (footprints), we propose using low-shunt-capacitance couplers to enable tunable interactions between fluxonium qubits. When embedded into 2D square lattices, large couplings can be achieved even with relatively small coupling capacitances, thus enabling multiple connections with sufficient capacitance budget. We further propose coupler realizations based on generalized flux qubit circuits, specifically the quarton and the fluxonium, and demonstrate that both enable fast, high-fidelity gates with low spectator errors, while supporting multiple connections on 2D grids.

20.
arXiv (CS.CV) 2026-06-16

Physics-Driven Zero-Shot MRI Reconstruction with Non-local Image Priors

Zero-Shot Self-Supervised Learning (ZS-SSL) has emerged as a promising paradigm for accelerated Magnetic Resonance Imaging (MRI) reconstruction, eliminating the reliance on fully-sampled external datasets. However, learning solely from a single under-sampled scan suffers from supervision scarcity and optimization instability, often leading to overfitting or artifacts. To address these challenges, we propose a robust physics-driven ZS-SSL framework that synergizes physical consistency with image-domain non-local priors. Our method introduces three core innovations: (1) a Coil Sensitivity Map (CSM)-Guided Dynamic Repository, which stabilizes the training trajectory by filtering physically inconsistent artifacts based on coil sensitivity constraints; (2) a SPIRiT-based regularization, which enforces k-space self-consistency via a learned correlation kernel and stochastic masking; (3) a Non-Local Self-Similarity (NSS) Pixel Bank, which leverages the high-fidelity reference established by the former modules to explicitly mine non-local anatomical similarities, thereby augmenting supervision in the image domain. Extensive experiments on the FastMRI dataset demonstrate that our approach achieves state-of-the-art performance, particularly under high acceleration factors, effectively bridging the gap between zero-shot learning and supervised methods. The code is available at https://github.com/Zolento/NS-SSL.

21.
arXiv (math.PR) 2026-06-17

Optional Stopping for Superhedging Supermartingales

arXiv:2606.17452v1 Announce Type: new Abstract: Superhedging supermartingales, introduced by the authors in previous work, are non-probabilistic processes defined via subadditive outer integrals that carry a purely financial interpretation in terms of superhedging cost. Building on the Leinert-König theory of non-lattice integration, the present paper establishes several results that are classical in probability theory but whose non-probabilistic proofs require fundamentally new arguments: (i) a tower inequality for the conditional outer integral \overline{\sigma}_j applied at stopping times, reducing to equality when the integrand is conditionally integrable; (ii) three versions of Doob's optional stopping theorem, organised by the class of supermartingale and the range of the stopping times; and (iii) Dubins' upcrossing inequality in both finite- and infinite-time horizons. A key structural result, property (K)-a.e., identifies conditions under which the two superhedging operators \overline{\sigma}_j and \overline{I}_j coincide on non-negative functions, extending the scope of all preceding results to the positive operator \overline{I}_j. None of the proofs invoke classical measure-theoretic tools; in particular, (classical) integrability and measurability are not assumed. The analogues of classical stochastic results acquire a purely financial interpretation and, in this way, gain depth and generality by providing a context that is independent of any a priori probabilistic structure.

22.
arXiv (quant-ph) 2026-06-17

Emergent de Sitter Space and Non-Unitary Tensor Networks from Non-Hermitian Quantum Criticality

arXiv:2606.17983v1 Announce Type: new Abstract: Extending the holographic principle to de Sitter (dS) spacetimes remains one of the most vital open frontiers in quantum gravity, where a microscopic, bottom-up tensor-network framework that relates boundary quantum data to emergent de Sitter spacetime is still lacking. In this work, we first show the emergence of de Sitter spacetime from boundary entanglement by formulating a non-unitary continuous multi-scale entanglement renormalization ansatz (cMERA) for a concrete non-Hermitian critical fermion chain. Within this emergent spacetime, we analyze the associated geodesics and show that they act as extremal Ryu-Takayanagi (RT) surfaces undergoing a smooth timelike-to-null transition. Remarkably, we demonstrate that this continuum trajectory dictates a distinct tensor-network architecture in which the bond-counting contribution naturally truncates at the discrete timelike-to-null transition toward the deep infrared. In the resulting architecture, the null ray along the horizon is represented by zero-cost links, since the associated cut severs no tensor legs. This network structure successfully reproduces the logarithmic scaling of non-unitary critical entanglement entropy, offering a bond-counting picture for the de Sitter RT formula. Our results provide the long-sought dS/(c)MERA correspondence at the level of both emergent spacetime and discrete holographic entanglement.

23.
arXiv (CS.AI) 2026-06-16

Discrete optimal transport is a strong audio adversarial attack

arXiv:2509.14959v3 Announce Type: replace-cross Abstract: In this paper, we investigate discrete optimal transport (DOT) as a black-box attack against modern automatic speaker verification (ASV) and anti-spoofing countermeasure (CM) systems. Our attack operates as a post-processing distribution-alignment step. Frame-level WavLM embeddings of generated speech (or another person speech) are aligned to an unpaired bona fide speech pool using entropic optimal transport and a top-k barycentric projection, followed by neural vocoding. Unlike gradient-based attacks, the proposed method requires no access to model parameters, gradients, or training data. Experiments on ASVspoof2019 and ASVspoof5 demonstrate that DOT attack substantially increases CM EER and substantially degrades ASV performance across multiple spoofing attacks. The attack transfers across datasets and remains effective after CM fine-tuning. Analysis using speaker similarity, Fréchet Audio Distance, and visualization of embedding distributions suggests that DOT succeeds by shifting source speech toward bona fide regions of the representation space rather than by maximizing speaker similarity. These results indicate that optimal-transport-based distribution alignment represents a previously underexplored attack vector for contemporary ASV and anti-spoofing systems.

24.
arXiv (CS.CL) 2026-06-12

Examining the Cognitive Gap Between Authors and Peer Reviewers on Academic Paper Novelty

Novelty is a crucial metric for assessing the quality of academic papers. Scholars strive to highlight the novel aspects of their work, particularly in the title, abstract, and introduction. Peer review, serving as the gatekeeper of scientific rigor, rigorously evaluates the novelty of papers, yet a cognitive gap may exist between author self-promotion and reviewer evaluation. To investigate this, we analyzed 15,328 academic papers published in Nature Communications from 2016 to 2021, along with their peer-review comments. We found that both reviewers and authors emphasize result-oriented innovation, with reviewers adopting a more comprehensive evaluation perspective. Furthermore, by examining promotional intensity against inherent paper novelty, we found that its effect depends on the paper's actual innovation level. Highly innovative papers benefit from stronger promotional language, receiving more positive evaluations. We also found that promotional language significantly correlates with reviewer disagreement on novelty specifically for papers of moderate innovativeness, whereas it has negligible impact for papers with either very high or very low novelty. This reveals how promotional language operates most prominently in the gray area of academic evaluation.

25.
arXiv (CS.CV) 2026-06-12

GEASS: Gated Evidence-Adaptive Selective Caption Trust for Vision-Language Models

Vision-Language Models (VLMs) hallucinate objects that are not present, and a growing line of work tries to curb this by feeding the model its own generated caption as auxiliary evidence – assuming that a caption, once available, is something to consume. We show this fails: naively appending a caption can lower accuracy rather than raise it, dropping Qwen2.5-VL-3B$^\dagger$ on HallusionBench by nearly ten points. To understand why, we build GD-Probe, a diagnostic set that pairs a global and a detail question on the same image, so that any difference in caption effect is attributable to the question alone. Caption utility proves to be a per-query property: the same caption helps global questions and harms detail ones, through a single mechanism – an embedded caption competes with the image for attention and pulls the model's evidence onto its own text – whose sign is set by whether the caption covers the queried content. Crucially, this regime is readable from quantities the decoder already emits, with no attention access or grounding. We turn this into GEASS (Gated Evidence-Adaptive Selective Caption Trust), a training-free, logit-level module that decides per query how much of the caption to trust, gating it by the clean path's confidence, weighting it by the entropy reduction it induces, and raising the evidence bar when the two pathways disagree. Across four VLMs and two benchmarks (POPE and HallusionBench), GEASS improves over both vanilla inference and contrastive decoding under a single fixed setting, adding only two forward passes and no parameters.