Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CV) 2026-06-19

ReA-OVCD: Reliability-Aware Open-Vocabulary Change Detection via Semantic and Spatial Refinement

Unlike traditional remote sensing change detection that relies on predefined categories, Open-Vocabulary Change Detection (OVCD) identifies land cover changes flexibly using arbitrary text prompts. However, existing methods suffer from an inherent trade-off when modeling changes: instance-level comparison overlooks fine-grained semantic variations (e.g., partial building extensions), while direct pixel comparison proves unreliable, yielding unstable responses and boundary artifacts due to semantic ambiguity and spatial inconsistency. To this end, we propose an efficient training-free Reliability-Aware Open-Vocabulary Change Detection (ReA-OVCD) framework. It first derives candidate change regions from pixel-wise semantic discrepancies to ensure flexible and detailed localization. To ensure reliability, it subsequently introduces a collaborative refinement strategy to explicitly model change validity from both semantic and spatial perspectives. Specifically, we develop a Semantic Change Reasoning (SCR) module that reassesses changes by jointly analyzing distributional divergence and response variation, enabling the suppression of incidental inconsistencies while preserving reliable semantic shifts. In addition, a Boundary-aware Change Refinement (BCR) module is designed to mitigate artifacts stemming from boundary misalignment and uncertainty through validating whether candidate regions are supported by reliable interior pixels. Extensive experiments across multiple datasets (LEVIR-CD, WHU-CD, DSIFN, and SECOND) demonstrate that our method consistently outperforms state-of-the-art approaches, achieving $\mathrm{F}_{1}^{C}$ improvements of 2.13\% to 9.75\% with higher computational efficiency. The code is publicly available at \https://github.com/Funny0101/ReA-OVCD

02.
arXiv (CS.AI) 2026-06-19

FFinRED: An Expert-Guided Benchmark Generation and Evaluation Framework for Financial LLM Red-Teaming

arXiv:2606.19887v1 Announce Type: cross Abstract: Existing safety benchmarks target general adversarial scenarios but miss finance-specific risks. Financial LLMs face regulatory compliance violations, fraud facilitation, and systemic trust erosion that require targeted evaluation. We introduce FinRED, an expert-guided red-teaming framework for financial LLM safety evaluation developed with financial experts. FinRED uses a novel two-level taxonomy mapping global standards (e.g., FATF and EU DORA) to threats ranging from regulatory evasion to complex fraud, integrated with a scalable pipeline that converts real financial documents into context-rich red-teaming Behavioral Prompts (seeds) through an expert-defined schema. Rigorous expert validation confirms seed plausibility and realism for meaningful LLM safety evaluation. We also provide an expert-validated, finance-specific rubric that goes beyond disclaimer checks, aligns more closely with human experts than static one-size-fits-all rubrics, and reduces critical false negatives from 28 to 12. Aligned with internationally adopted risk-management and information-security standards (e.g., ISO/IEC 27001), FinRED is deployed in South Korea's Financial Security Institute (FSI) regulatory sandbox for generative AI security evaluation in real financial services. To mitigate dual-use risks, the dataset, generation pipeline, prompt template, and evaluation framework are gated for qualified researchers at https://github.com/selectstar-ai/FinRED-paper and https://huggingface.co/datasets/datumo/FinRED.

03.
arXiv (CS.CL) 2026-06-15

Characterizing Cultural Localization in AI-Generated Stories

The global use of artificial intelligence has increased interest in assessing the ability to generate culturally localized content, including stories. Cultural localization in stories often occurs through either templated localization – the use of cultural markers (e.g., names, locations) in a generic narrative – or holistic localization – the variation of plots, values, and themes, in addition to cultural markers. We propose a method to measure the degree to which content was generated through templated localization. Specifically, we identify the lexical tokens that distinguish stories across nationalities and measure the similarity of the narratives that remain after removing them. In stories generated by five models on 125 topics for 193 nationalities, our method is able to detect that only a small subset (9-17%) of the vocabulary accounts for the variation across nationalities and that the narratives that remain after removing them contain repeated multi-word sequences, suggesting the presence of a shared culturally-agnostic narrative template. Finally, we characterize the cultural markers for their stereotypicality and offensiveness, finding that markers from 19 countries, mostly located in the Global South, are on average offensive.

04.
arXiv (math.PR) 2026-06-25

Information from coincidences

arXiv:2606.25042v1 Announce Type: cross Abstract: We prove a single algebraic mixed coincidence identity that unifies a broad swath of information-theoretic variational results. For any family of priors $\{\pi_i\}$ and real exponents $\{ \alpha_i \}$, the log of the mixed count $E_{x\sim\nu}\!\left[\prod_{i=1}^W \pi_i^{\alpha_i}(x)\right]$ is simultaneously a Boltzmann coincidence weight, an exponential-family normalizer, a maximum-entropy value, and a KL-barycenter optimum. The identity yields a unified derivation of classical cornerstones of information theory: concentration of empirical distributions (Sanov-type decompositions and Gibbs conditioning), hypothesis-testing error exponents (Chernoff information and its multi-way analogue), change-of-measure inequalities (Donsker-Varadhan and PAC-Bayes), and laws governing rare-pattern coincidences (Erdos-Renyi run-length, iterative guesswork, rate-distortion, and birthday thresholds). Each is recovered as a specialization of the same algebraic equality. It strictly generalizes the classical Renyi entropy and divergence variational formulas (one and two priors respectively) to a $W$-prior simplex, and holds for unnormalized and continuum-indexed priors. Among its consequences are an exact multi-prior PAC-Bayes penalty that subtracts an explicit "coincidence bonus" from the usual single-prior posterior penalty, and the asymptotic MAP error exponent for $W$-ary hypothesis testing as an edge-restricted simplex optimum. We demonstrate the calculus at scale on two large alphabets encoding richly modeled sequential languages: on language-model next-token predictives where we recover contrastive decoding, and on human genomic regulatory sequence where it separates correlated from diverse prior families along a sliding-window trace.

05.
Nature (Science) 2026-06-10

Improved quantum processor logical error rates via correction and detection

作者:

Performing quantum algorithms for critical problems in physics and chemistry requires substantially lower error rates than the physical error rates of present quantum computers. Achieving such low logical error rates requires quantum error correction1,2 and physical error rates below a critical threshold value3–8. We experimentally demonstrate on a trapped-ion quantum charge-coupled device (QCCD)9,10 improvements in logical error rates ranging from 11× to 800× compared with several physical circuit baselines, including quantum computation on multiple qubits. Our results hinge on two quantum error correction code constructions optimized for an ion-trap processor: a 12-qubit code encoding two qubits inspired by Knill11 and a 16-qubit tesseract colour code encoding four qubits12,13. These constructions are combined with a scalable method of error detection and post-selection to achieve reduced logical error rates. Our results show that state-of-the-art quantum devices are already able to make use of fault tolerance and error correction to strongly suppress errors in non-trivial quantum circuit computations. Experimental demonstration of quantum error-correcting codes combined with error detection and post-selection applied to a trapped-ion quantum processor shows improvements in logical error rates ranging from 11× to 800× compared with several physical circuit baselines.

06.
arXiv (CS.CV) 2026-06-18

Pyramid Self-Contrastive Learning for Single-shot Test-time Ultrasound Image Denoising

The inherent electronic and speckle noise complicates clinical interpretation of ultrasound images. Conventional denoising methods rely on explicit noise assumptions whose validity diminishes under composite noise conditions. Learning-based methods are usually pretrained in a limited image domain using a labeled dataset, which implies inevitable domain shift in complex in vivo environments. This study proposes a Pyramid Self-Contrastive Learning (PSCL) framework for test-time ultrasound image denoising without pretraining. Given multiple noisy samples from only one-shot imaging, PSCL disentangles anatomical similarity and noise randomness into separate pyramid latent spaces. The clean image is then decoded from the anatomy space while discarding the noise space. We first apply PSCL to synthetic aperture ultrasound (SAU), where an Aperture-to-Aperture loop serves as a self-supervised proxy task to ensure denoising fidelity. Simulation experiments, including noise levels from 0 to 30 dB and inclusion geometries from simple to complex, demonstrated improvements of 69.3% in SNR and 34.4% in CNR. The in vivo results showed 84.8% SNR and 25.7% CNR gains using only two aperture data of the heart in six echocardiographic views, liver, and kidney. PSCL delivers clear images across diverse imaging targets and configurations, paving the way for more reliable anatomical visualization without domain shift and pretraining costs.

07.
arXiv (CS.LG) 2026-06-15

Scalable Deep Unfolding of Conic Optimizers

arXiv:2606.13825v1 Announce Type: cross Abstract: Deep unfolding (DU) accelerates iterative optimizers by introducing learnable components and training them through unrolled iterations, but extending DU to the large-scale semidefinite programs (SDPs) common in robotics has remained limited. Unrolling a full-update conic solver such as COSMO exposes two obstacles that prior work on learned conic solvers has not: backpropagating through the per-iteration linear-system solve incurs memory quadratic in the problem size once the coefficient matrix is formed explicitly, and backpropagating through the positive semidefinite (PSD) cone projection becomes numerically unstable when eigenvalues coincide. We address the first obstacle with a matrix-free implicit differentiation rule that operates entirely through matrix-vector products, reducing memory from $O(n^2)$ to $O(n)$ and enabling backpropagation at scales where direct factorization runs out of memory. We address the second with a backward rule based on the Dalečkii–Krein representation of the Fréchet derivative, which remains well-defined under repeated eigenvalues. Together these make it possible to learn lightweight hyperparameter policies and warm-starts for a full-update conic solver. We evaluate on nonlinear covariance steering problems solved via sequential convex programming (SCP), as well as standalone SDPs and second-order cone programs ranging from max-cut and Lovász $\vartheta$ SDPs to robust estimation and control problems. The learned policies outperform state-of-the-art solvers across all problems, and can provide up to a 50$\times$ speedup depending on the class. When used as a subroutine in SCP, the learned approach delivers over a 30$\times$ speedup compared to COSMO.

08.
arXiv (CS.AI) 2026-06-19

QMFOL: Benchmarking Large Language Model Reasoning via Quantifiable Monadic First-Order Logic Test Case Generation

arXiv:2606.20227v1 Announce Type: new Abstract: Large Language Models (LLMs) have made significant progress in reasoning, particularly in deductive reasoning, which is crucial for high-stakes decision-making. As models improve, evaluation benchmarks should evolve to keep pace. However, existing benchmarks lack fine-grained control over logical complexity and struggle to balance semantic diversity with logical consistency. To address these issues, we propose QMFOL, an automated framework for generating monadic first-order logic reasoning tasks with quantifiable and controllable complexity. It constructs formal logical structures using conjunction and disjunction patterns, enabling precise control over reasoning depth, width, label types, and distractors. These structures are then translated into natural language via LLMs, with logical consistency ensured through round-trip verification using an external prover. Based on our framework, we build QMFOLBench, a benchmark comprising 2880 instances with 960 configurations across diverse logical and semantic dimensions. Evaluations on six large reasoning models (LRMs) and two LLMs show that performance degrades and computational overhead increases with rising logical complexity. Models perform better on True-labeled tasks than on False or Unknown ones, and exhibit sensitivity to semantic variation. Overall, QMFOL offers a scalable and reliable approach for constructing deductive reasoning benchmarks with controllable complexity, enabling more precise evaluation of reasoning capabilities in modern language models.

09.
arXiv (CS.CL) 2026-06-16

ACCORD: Action-Conditioned Contextual Grounding for Language Agents

User instructions are often underspecified because humans rely on implicit assumptions about the surrounding environment. For large language model (LLM) agents operating in information-rich digital and physical environments, these assumptions cannot be inferred from the instruction alone; they must be recovered from the current state of tools, data, interfaces, and observations. Effective execution therefore requires agents to identify missing context, ground it in observed evidence, and carry it forward into subsequent actions. We show that current agents often fail to do so. They act from assumed rather than observed specifics, overlook information they could have gathered, and fail to incorporate evidence that has already been returned. Building on this insight, we propose ACCORD (Action-Conditioned Contextual Grounding), a simple and effective agent framework for adaptive grounding. Before each action, ACCORD actively probes the environment for missing information and integrates relevant context from the agent's trajectory that would otherwise be overlooked. Requiring no additional training or task-success signals, ACCORD improves task-goal completion on AppWorld by up to +20.6 points with GPT-5-mini, from 42.0% to 62.6%, compared to strong baselines. These gains persist with a substantially stronger base model (+10.8 with Claude-4.5-sonnet), an open-weight model (+10.1 with Qwen3.5-27B-FP8), and on the embodied AlfWorld benchmark (+7.4 success rate with GPT-5-mini).

10.
arXiv (CS.CL) 2026-06-12

From Isolation to Entanglement: When Do Interpretability Methods Identify and Disentangle Known Concepts?

A goal of interpretability is to recover disentangled representations of latent concepts (features) from the activations of neural networks. The quality of features is typically evaluated in isolation, and under implicit independence assumptions that may not hold in practice. Thus, it is unclear to what extent common featurization methods such as sparse autoencoders (SAEs) and probes disentangle one concept from another. We propose a multi-concept evaluation setting using concepts including sentiment, domain, voice, and tense. We evaluate how well featurizers produce disentangled representations of each concept, observing that features are typically sensitive to only one concept, but also that concepts are distributed across many features. Then, we steer these features, measuring whether each concept is independently manipulable, and whether features interact. Even in idealized settings, steering a feature often affects many concepts, despite a near absence of interaction effects. These results suggest that correlational metrics are insufficient to establish steering selectivity, and that demonstrating that two features operate in separate spaces is insufficient to claim that they will be selective for one concept. These results underscore the importance of multi-concept evaluations in interpretability research.

12.
arXiv (quant-ph) 2026-06-11

Q-DICE: Quantum Distributed Interconnect Compiler and Emulator

arXiv:2606.11340v1 Announce Type: new Abstract: As distributed quantum computing (DQC) offers a leading path towards scalable quantum computation, the ability to benchmark distributed algorithms under realistic conditions becomes critical for system co-design. However, without access to physical systems, researchers lack tools to evaluate distribution protocols. We introduce Q-DICE (Quantum Distributed Interconnect Compiler and Emulator), a hardware-aware emulation environment for benchmarking distributed quantum circuits on classical simulators and on NISQ-era monolithic hardware. This work provides three core contributions: (1) a programmatic scheme to construct distributed QPU backends, utilizing two novel techniques - QPU slicing and stitching - to facilitate distributed circuit mapping, (2) a methodology for modeling nonlocal link noise using physically motivated Kraus operators and stochastic error channels, and (3) a boundary-aware circuit mapping algorithm enforcing distributed QPU topology constraints during transpilation. Together, these components constitute a distribution-aware compiler and noise-modeling engine that faithfully enforces the physical limitations of distributed quantum hardware within existing execution environments. We validate Q-DICE against a multitude of experimentally demonstrated quantum circuits, including a distributed Grover's search on optically linked trapped-ion hardware, achieving a worst-case fidelity deviation of 4% between simulated and experimental results. These findings demonstrate Q-DICE's capacity to accurately reproduce real distributed quantum system behavior across platforms, streamlining experimentation with distributed quantum algorithms and architectures.

13.
arXiv (CS.AI) 2026-06-25

TL++: Accuracy and Privacy Preserving Traversal Learning for Distributed Intelligent Systems

arXiv:2606.25627v1 Announce Type: cross Abstract: Distributed intelligent systems increasingly need to train across data silos without centralizing raw data. Federated learning keeps data local but can suffer under heterogeneous partitions and requires repeated full-model exchange. Split learning reduces communication through cut-layer activations, but standard protocols generally do not recover centralized mini-batch gradient behavior and may expose activations and gradients in plaintext. We present TL++, a two-mode traversal-learning framework that constructs virtual batches across nodes to recover centralized mini-batch gradient behavior under explicit synchronization assumptions. Base mode exchanges cut-layer activations and gradients rather than full models. Secure mode secret-shares each cut-layer activation and gradient between an orchestrator and a non-colluding helper, preventing either server from observing plaintext cut-layer tensors. This protection is limited to a semi-honest two-server setting; labels and loss-related outputs remain visible to the orchestrator. In the lightweight secure path evaluated here, exactness requires a linear or affine server path, while nonlinear operations require nonlinear MPC or approximation. We formalize TL++, analyze communication and computation costs, and evaluate it against federated and split-learning baselines on CIFAR-10 and BioGPT/PubMedQA using full fine-tuning and LoRA. On CIFAR-10, TL++ base cut 1 and exact secure cut 3 achieve accuracies of 91.41% (SD 0.19) and 90.93% (SD 0.17), respectively, exceeding the strongest measured non-TL++ baseline by more than 12 percentage points. TL++ base cut 1 also reduces per-step communication by 13.1-fold relative to full-model synchronization. PubMedQA results similarly favor TL++. Overall, TL++ approaches centralized-training performance while reducing communication and providing activation-level secret sharing.

14.
arXiv (CS.LG) 2026-06-24

New Bounds for the Last Iterate of the Stochastic subGradient Method

arXiv:2606.24879v1 Announce Type: cross Abstract: We study the last iterate of the stochastic subgradient method for one-dimensional convex Lipschitz objectives. For a fixed horizon $n$, we consider the standard fixed stepsizes $\eta =\Theta(1/\sqrt n)$. We prove that, for such stepsize policies, under additive i.i.d. subgradient noise with uniformly bounded variance, the last iterate features an optimization error of order $1/\sqrt n$, thereby removing the extra $(\log n)$ factor present in existing generic bounds. On the other hand, we show that without the i.i.d. assumption, the optimization error can be of order $(\log n)/\sqrt n$. Thus, under the uniformly bounded variance assumption alone, the last iterate of SsGM is suboptimal even in dimension one, resolving negatively an open problem posed in Koren and Segal, COLT, 2020.

15.
arXiv (math.PR) 2026-06-17

How long does it take to train an Elephant Random Walk

作者:

arXiv:2509.15049v2 Announce Type: replace Abstract: We study how conditioning on the first $k$ steps, which we think of as training, affects the long-term behavior of the Elephant Random Walk. When the elephant is conditioned to be at position $k$ at time $k$, the first return time to the origin scales as $k^{(4-4p)/(3-4p)}$ in the diffusive regime, and grows exponentially in the critical regime. We loosely interpret this as a measurement of the rate at which the elephant forgets its training.

16.
arXiv (CS.AI) 2026-06-17

Mental Health AI Safety Claims Must Preserve Temporal Evidence

arXiv:2605.08827v2 Announce Type: replace Abstract: The safety of mental health AI is often judged at the wrong temporal scale. Current evaluations typically score isolated responses, endpoint outcomes, or aggregate dialogue quality, while clinically consequential failures may arise from the order and accumulation of interactions themselves, including delayed escalation, repeated reinforcement, dependency formation, failed repair, and gradual deterioration across turns. This paper argues that this mismatch is not merely a limitation of evaluation coverage but a source of invalid safety conclusions. We introduce Temporal Safety Non-Identifiability, a formal account of why safety properties that depend on sequence, timing, accumulation, or recovery cannot be certified by protocols that discard those features. From this formalization, we develop SCOPE (Safety Claims Over Preserved Evidence) as a general principle for aligning safety claims with the evidence an evaluation actually retains, and instantiate it as SCOPE-MH, a mental-health instantiation of this reporting standard. We operationalize SCOPE-MH through a proof-of-concept on the AnnoMI dataset of expert-annotated motivational interviewing conversations, which reveals mechanisms of failure that per-turn behavior scoring does not represent. We propose SCOPE-MH as a diagnostic complement to existing evaluation infrastructure and argue that evaluation preserving temporal evidence is necessary, not optional, for safety-critical mental health AI deployment.

17.
arXiv (math.PR) 2026-06-11

Convergence of a Critical Multitype Bellman–Harris Process with One Infinite-Mean Lifetime

arXiv:2606.11511v1 Announce Type: new Abstract: We study a critical multitype Bellman–Harris branching particle system in $\mathbb R^N$ with a finite type space $\mathbb K=\{1,\dots,K\}$. Particles of type $i$ move according to a symmetric $\alpha_i$-stable process and reproduce according to a critical offspring law whose mean matrix is irreducible and stochastic. The lifetime distribution of type $1$ is assumed to have infinite mean with regularly varying tail $$ 1-F_1(t)\sim c_1t^{-\gamma},\, 0 \frac{\gamma}{\beta}, $$ and a local increment condition on the heavy lifetime distribution, we prove convergence of the system to a Poisson random measure concentrated on the infinite-mean type.

18.
arXiv (CS.CV) 2026-06-12

An Improved Generative Adversarial Network for Micro-Resistivity Imaging Logging Restoration

An improved GAN-based imaging logging image restoration method is presented in this paper for solving the problem of partially missing micro-resistivity imaging logging images. The method uses FCN as the generative network infrastructure and adds a depth-separable convolutional residual block to learn and retain more effective pixel and semantic information; an Inception module is added to increase the multi-scale perceptual field of the network and reduce the number of parameters in the network; and a multi-scale feature extraction module and a spatial attention residual block are added to combine the channel attention. The multi-scale module adds a multi-scale feature extraction module and a spatial attention residual block, which combine the channel attention mechanism and the residual block to achieve multi-scale feature extraction. The global discriminative network and the local discriminative network are designed to gradually improve the content and semantic structure coherence between the restored parts and the whole image by playing off each other and the generative network. According to the experimental results, the average structural similarity measure of the five sets of imaged logging images with different sizes of missing regions in the test set is 0.903, which is an improvement of about 0.3 compared with other similar methods. It is shown that the method in this study can be used for the restoration of micro-resistivity imaging log images with good improvement in semantic structural coherence and texture details, thus providing a new deep learning method to ensure the smooth advancement of the subsequent interpretation of micro-resistivity imaging log images.

19.
arXiv (CS.CV) 2026-06-16

V2P-Manip: Learning Dexterous Manipulation from Monocular Human Videos

Achieving autonomous robotic dexterous manipulation requires precise, human-like action sequences at scale. As a scalable supplement to costly teleoperation data, extracting trajectories with both visual fidelity and physical plausibility from monocular videos represents a promising frontier in embodied AI. To this end, we introduce V2P-Manip, an efficient framework designed to learn dexterous manipulation policies directly from human demonstration videos. We establish an efficient, integrated pipeline encompassing 3D asset acquisition, trajectory estimation, and dexterous policy learning. To bridge the gap between visual perception and physical constraints, we introduce a two-stage refinement process to enforce spatial alignment and physical consistency. Evaluations on the TACO and OakInk benchmarks demonstrate that our approach significantly outperforms previous methods in pose accuracy, adaptability to unstructured environments, and training efficiency. Ultimately, experimental results confirm an average success rate of over 75% across multiple synthetic manipulation tasks and validate the adaptability of the extracted manipulation priors across diverse dexterous hand embodiments.

20.
arXiv (CS.LG) 2026-06-25

Low-Cost High-Order Singular Value Decomposition for Tensor-Based Reconstruction from Sparse Sensor Measurements: Urban Flow and Air-Quality Applications

arXiv:2606.24989v1 Announce Type: new Abstract: Urban flow and air-quality simulations generate high-dimensional datasets describing velocity and pollutant transport across multiple spatial, temporal, and physical-variable dimensions. Reconstructing these fields from sparse sensor measurements is a fundamental challenge in environmental monitoring, digital twins, forecasting, and data assimilation. Existing low-cost reconstruction approaches are commonly based on matrix decompositions, which require multidimensional datasets to be flattened into two-dimensional snapshot matrices, thereby discarding important structural information. This work introduces the low-cost High-Order Singular Value Decomposition (lcHOSVD), a novel tensor-based sparse-sensing reconstruction framework for high-dimensional environmental fields. To the authors' knowledge, this is the first methodology that combines sparse sensing and HOSVD for field reconstruction. Unlike matrix-based approaches, lcHOSVD preserves the natural tensor structure of the data, enabling the exploitation of correlations across spatial, temporal, and physical-variable dimensions while substantially reducing the computational requirements of conventional HOSVD. The methodology is applied to urban flow and air-quality datasets, where three-dimensional velocity and pollutant concentration fields are reconstructed using only 1-4% of the available spatial locations. While lcSVD provides larger computational speed-ups, lcHOSVD consistently achieves lower reconstruction errors in configurations characterized by strong multidimensional coupling and heterogeneous dynamics across dimensions. Additional sensor-anisotropy analyses demonstrate that the tensor formulation is significantly more robust to uneven sensor distributions, a common situation in practical environmental monitoring networks.

21.
arXiv (CS.CL) 2026-06-15

Benchmarking Web Agent Safety under E-commerce Deceptive Interfaces

As autonomous web agents are increasingly deployed to perform real-world tasks, ensuring their safety has become a critical concern. In this work, we study web agent behavior under realistic deceptive interfaces in the e-commerce domain. We introduce WebDecept, a lightweight and configurable plugin framework that enables controlled injection of deceptive interface patterns into existing web environments. Using WebDecept, we instantiate seven deceptive patterns commonly observed on the open web, including targeted advertisements, domain redirection, and shopping manipulation. By injecting these patterns into the frontend during task execution, we perform controlled evaluation of multiple multimodal web agents. Our results show that current web agents are highly susceptible to multiple classes of deceptive interfaces, and that prompt-based constraints are often insufficient to mitigate these failures. We further analyze how the design choices of deceptive patterns influence the success of such manipulations. These findings highlight safety challenges that should be addressed as web agents are scaled toward real-world deployment.

22.
arXiv (CS.LG) 2026-06-12

Forecasting Is Not Attribution: Localizing Decoder Bypass in Graph-Based Neural Marketing Mix Models

arXiv:2606.12687v1 Announce Type: new Abstract: Marketing mix models are used to forecast business outcomes and to attribute those outcomes to marketing channels, but these goals are not equivalent. We study a failure mode in graph-based neural MMM called attribution bypass: a high-capacity decoder can obtain low forecasting error through target autoregression, dense communication, co-movement, context, or latent memory while failing to route counterfactual sensitivity through the graph used as the attribution object. We introduce DICE-MMM as a bounded diagnostic and training framework. We do not claim that observational neural MMM identifies causal effects. Instead, DICE separates three questions often conflated in graph-based MMM: graph recovery, forecasting accuracy, and whether the trained decoder's perturbation-induced influence is graph aligned. Stage 1 trains a graph encoder with a restricted graph-mediated decoder. Stage 2 freezes the selected encoder and trains a graph-safe latent decoder whose cross-node communication must pass through the supplied graph. Decoder use is evaluated with CIG, AR-CIG, and graph-swap tests. Across controlled R/d/T swaps and an external multi-graph rawlog stress test, DICE improves stable graph recovery over CausalMMM. The experiments show that forecasting accuracy is not an attribution certificate: in a sparse-target benchmark, no-graph and full-graph decoders achieve MSE@7 around 0.004 while AR-CIG nAUPRC remains near or below zero, whereas an oracle graph reaches 0.807 +/- 0.129 at comparable MSE. Frozen graph-swap localizes the bottleneck: the same DICE-hard-trained decoder moves from nAUPRC -0.044 +/- 0.006 under learned graph inputs to 0.894 +/- 0.027 with the oracle graph. The contribution is a stress test and failure-localization framework showing that low MSE can hide attribution bypass and that the unresolved bottleneck is graph-support selection, not forecasting or decoder capacity.

23.
arXiv (CS.CL) 2026-06-16

LM-SPT: LM-Aligned Semantic Distillation for Speech Tokenization

With the rapid progress of speech language models (SLMs), discrete speech tokens have emerged as a core interface between speech and text, enabling unified modeling across modalities. Recent speech tokenization approaches aim to isolate semantic information from low-level acoustics to better align with language models (LMs). In particular, previous methods use self-supervised learning (SSL) teachers such as HuBERT to extract semantic representations, which are then distilled into a semantic quantizer to suppress acoustic redundancy as well as capture content-related latent structures. However, these tokenizers often operate at relatively high frame rates, producing token sequences significantly longer than their textual counterparts and hindering seamless integration with pretrained LMs. Although recent methods attempt to reduce the token rate by applying uniform average pooling to SSL features, this can over-smooth content-bearing regions and dilute the structural information, thereby potentially limiting the LM alignment. To address this, we propose LM-SPT, an LM-aligned speech tokenization method based on semantic speech-resynthesis distillation. Instead of directly matching teacher and student features via pooling, LM-SPT resynthesizes speech from semantic tokens only and minimizes the discrepancy between representations extracted from the original and resynthesized waveforms using a frozen, LM-aligned speech encoder. This indirect supervision avoids rigid temporal alignment and encourages dedicated semantic units that are more semantically aligned with LMs under reduced frame rates. Experimental results show that the proposed LM-SPT consistently outperforms previous semantic-enhanced speech tokenizers when applied to SLMs for the tasks of automatic speech recognition and text-to-speech, even without compromising the speech reconstruction fidelity at the codec level.

24.
arXiv (CS.CV) 2026-06-24

A Dual Edge Spatial Jacobian Image Graph for Interpretable Diabetic Retinopathy Grading

Automated diabetic retinopathy (DR) grading from colour fundus photographs can achieve strong predictive performance, but clinical interpretation requires more than an image-level label. It requires understanding how lesion evidence is distributed around retinal vessels and how this evidence relates to quantitative vascular biomarkers. We present a dual-edge spatial-Jacobian image graph for interpretable DR grading. Each fundus image is represented as a graph node with four aligned evidence streams: AutoMorph vessel information ($X_1$), DR-XAI-style lesion evidence maps ($X_2$), a 128-dimensional lesion-based contrastive image embedding ($X_3$), and AutoMorph morphometric biomarkers ($X_4$). The spatial edge branch ($X_{12}$) encodes vessel-lesion geometry, while the Jacobian branch ($X_{34}$) models embedding-biomarker sensitivity. Lightweight two-token attention fuses both edge families into a final image graph. On 2,910 matched non-augmented APTOS images, the full graph achieves 0.8076 accuracy, 0.8312 quadratic weighted kappa, 0.5915 macro-F1, and 0.9330 adjacent-grade accuracy; referable DR reaches 0.9055 accuracy and 0.9711 AUROC. The framework is positioned as an explainable representation-learning tool for lesion-biomarker hypothesis generation, rather than as a deployment-ready clinical classifier. The code is available at https://github.com/Inamullah-Colab/dual-edge-dr-graph-xai.

25.
arXiv (CS.LG) 2026-06-17

Regularized Machine Learning for System Identification of Ship Free-Running Manoeuvres from CFD-Based Synthetic Data: A Comparative Study

arXiv:2606.17121v1 Announce Type: cross Abstract: This study investigates supervised machine learning techniques for identifying ship hydrodynamic coefficients from CFD-generated data from free-running simulations. Specifically, ordinary least squares and regularized regression methods are applied to Abkowitz-type manoeuvring models. Training and validation datasets are derived from URANS simulations of zig-zag and turning circle manoeuvres, which are validated against experimental benchmark data. The analysis evaluates the effects of coefficient set size, minimum training length required for predictive model training, and manoeuvre combinations on model performance. Results demonstrate the suitability of large-angle zig-zag manoeuvres for hydrodynamic system identification, provided that multicollinearity is addressed through appropriate coefficient selection, regression models, or input data variability. Larger coefficient sets offer greater model flexibility for variable conditions but are more prone to multicollinearity. Regularized regression techniques effectively mitigate multicollinearity and notably enhance prediction accuracy, as does incorporating more diverse manoeuvring data. Among tested models, Ridge regression provided the best compromise between computational efficiency and prediction accuracy.