Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CV) 2026-06-17

Theoretical Grounding of Out-Of-Distribution Detection With Reinforcement Learning Optimizer

Out-of-distribution (OOD) detection in dynamic open-world environments requires a model to continually adapt to evolving data distributions while generalizing to covariate-shifted inputs and rejecting semantic-shifted OOD examples. Most existing OOD detection methods optimize only the current-step objective and do not explicitly account for how post-deployment environment changes affect future OOD behavior. In this paper, we establish a theoretical grounding for dynamic OOD detection using a reinforcement learning (RL)-guided optimizer that explicitly favors updates that reduce the semantic OOD false positive rate over time. We develop a novel augmented optimizer that uses an RL-guided correction term on top of standard gradient descent (GD) and show its improvement over both future-domain generalization and semantic-OOD rejection. We analyze temporal error decomposition in terms of model-change and environment-change generalization errors and develop a new theoretical framework for comparing the generalization errors under both GD and RL-guided optimizers.

02.
arXiv (quant-ph) 2026-06-15

Implementation of two-qubit Rydberg operations on neutral Rb-87 atoms in systems with different intermediate states

arXiv:2606.13975v1 Announce Type: new Abstract: This work presents an experimental setup for implementing two-qubit operations on neutral atoms ($^{87}$Rb) with the possibility of using two different Rydberg excitation schemes. One of them uses 5P$_{1/2}$ as the intermediate level and applies the second-stage beam locally to the addressed atoms. The second scheme uses the 6P$_{3/2}$ level; in this scheme, the particles to be entangled are moved to a separate zone through which both Rydberg beams pass. The advantages and limitations of both schemes are analyzed. Based on numerical modeling performed with a Julia package developed by the authors, it is demonstrated that the spatial configuration has a greater effect on quantum-operation fidelity than the choice of intermediate level. An experimental implementation of the scheme using the 6P$_{3/2}$ level is demonstrated, making it possible to achieve a two-qubit operation fidelity of 94%.

03.
arXiv (CS.AI) 2026-06-12

(Human) Attention Is (Still) All You Need: Human oversight makes AI-assisted social science reliable

arXiv:2606.12848v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used for tasks once reserved for trained researchers, including hypothesis generation, specification choice, and drafting conclusions. We argue that the reliability of AI-assisted research depends not only on model capability, but also on how cognitive labour is structured between humans and machines. We study this problem through Human-in-the-Loop Economic Research (HLER), a decision architecture based on pre-commitment, decision sequencing, accountability, and attention allocation. In a pre-specified 2*4 factorial experiment with 280 complete research runs across four datasets, an unconstrained multi-agent baseline produced critical failures in 72% of runs. Using the same underlying model, the same agent decomposition, and identical prompts for the shared reasoning agents, HLER reduced the failure rate to 16% by imposing three architectural commitments: LLMs reason but do not execute data work, data and estimation are handled deterministically, and three human decision gates bind the workflow. Fisher's exact test rejects equality of failure rates at p

04.
arXiv (CS.CL) 2026-06-16

Retrievable Gradients: Continual Post-Training Without Cumulative Weight Drift

Continual post-training enables models to absorb emerging knowledge after deployment, but repeatedly updating shared parameters can accumulate weight drift, potentially causing catastrophic forgetting and degrading general capabilities. Retrieval-augmented generation avoids such parameter drift, yet often lacks the depth of parametric knowledge integration. In this paper, we propose ReGrad (Retrievable Gradients), a new paradigm that treats gradients as retrievable units of knowledge. ReGrad pre-computes document-specific gradients offline, stores them in an indexed Gradient Bank, and retrieves only query-relevant gradients at inference time for temporary weight adaptation. However, raw language-modeling gradients are optimized for token-level document reconstruction rather than for query-driven knowledge use. We therefore introduce a bi-level meta-learning objective that reshapes document-derived gradients into generalizable adaptation signals for downstream tasks. Experiments across general and domain-specific settings show that \textsc{ReGrad} outperforms CPT and RAG baselines, enabling scalable and reversible parametric knowledge injection without accumulating weight drift.

05.
arXiv (CS.LG) 2026-06-17

Toward Controllable Catalyst Inverse Design via Large-Scale Autoregressive Pretraining

arXiv:2606.17445v1 Announce Type: new Abstract: Inverse design of heterogeneous catalysts remains challenging because catalyst surfaces exhibit substantial structural complexity with coupled surface-adsorbate interactions across a vast chemical space that is difficult to explore efficiently through conventional screening alone. Although machine learning-based high-throughput screening has accelerated catalyst discovery, its efficiency inevitably declines as the search space grows, motivating the development of generative models that can directly construct catalysts with target properties. Here, we present a conditional catalyst generative model based on the Generative Pretrained Transformer architecture with a numerical embedding layer that enables the generation of catalyst structures conditioned on both categorical and continuous properties within a single autoregressive framework. The model was pretrained on 133 million catalyst structures and subsequently fine-tuned on approximately 460,000 optimized structures with associated categorical properties and binding energies for conditional generation. The resulting model achieved 98% structural validity, 95% optimization validity, and high categorical condition fidelity, with a 93 % joint match rate for adsorbate type and composition. For binding energy conditioning, the match rate of approximately 20% represents a four-fold improvement over the baseline training distribution, and the generated distributions shift systematically toward the target values, enabling a 1.5 to 4-fold improvement in screening efficiency for reaction-targeted catalyst discovery without additional fine-tuning. These results show that large-scale autoregressive pre-training, combined with explicit property conditioning, provides a practical route toward controllable catalyst generation and accelerated catalysts discovery.

06.
arXiv (CS.CL) 2026-06-16

Detecting Hate and Inflammatory Content in Bengali Memes: A New Multimodal Dataset and Co-Attention Framework

Internet memes have become a dominant form of expression on social media, including within the Bengali speaking community. While often humorous, memes can also be exploited to spread offensive, harmful, and inflammatory content targeting individuals and groups. Detecting this type of content is exceptionally challenging due to its satirical, subtle, and culturally specific nature. This problem is magnified for low-resource languages like Bengali, as existing research predominantly focuses on high-resource languages. To address this critical research gap, we introduce Bn-HIB (Bangla Hate Inflammatory Benign), a novel dataset containing 3,247 manually annotated Bengali memes categorized as Benign, Hate, or Inflammatory. Significantly, Bn- HIB is the first dataset to distinguish inflammatory content from direct hate speech in Bengali memes. Furthermore, we propose the MCFM (Multi-Modal Co-Attention Fusion Model), a simple yet effective architecture that mutually analyses both the visual and textual elements of a meme. MCFM employs a co-attention mechanism to identify and fuse the most critical features from each modality, leading to a more accurate classification. Our experiments show that MCFM significantly outperforms several state-of-the-art models on the Bn-HIB dataset, demonstrating its effectiveness in this nuanced task. To facilitate reproducibility and future research, the Bn-HIB dataset has been made publicly available through Mendeley Data. Warning: This work contains material that may be disturbing to some audience members. Viewer discretion is advised

07.
arXiv (quant-ph) 2026-06-11

Diffusive Relaxation of Participation Entropy in U(1)-symmetric Dynamics

arXiv:2606.11561v1 Announce Type: new Abstract: Participation entropy (PE) quantifies the spread of a many-body wavefunction across configuration space. While PE relaxes rapidly in generic chaotic systems, we show that $\mathrm{U}(1)$ conservation laws slow it down by imprinting with the slow hydrodynamic modes. Using a cluster expansion around equilibrium, we show that, after local density inhomogeneities decay, the leading PE deficit is dominated by squared connected density correlations. The long time relaxation is therefore controlled by diffusive correlation spreading, giving $\Delta S(t)\sim t^{-1/2}$ in the hydrodynamic regime and crossing over to $\sim \exp[-O(t/L^2)]$ when $t\geq L^2$. We confirm this entropy correlation relation using exact computation and infinite system tensor network simulations in various quantum $\mathrm{U}(1)$ conserving circuits. Our results establish PE as a sensitive probe of hydrodynamic memory and suggest that slow relaxation is a generic consequence of conservation laws.

08.
arXiv (CS.CV) 2026-06-12

Edit the Bits, Diff the Codes: Bitwise Residual Editing for Visual Autoregressive Models

Text-guided image editing with visual autoregressive (VAR) generators requires controlling both what the model samples and where the sampled change is written back into the image code. Existing VAR editors mainly operate on token streams, features, or flat next-token logits, leaving two native structures of bitwise-residual VAR models underused: the per-bit Bernoulli prediction head and the additive multi-scale residual code field from which the image is assembled. We propose BitResEdit, a training-free editor for bitwise-residual VAR generators such as Infinity. BitEdit performs source-negative guidance by tilting the post-CFG per-bit log-odds along a source–target contrast computed on a shared edited prefix, then projects each update into a closed-form Bernoulli-KL trust region around the clean CFG sampler. ResEdit converts the sampled bits into per-scale continuous-code residuals, gates them with a localization mask, and re-injects them through the generator's native sum-of-scales. Together they couple decision-time bit guidance with combination-time code composition, so masked-out latent features are preserved exactly by code arithmetic while localized, scale-aware edits are applied inside the target region. On PIE-Bench with Infinity-2B, BitResEdit attains the strongest text alignment among same-backbone VAR editors, improving CLIP on the edited region by +1.07 over the strongest prior editor while keeping background preservation competitive with it. Ablations show BitEdit and ResEdit play complementary roles in target alignment and background preservation.

09.
arXiv (CS.AI) 2026-06-15

Learning Urban Access Costs from Origin-Destination Flows via Inverse Optimal Transport

arXiv:2606.14157v1 Announce Type: cross Abstract: Cities deliver basic services through mixed public-private facility networks, including schools, clinics, transit providers, and subsidized service points. In these systems, planners often observe where households go, but not the latent cost function through which they trade off factors such as distance, price, and institutional access. We study this urban problem through school choice in the Philippines, where the country's largest national education subsidy is intended to redirect learners from congested public schools to participating private schools. Treating school-to-school enrollment flows as an entropic optimal transport plan, we recover latent choice costs using two complementary inverse optimal transport models: an interpretable distance-banded model with a subsidy term, and a neural cost model trained through a differentiable Sinkhorn forward pass. Applied to 283{,}016 learner trips across 23{,}820 observed flows in the most populated region, the framework estimates a subsidy-equivalent distance, $\lambda^{(k)}$, interpreted as the kilometers of perceived travel cost offset by the subsidy. The case demonstrates how administrative origin-destination data can be transformed into interpretable planning metrics for accessibility-aware subsidy design, facility siting, and urban service allocation.

10.
arXiv (CS.CV) 2026-06-16

Cross-Modal Registration Between 3D and 2D Fingerprints via Pose-Aware Unwrapping and Point-Cloud Fusion

Three-dimensional (3D) fingerprints preserve global finger geometry and local ridge structure while avoiding contact-induced deformation, but they remain difficult to integrate with legacy two-dimensional (2D) fingerprint systems. This paper addresses the intermediate stage between 3D acquisition and cross-modal matching, and presents a unified framework for 3D fingerprint preprocessing and registration across contactless and contact-based 2D modalities. The framework combines four components: 1) a nonparametric visualization and unwrapping method that converts a 3D fingerprint point cloud into a rolled-equivalent 2D representation without relying on a global finger-shape model; 2) a point-cloud fusion pipeline that registers and mosaics multiple partial 3D captures into a more complete fingerprint model; 3) an ellipse-based pose normalization method for canonical finger alignment; and 4) a pose-aware cross-modal registration strategy that improves compatibility between 3D fingerprints and both contactless and contact-based 2D fingerprints. Experiments on a self-collected multimodal fingerprint database containing 150 fingers show that the proposed framework achieves ridge-level 3D registration accuracy, robust pose estimation, and consistent gains in 2D compatibility. In particular, the 3D fusion error is concentrated around 0.09 mm, contactless 2D–3D registration reaches ridge-scale projection accuracy, and pose-aware unwrapping improves genuine matching scores relative to generic 3D unwrapping. These results support the use of 3D fingerprints as an effective geometric bridge across heterogeneous fingerprint modalities. The baseline implementation has been publicly released at https://github.com/XiongjunGuan/3DFpVisual.

11.
arXiv (CS.AI) 2026-06-16

Gated QKAN-FWP: Scalable Quantum-inspired Sequence Learning

arXiv:2605.06734v2 Announce Type: replace-cross Abstract: Fast Weight Programmers (FWPs) encode temporal dependencies through dynamically updated parameters rather than recurrent hidden states. Quantum FWPs (QFWPs) extend this idea with variational quantum circuits (VQCs), but existing implementations rely on multi-qubit architectures that are difficult to scale on noisy intermediate-scale quantum (NISQ) devices and expensive to simulate classically. We propose gated QKAN-FWP, a fast-weight framework that integrates FWP with Quantum-inspired Kolmogorov-Arnold Network (QKAN) using single-qubit data re-uploading circuits as learnable nonlinear activation, known as DatA Re-Uploading ActivatioN (DARUAN). We further introduce a scalar-gated fast-weight update rule that stabilizes parameter evolution, supported by a theoretical analysis of its adaptive memory kernel, geometric boundedness, and parallelizable gradient paths. We evaluate the framework across time-series benchmarks, MiniGrid reinforcement learning, and highlight real-world solar cycle forecasting as our main practical result. In the long-horizon setting with 528-month input window and 132-month forecast horizon, our 12.5k-parameter model achieves lower scaled Mean Square Error (MSE), peak amplitude error, and peak timing error than a suite of classical recurrent baselines with up to 13x more parameters, including Long Short-Term Memory (LSTM) networks (25.9k-89.1k parameters), WaveNet-LSTM (167k), Vanilla recurrent neural network (11.5k), and a Modified Echo State Network (132k). To validate NISQ compatibility, we further deploy the trained fast programmer on IonQ and IBM Quantum processors, recovering forecasting accuracy within 0.1% relative MSE of the noiseless simulator at 1024 shots. These results position gated QKAN-FWP as a scalable, parameter-efficient, and NISQ-compatible approach to quantum-inspired sequence modeling.

12.
arXiv (CS.AI) 2026-06-16

The Model Knows, the Decoder Finds: Future Value Guided Particle Power Sampling

arXiv:2605.02427v3 Announce Type: replace Abstract: A recurring pattern in "reasoning without training" is that base LLMs already assign non-trivial probability mass to correct multi-step solutions; the bottleneck is locating these modes efficiently at inference time. Power sampling provides a principled way to bias decoding toward such modes by targeting p_theta(x)^alpha with alpha > 1, but practical approximations must account for future-dependent correction factors that determine which prefixes remain promising. We introduce Auxiliary Particle Power Sampling (APPS), a blockwise particle algorithm for approximating the sequence-level power target with a bounded population of partial solutions. APPS propagates hypotheses in parallel using proposal-corrected power reweighting and refines their survival through future-value-guided selection at resampling boundaries. This redistributes finite compute across competing prefixes rather than committing to a single unfolding path, while providing a direct scaling knob in the particle count and predictable peak memory. We instantiate the future-value signal with short-horizon rollouts and also study an amortized variant that replaces rollouts with a lightweight learned selection head. AMore broadly, APPS improves the accuracy–runtime trade-off of training-free decoding, further supporting the view that inference-time power approximation can recover gains often attributed to post-training.

13.
arXiv (CS.CV) 2026-06-16

VigilFormer: Deformable Attention for Video Anomaly Detection with Causal Risk Inference

作者:

Video anomaly detection in surveillance settings must balance detection accuracy against real-time throughput, a tension that existing methods address either through stronger feature extractors or more efficient architectures, but rarely both. We present VigilFormer, a unified framework that combines deformable spatio-temporal attention with causal temporal modeling to detect anomalies in untrimmed surveillance video. The proposed Deformable Spatio-Temporal Encoder (DSTE) attends to a sparse set of informative locations across frames, avoiding the quadratic cost of dense attention while retaining the ability to capture irregular motion patterns. A Causal Anomaly Classifier (CAC) applies dilated causal convolutions over snippet-level features and optimizes a contrastive multiple-instance learning objective that separates anomalous and normal representations without frame-level labels. To meet deployment constraints, an Adaptive Confidence Scheduler (ACS) dynamically skips low-information frames at inference time, reducing redundant computation in static scenes. Evaluated on UCF-Crime, ShanghaiTech, and CUHK Avenue, VigilFormer achieves AUC scores of 87.83%, 97.21%, and 89.74% respectively, at 41.5 FPS on a single GPU, outperforming recent weakly-supervised methods in both accuracy and speed.

14.
arXiv (CS.AI) 2026-06-16

Theorem-Grounded Execution Ontologies for Interpretable Machine Reasoning

arXiv:2606.16010v1 Announce Type: cross Abstract: Large language models have achieved impressive performance on reasoning tasks spanning mathematics, science, programming, and commonsense inference. Despite these advances, their reasoning processes remain largely latent, making them difficult to interpret, verify, replay, debug, and transfer across domains. Existing approaches such as chain-of-thought, tree-of-thoughts, graph-of-thoughts, and tool-augmented reasoning expose intermediate reasoning artifacts but typically lack explicit execution semantics, formal state representations, and verifiable reasoning structures. We introduce Theorem-Grounded Execution Ontologies (TGEO), a framework that models reasoning as an executable state-transition process rather than a sequence of generated tokens. Given an input problem, TGEO identifies relevant theorem families, binds the problem to a domain ontology, discovers semantic objects, instantiates states and operators, constructs predicates and contracts, and synthesizes an executable reasoning graph. The resulting graph provides an interpretable, replayable, and auditable representation of reasoning in which every state transition, operator application, and validation step is explicitly represented. TGEO integrates five architectural components: (1) theorem-grounded reasoning priors, (2) executable ontologies, (3) operator-mediated state transitions, (4) predicate and contract-based execution validation, and (5) architectural auditing and failure localization. We evaluate TGEO on theorem-intensive reasoning tasks derived from mathematical benchmark domains and a curated Golden Execution Suite. Our findings demonstrate the value of executable reasoning representations for interpretable, verifiable, and reproducible AI reasoning systems.

15.
arXiv (math.PR) 2026-06-17

Full $\Gamma-$expansion for the level-two large deviation rate functionals of non-reversible one-dimensional diffusions with periodic boundary conditions

arXiv:2606.17859v1 Announce Type: new Abstract: Consider the diffusion process \begin{equation*} dX_{\epsilon}(t) = \mss b(X_{\epsilon}(t)) \, dt + \sqrt{2\, \epsilon\, \mss a(X_\epsilon(t))} \, dW_{t}, \end{equation*} on the one-dimensional torus $\bb T = [0,1)$. Here $\epsilon$ is the temperature, $W_{t}$ a Brownian motion on $\bb T$ and $\mss a$, $\mss b$ functions of class $C^{2}(\bb T)$ satisfying further conditions. Denote by $\mss P(\bb T)$ the set of probability measures on $\bb T$ equipped with the weak topology, and by $\ms I_{\epsilon}\colon \mss P(\bb T)\to [0,+\infty)$ the level two large deviation rate functional of the diffusion $X_{\epsilon}(\cdot)$. We derive a full $\Gamma-$expansion of $\ms I_{\epsilon}$, as $\epsilon \to 0$, expressing it as \begin{equation*} \ms I_{\epsilon} = \frac{1}{\epsilon} \;\ms J^{(-1)} \; +\; \ms J^{(0)} \;+\; \sum_{p=1}^{\widehat{\mf q}}\frac{1}{\theta^{(p)}_{\epsilon}}\;\ms J^{(p)}\,, \end{equation*} where $\ms J^{(-1)}$, $\ms J^{(0)}$, $\ms J^{(p)} \colon \mss P(\bb T)\to [0,+\infty]$ represent rate functionals, independent of $\epsilon$, and $\theta^{(p)}_{\epsilon}$ are the time-scales at which the Markov process $X_{\epsilon}(\cdot)$ exhibits a metastable behaviour.

16.
arXiv (CS.LG) 2026-06-17

Conditional Attribution for Root Cause Analysis in Time-Series Anomaly Detection

arXiv:2604.17616v3 Announce Type: replace Abstract: Root cause analysis (RCA) for time-series anomaly detection is critical for the reliable operation of complex real-world systems. Existing explanation methods often rely on unrealistic feature perturbations and ignore temporal and cross-feature dependencies, leading to unreliable attributions. We propose a conditional attribution framework that explains anomalies relative to contextually similar normal system states. Instead of using marginal or randomly sampled baselines, our method retrieves representative normal instances conditioned on the anomalous observation, enabling dependency-preserving and operationally meaningful explanations. To support high-dimensional time-series data, contextual retrieval is performed in learned low-dimensional representations using both variational autoencoder latent spaces and UMAP manifold embeddings. By grounding the retrieval process in the system's learned manifold, this strategy avoids out-of-distribution artifacts and ensures attribution fidelity while maintaining computational efficiency. We further introduce confidence-aware and temporal evaluation metrics for assessing explanation reliability and responsiveness. Experiments on the SWaT and MSDS benchmarks demonstrate that the proposed approach consistently improves root-cause identification accuracy, temporal localization, and robustness across multiple anomaly detection models. These results highlight the practical utility of conditional attribution for explainable anomaly diagnosis in complex time-series systems. Code and models are available at: https://github.com/dfki-av/Conditional-Attribution-for-Root-Cause-Analysis-in-Time-Series-Anomaly-Detection.

17.
arXiv (CS.LG) 2026-06-11

On Regret Bounds of Thompson Sampling for Bayesian Optimization

arXiv:2603.09276v2 Announce Type: replace-cross Abstract: We study a widely used Bayesian optimization method, Gaussian process Thompson sampling (GP-TS), under the assumption that the objective function is a sample path from a GP. Compared with the GP upper confidence bound (GP-UCB) with established high-probability and expected regret bounds, most analyses of GP-TS have been limited to expected regret. Moreover, whether the recent analyses of GP-UCB for the lenient regret and the improved cumulative regret upper bound can be applied to GP-TS remains unclear. To fill these gaps, this paper shows several regret bounds: (i) a regret lower bound for GP-TS, which implies that GP-TS suffers from a polynomial dependence on $1/\delta$ with probability $\delta$, (ii) an upper bound of the second moment of cumulative regret, which directly suggests an improved regret upper bound on $\delta$, (iii) expected lenient regret upper bounds, and (iv) an improved cumulative regret upper bound on the time horizon $T$. Along the way, we provide several useful lemmas, including a relaxation of the necessary condition from recent analysis to obtain improved regret upper bounds on $T$.

18.
arXiv (quant-ph) 2026-06-16

Towards Interpretability of Neural Quantum States

arXiv:2508.14152v2 Announce Type: replace Abstract: Neural quantum states (NQS) have emerged as a powerful variational ansatz for representing quantum many-body wave functions. Their internal mechanisms, however, remain poorly understood. We investigate the role of correlations for NQS-like quantum state representation by employing a correlation-based interpretable neural network architecture and then proving our observations using Boolean function theory. The correlator neural network demonstrates that, even for simple product states, up to all system-size correlation orders in the chosen computational basis are required to represent a quantum state faithfully. We explain these observations using Fourier expansion, which reveals the correlator basis as the effective basis of the internal NQS structure, the resulting necessity for high-order correlations that is supported by an entanglement bound that scales with the correlation order, consequences of linear dependencies in constrained Hilbert spaces for correlation requirements, and connections between spin basis rotations and the correlator basis. Furthermore, we analyze how neural networks achieve high correlation orders by increasing the magnitude of the network weights, which can be compensated by increasing the network depth. Lastly, we discuss how activation functions, network architectures, and choice of reference basis influence correlation requirements. Our results provide new insights and a better understanding of the internal structure and requirements of NQS, enabling a more systematic use of NQS in future research.

19.
arXiv (CS.CV) 2026-06-11

CoVR-R:Reason-Aware Composed Video Retrieval

Composed Video Retrieval (CoVR) aims to find a target video given a reference video and a textual modification. Prior work assumes the modification text fully specifies the visual changes, overlooking after-effects and implicit consequences (e.g., motion, state transitions, viewpoint or duration cues) that emerge from the edit. We argue that successful CoVR requires reasoning about these after-effects. We introduce a reasoning-first, zero-shot approach that leverages large multimodal models to (i) infer causal and temporal consequences implied by the edit, and (ii) align the resulting reasoned queries to candidate videos without task-specific finetuning. To evaluate reasoning in CoVR, we also propose CoVR-Reason, a benchmark that pairs each (reference, edit, target) triplet with structured internal reasoning traces and challenging distractors that require predicting after-effects rather than keyword matching. Experiments show that our zero-shot method outperforms strong retrieval baselines on recall at K and particularly excels on implicit-effect subsets. Our automatic and human analysis confirm higher step consistency and effect factuality in our retrieved results. Our findings show that incorporating reasoning into general-purpose multimodal models enables effective CoVR by explicitly accounting for causal and temporal after-effects. This reduces dependence on task-specific supervision, improves generalization to challenging implicit-effect cases, and enhances interpretability of retrieval outcomes. These results point toward a scalable and principled framework for explainable video search. The model, code, and benchmark are available at https://github.com/mbzuai-oryx/CoVR-R.

20.
arXiv (CS.LG) 2026-06-16

Data-driven Control with Real-time Uncertainty Compensation for Multi-Fuel Engines

arXiv:2606.16171v1 Announce Type: cross Abstract: Multi-fuel compression ignition (CI) engines offer superior power density and fuel flexibility. However, achieving consistent and optimal combustion phasing across a wide range of operating conditions remains a major challenge, particularly in the presence of modeling uncertainties. This paper presents a novel, data-driven real-time uncertainty compensation framework for combustion control in multi-fuel CI engines. The proposed approach introduces a pseudo-engine speed that enables dynamic adaptation of control inputs in response to uncertainty affecting the engine. To model the underlying combustion process, a Gaussian Process Regression (GPR) model is first trained on available input-output data, capturing the nonlinear and fuel-dependent behavior across varying operating conditions. Control inputs are then synthesized through model inversion of the learned GPR surrogate and augmented with an uncertainty compensator designed to mitigate deviations caused by dynamic variations in operating conditions and model inaccuracies. This integrated control strategy allows for real-time input corrections within a finite number of combustion cycles. Theoretical analysis establishes finite-time convergence guarantees for the proposed controller. Simulation results demonstrate that the proposed method steers the combustion phasing to the desired value in real-time, providing a scalable and adaptive control solution for multi-fuel CI engine operation.

21.
arXiv (CS.LG) 2026-06-16

Graph Learning Should Move Beyond Restrictive Views of Spectral and Message-Passing GNNs

arXiv:2602.10031v2 Announce Type: replace Abstract: Graph neural networks (GNNs) are commonly divided into message-passing neural networks (MPNNs) and spectral GNNs, reflecting two largely separate research traditions in machine learning and signal processing. While MPNNs have a precise definition, there is no widely accepted criterion for what makes a mapping a spectral GNN. Most existing work restricts spectral GNNs to layered architectures based on linear spectral filters. Under this restriction, we show that spectral and spatial GNNs have largely equivalent expressive power. To promote progress in the field, we propose a precise definition of spectral GNNs based on eigenbasis symmetries, in contrast to the definition of MPNNs via neighborhood permutation symmetries. We further argue that the two perspectives offer complementary strengths. MPNNs provide a natural language for discrete structure and expressivity analysis through tools from logic and graph isomorphism, while the spectral perspective offers principled tools for understanding smoothing, bottlenecks, stability, and community structure. Overall, we argue that progress in graph learning will be accelerated by clarifying the similarities and differences between these perspectives and by moving toward a unified theoretical framework.

22.
arXiv (CS.AI) 2026-06-17

MoCo-AIS: A Contrastive Learning Framework for Similarity Computation of Vessel Trajectories

arXiv:2606.17978v1 Announce Type: new Abstract: Trajectory similarity is a fundamental task in analyzing mobility patterns, essential for applications such as route pattern extraction, mobility prediction, and anomaly detection. Traditional distance-based measures for computing similarity incur high computational cost, driving the adoption of lightweight learning-based approaches. Supervised methods rely on extensive labels derived from traditional distance measures and often reproduce these metrics, which limits generalization. While self-supervised learning addresses this issue through contrastive learning, it lacks a unified framework, making it difficult to compare deep learning (DL) models for consistent trajectory representation. Accordingly, this paper presents MoCo-AIS, a unified framework for learning vessel trajectory embeddings based on the Momentum Contrast (MoCo) paradigm, which formulates similarity learning through positive and negative trajectory pairs. Within this framework, we evaluate a diverse set of leading DL models on large-scale, real-world vessel-tracking AIS datasets that capture diverse navigation behaviors and operating conditions. Results demonstrate that our framework significantly improves similarity learning over existing baselines, while providing a benchmarking platform for evaluating trajectory representation models.

23.
arXiv (CS.CL) 2026-06-12

LLM-based Embeddings: Attention Values Encode Sentence Semantics Better Than Hidden States

Sentence representations are foundational to many Natural Language Processing (NLP) applications. While recent methods leverage Large Language Models (LLMs) to derive sentence representations, most rely on final-layer hidden states, which are optimized for next-token prediction and thus often fail to capture global, sentence-level semantics. This paper introduces a novel perspective, demonstrating that attention value vectors capture sentence semantics more effectively than hidden states. We propose Value Aggregation (VA), a simple method that pools token values across multiple layers and token indices. In a training-free setting, VA outperforms other LLM-based embeddings, even matches or surpasses the ensemble-based MetaEOL. Furthermore, we demonstrate that when paired with suitable prompts, the layer attention outputs can be interpreted as aligned weighted value vectors. Specifically, the attention scores of the last token function as the weights, while the output projection matrix ($W_O$) aligns these weighted value vectors with the common space of the LLM residual stream. This refined method, termed Aligned Weighted VA (AlignedWVA), achieves state-of-the-art performance among training-free LLM-based embeddings, outperforming the high-cost MetaEOL by a substantial margin. Finally, we highlight the potential of obtaining strong LLM embedding models through fine-tuning Value Aggregation.

24.
arXiv (CS.CV) 2026-06-16

BadWorld: Adversarial Attacks on World Models

Visual world models (VWMs) synthesize interactive, action-conditioned rollouts from a single context image. However, it remains an open question how robust these models are to adversarial perturbations. Standard adversarial attacks fail to assess this vulnerability because attackers lack ground-truth future videos and cannot predict subsequent user controls. We introduce BadWorld, a label-free adversarial framework tailored for autoregressive VWMs that systematically overcomes both constraints. First, to bypass the need for future supervision, we propose a self-supervised velocity attack that directly disrupts the early denoising dynamics of the model. Second, to ensure the attack generalizes across unpredictable user actions, we formulate a trajectory-adaptive bi-level optimization that actively mines hard control sequences to forge control-agnostic perturbations. Evaluated on representative VWMs with continuous and discrete controls, BadWorld exposes severe structural fragility. Visually indistinguishable adversarial images reliably trigger catastrophic degradation in future rollouts, leading to incomplete denoising, structural collapse, and control inconsistency. These findings reveal critical risks for deploying VWMs in safety-critical systems while highlighting a practical mechanism for privacy protection.

25.
medRxiv (Medicine) 2026-06-15

Population-scale genomics reveals divergent pathogenicity of variant classes across paralogous collagen IV genes

Monoallelic pathogenic or likely pathogenic variants in COL4A3 and COL4A4 occur in approximately 1 in 106 individuals, yet whether these paralogous genes confer equivalent pathogenicity for the same variant classes has not been tested at population scale. Using whole-genome sequencing data from the UK Biobank (UKB; n = 500,000), with replication in the All of Us Research Program (n = 414,000), we performed per-variant association testing, gene-based collapsing analyses and phenome-wide association studies (PheWAS) across haematuria, proteinuria and chronic kidney disease. We identified 64 COL4A3 and 92 COL4A4 rare variants significantly associated with haematuria or proteinuria, generating a quantitative allelic series for clinical variant interpretation. Glycine substitutions within collagenous domains conferred similar risks in both genes. In contrast, truncating and non-collagenous domain (NC1) missense variants were strongly associated with haematuria and proteinuria in COL4A4 carriers but showed substantially attenuated or absent associations in COL4A3 carriers despite comparable carrier frequencies and predicted pathogenicity scores. These findings were independently replicated in All of Us. Genome-wide association analysis identified the COL4A3/COL4A4 locus as the dominant genetic determinant of haematuria, with the signal attributable to the aggregate effects of rare coding variants and no evidence of independent common variant or trans-acting modifier effects. These findings demonstrate substantial gene-specific differences in tolerance to truncating and NC1 variants between COL4A3 and COL4A4, challenging assumptions of equivalent pathogenicity across paralogous collagen IV genes. Gene identity and not variant class alone, should inform risk stratification, variant interpretation and genetic counselling in individuals carrying collagen IV risk genotypes.