Academic Intelligence · Curated Daily

Explore the Frontiers of Global Academic Research

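AcademicHub aggregates real-time literature from top journals and preprint platforms. Customize your own research radar and use large language models to automatically generate cross-disciplinary literature analysis briefs.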

01.
arXiv (quant-ph) 2026-06-17

Quantum-inspired Ising machine using sparsified spin connectivity

arXiv:2604.04606v2 Announce Type: replace-cross Abstract: Combinatorial optimization problems become computationally intractable as these NP-hard problems scale. We previously proposed extraction-type majority voting logic (E-MVL), a quantum-inspired algorithm using digital logic circuits. E-MVL mimics the thermal spin dynamics of simulated annealing (SA) through controlled sparsification of spin interactions for efficient ground-state search. This study investigates the performance potential of E-MVL through systematic optimization and comprehensive benchmarking against SA. The target problem is the Sherrington-Kirkpatrick (SK) model with bimodal and Gaussian coupling distributions. Through equilibrium state analysis, we demonstrate that the sparsity control mechanism provides a consistent search of the solution space regardless of the problem's coupling distribution (bimodal, Gaussian) or size. E-MVL not only achieves the best performance among all tested algorithms, finding exact solutions for up to 1600 spins where the best SA baseline is limited to 400 spins, but also provides insights that significantly improve SA's own temperature scheduling. These results establish E-MVL's dual contribution as both an efficient optimizer and a practical methodology for enhancing SA performance. Moreover, an FPGA implementation achieved an approximately 6-fold faster solution speed than SA.

02.
arXiv (CS.AI) 2026-06-15

TRACE: Trajectory-Routed Causal Memory for Delayed-Evidence Visuomotor Imitation

arXiv:2606.14551v1 Announce Type: cross Abstract: Robots under autonomous operation may require decisions based on evidence that is no longer visible. We study delayed-evidence tasks, where an early cue disappears before a later decision point, so visually similar observations can require different actions. In these settings, the current observation is not a sufficient state for control. We introduce TRAjectory-routed Causal Evidence (TRACE), a memory framework for visuomotor imitation policies. TRACE stores task-relevant visual and robot-state evidence, such as object identity, target choice, or route-dependent state, in a fixed-size latent memory that remains bounded over long episodes. Instead of indexing memory by raw time or manually provided task labels, TRACE uses path signatures: compact, order-sensitive features of the executed robot-state trajectory. These signatures do not store the visual cue itself; rather, they provide trajectory-conditioned keys for writing and retrieving the evidence stored when the cue was visible. When the robot later reaches an ambiguous observation, the policy conditions on TRACE memory to recover the missing context and choose the correct branch. TRACE attaches through lightweight adapters to policies, without changing the policy backbone, action head, or imitation objective. Across real-world long-horizon manipulation tasks with visually ambiguous branch points, TRACE improves branch selection and task success over alternative baselines, including short-history and recurrent memory. Project page: https://jeong-zju.github.io/trace
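The order sensitivity of path signatures mentioned in this abstract can be illustrated with a minimal sketch. It computes the depth-2 signature of a piecewise-linear path using Chen's identity; the function name `signature_depth2` and the toy 2-D paths are illustrative assumptions and are not taken from TRACE, whose truncation depth and feature pipeline the abstract does not specify.

```python
def signature_depth2(path):
    """Depth-2 signature (levels 1 and 2) of a piecewise-linear path.

    `path` is a list of d-dimensional points. Hypothetical helper, for illustration only.
    """
    d = len(path[0])
    level1 = [0.0] * d                          # total increment so far
    level2 = [[0.0] * d for _ in range(d)]      # iterated integrals of x_i dx_j
    for prev, curr in zip(path, path[1:]):
        delta = [c - p for p, c in zip(prev, curr)]
        # Chen's identity for appending one straight segment:
        # S2 <- S2 + S1 (x) delta + 0.5 * delta (x) delta
        for i in range(d):
            for j in range(d):
                level2[i][j] += level1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(d):
            level1[i] += delta[i]
    return level1, level2


# Two toy trajectories with the same endpoints but a different order of moves.
right_then_up = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
up_then_right = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

s1_a, s2_a = signature_depth2(right_then_up)
s1_b, s2_b = signature_depth2(up_then_right)
print(s1_a, s1_b)   # level 1 only sees the net displacement
print(s2_a, s2_b)   # level 2 differs, so it can serve as an order-sensitive key
```

Level 1 is identical for both paths, while the off-diagonal level-2 terms differ, which is the property that lets a signature distinguish routes that end in the same place.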

04.
arXiv (CS.CV) 2026-06-18

Attention mechanisms and transfer learning for robust peach leaf damage classification under domain shift

Artificial intelligence provides a practical framework for crop damage assessment from imagery data, supporting early decision-making in agricultural management. In peach orchards, climate change increases abiotic stress and biotic pressures, including pests and diseases, which often produce visually similar foliar symptoms. This overlap makes manual diagnosis difficult, especially across multiple fields with varying environmental conditions, highlighting the need for automated models with strong generalization ability. We propose an image-based classification approach for peach leaf damage detection. A benchmark dataset was created through manual annotation of publicly available images, consisting of 1,366 peach leaves across six damage categories. Several deep learning architectures were evaluated. EfficientNet models achieved the best results, with EfficientNetB0 reaching 92.9 percent accuracy, EfficientNetB3 achieving 91.5 percent, and EfficientNetB5 showing the strongest performance on minority classes. DenseNet121 reached 92.6 percent accuracy. The integration of the Convolutional Block Attention Module (CBAM) improved performance in several backbones, particularly EfficientNetB5 and InceptionV3, while showing limited or negative impact in others. The CBAM-enhanced EfficientNetB5 achieved the best overall accuracy of 93.3 percent. To evaluate robustness under realistic conditions, a local dataset of 180 images across four classes was collected, and transfer learning strategies were applied to address domain shift. Three fine-tuning strategies were tested. EfficientNetB3 combined with CBAM achieved the best performance in the local domain, reaching a 93 percent macro F1-score after transfer. Overall, attention-based models showed improved robustness for minority classes and better generalization across different field conditions.

05.
arXiv (CS.LG) 2026-06-16

Machine Learning and the Random Walk Puzzle: Forecasting the CAD/USD Exchange Rate with Expanding Window Evaluation and SHAP Interpretability

arXiv:2606.15058v1 Announce Type: new Abstract: This study examines whether machine learning (ML) models can outperform the naive random walk benchmark in forecasting the monthly USD/CAD exchange rate. Using daily data from the Bank of Canada spanning January 2017 to May 2026, resampled into 113 monthly observations, five ML models are evaluated: linear regression, random forest, gradient boosting, XGBoost, and AdaBoost. These models are benchmarked against the naive random walk model and exponential smoothing with Holt-Winters seasonality (ETS). All models are evaluated using an expanding-window framework to maintain strict out-of-sample integrity, and forecast-accuracy differences are assessed using the Diebold-Mariano (DM) test. Structural break detection identifies four significant breakpoints in the series, corresponding to the escalation of the US-China trade war in 2018, the COVID-19 economic recovery in 2020, the peak of the Bank of Canada rate-hiking cycle in 2022, and the start of the Bank of Canada rate-cutting cycle in 2024. SHAP, or Shapley Additive Explanations, analysis is applied to interpret the drivers of the best-performing ML model. The results show that the naive random walk model remains a formidable benchmark. Linear regression is the only model that statistically outperforms the naive random walk model, with a DM statistic of 3.0585 and a p value of 0.0071, whereas the ML ensemble models show only marginal differences. Random Forest with an expanding-window framework achieves the lowest MAPE of 1.17 percent among all models except the random walk. SHAP analysis confirms that short-term lags, particularly lag1 and lag2, and recent rolling means dominate predictions, consistent with the near-random-walk behavior of exchange rates.
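As a rough illustration of the expanding-window protocol described in this abstract, the sketch below compares a naive random walk forecast with a lag-1 linear regression on a synthetic series. The series, the `min_train` value, and all function names are assumptions made for the example; the study itself uses Bank of Canada data and a richer feature set, and no result here should be read as reproducing it.

```python
import math
import random


def fit_lag1_ols(series):
    """Least-squares fit of y_t = a + b * y_{t-1}."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx if sxx > 0 else 0.0
    return mean_y - b * mean_x, b


def expanding_window_mape(series, min_train=24):
    """One-step-ahead MAPE (percent) for a random walk and a lag-1 regression."""
    rw_err, lr_err = [], []
    for t in range(min_train, len(series)):
        train = series[:t]                 # only data observed before time t
        actual = series[t]
        rw_forecast = train[-1]            # naive random walk: last value
        a, b = fit_lag1_ols(train)         # refit on the growing window
        lr_forecast = a + b * train[-1]
        rw_err.append(abs(actual - rw_forecast) / abs(actual))
        lr_err.append(abs(actual - lr_forecast) / abs(actual))
    n = len(rw_err)
    return 100.0 * sum(rw_err) / n, 100.0 * sum(lr_err) / n, n


# Synthetic stand-in for 113 monthly observations (not real exchange-rate data).
rng = random.Random(0)
series = [1.30]
for _ in range(112):
    series.append(series[-1] * math.exp(rng.gauss(0.0, 0.015)))

rw_mape, lr_mape, n_forecasts = expanding_window_mape(series)
print(f"random walk MAPE: {rw_mape:.2f}%  lag-1 regression MAPE: {lr_mape:.2f}%  forecasts: {n_forecasts}")
```

Because each model is refit only on observations that precede the forecast date, no future information leaks into the evaluation.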

06.
bioRxiv (Bioinfo) 2026-06-11

DyMoTree decodes early cell state transitions and drivers from single-cell transcriptomes using a tree-structured neural network

Inferring early cell fate from single-cell RNA-sequencing data is essential for identifying cellular origins and fate plasticity in development and disease. However, existing methods often fail to exploit tree-structured lineage trajectories, limiting the accuracy and interpretability of fate mapping. Here we present DyMoTree, a computational framework that models cell fate decisions as nonlinear mappings between progenitor and terminal cell states under explicit lineage constraints. By integrating lineage graphs with a tree-structured neural architecture, DyMoTree learns lineage-resolved cell-state transition maps from single-cell transcriptomes, enabling robust inference of early fate bias and identification of fate-specific progenitor substates and driver genes. Across simulations, lineage-tracing experiments, and in vivo systems, DyMoTree outperformed existing methods in resolving early fate biases. Applications to mouse embryogenesis, lung adenocarcinoma progression, and CAR-T immunotherapy revealed regulatory programs underlying developmental and disease-associated transitions. DyMoTree provides a general framework for modeling lineage-resolved cell-state dynamics underlying development and disease progression.

07.
arXiv (CS.CV) 2026-06-17

NTIRE 2025 Challenge on Image Super-Resolution (x4): Methods and Results

This paper presents the NTIRE 2025 image super-resolution ($\times$4) challenge, one of the associated competitions of the 10th NTIRE Workshop at CVPR 2025. The challenge aims to recover high-resolution (HR) images from low-resolution (LR) counterparts generated through bicubic downsampling with a $\times$4 scaling factor. The objective is to develop effective network designs or solutions that achieve state-of-the-art SR performance. To reflect the dual objectives of image SR research, the challenge includes two sub-tracks: (1) a restoration track, which emphasizes pixel-wise accuracy and ranks submissions based on PSNR; (2) a perceptual track, which focuses on visual realism and ranks results by a perceptual score. A total of 286 participants registered for the competition, with 25 teams submitting valid entries. This report summarizes the challenge design, datasets, evaluation protocol, the main results, and methods of each team. The challenge serves as a benchmark to advance the state of the art and foster progress in image SR.
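For readers unfamiliar with the restoration-track metric, here is a minimal sketch of PSNR on 8-bit grayscale images stored as nested lists. The function name and the `max_value=255.0` default are assumptions; the challenge's exact protocol (for example colour space or border cropping) is not given in the abstract and is not modelled here.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    ref = [p for row in reference for p in row]
    res = [p for row in restored for p in row]
    if len(ref) != len(res):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(ref, res)) / len(ref)
    if mse == 0:
        return math.inf   # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


hr = [[0, 255], [128, 64]]          # toy "ground truth" patch
sr = [[0, 250], [130, 64]]          # toy super-resolved output
score = psnr(hr, sr)
print(f"PSNR: {score:.2f} dB")
```

Higher values mean the restored image is closer to the reference pixel by pixel, which is why PSNR rewards accuracy rather than perceived realism.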

08.
arXiv (CS.CV) 2026-06-17

Root-Selecting Fixed-Point Inversion for Rectified Flows via Trajectory Straightness

Finding the initial noise that generates a given data sample, known as inversion, is a key component for downstream applications such as training-free image editing. Existing fixed-point inversion methods improve inversion accuracy by formulating each inversion step as a fixed-point problem, but they lack a principled mechanism for selecting among multiple fixed-point solutions that can arise in practice. We observe that different selections induce different inversion trajectories, leading to substantial variation in reconstruction and editing quality. For rectified flows, we further find that this variation is closely associated with trajectory straightness, motivating straightness as a principled selection criterion. We propose SelFix, a fixed-point inversion method that selects fixed-point solutions inducing straighter inverse trajectories while retaining convergence to an exact inverse root under standard local assumptions. Experiments on FLUX.1-dev and PIE-Bench show that SelFix improves fixed-point inversion, achieving stronger real-image reconstruction and better source-preserving prompt-based editing than prior inversion baselines. The code is available at https://github.com/seminkim/selfix.

09.
arXiv (CS.CL) 2026-06-15

Pragmatic Inference for Moral Reasoning Acquisition: Generalization via Metapragmatic Links

While moral reasoning has emerged as a promising research direction for large language models (LLMs), achieving robust generalization remains a critical challenge. This challenge arises from the gap between what is said and what is morally implied. In this paper, we build on metapragmatic links and Moral Foundations Theory to close this gap. Specifically, we develop a pragmatic inference approach that enables LLMs, given a moral situation, to acquire the metapragmatic links between moral reasoning objectives and the social variables that influence them. We adapt this approach to three different moral reasoning tasks to demonstrate its adaptability and generalizability. Experimental results show that our approach significantly enhances LLMs' generalization in moral reasoning, paving the way for future research to leverage pragmatic inference across a wide range of moral reasoning tasks.

10.
arXiv (CS.CL) 2026-06-16

SCAR: Semantic Continuity-Aware Retrieval for Efficient Context Expansion in RAG

Fixed-length chunking in Retrieval-Augmented Generation (RAG) often leads to boundary fragmentation, where critical evidence is split across segments, degrading retrieval recall. While static windowing and parent retrieval improve recall, they introduce significant token overhead. We propose SCAR (Semantic Continuity-Aware Retrieval), an adaptive retrieval policy that selectively expands neighboring chunks by weighing query-neighbor relevance against a structural continuity penalty. SCAR uses a relative expansion threshold tied to each retrieved chunk's own query-relevance, yielding an approximately scale-invariant decision rule that transfers across embedding models without recalibration. Across four diverse corpora (RFC, GDPR, a 10-K report, and a Merger agreement; N=320 queries; 160 boundary-fragmented), SCAR achieves 92.8% recall on boundary-fragmented queries with only 7.84 chunks, a 22.9% reduction compared to static windowing (10.16 chunks). Paired bootstrap tests (B=10,000) confirm the chunk reduction is highly significant (p
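The abstract does not give SCAR's exact decision rule, so the following is only a hypothetical sketch of what a relative expansion threshold could look like: a neighbor is pulled in when its query relevance, minus a weighted continuity penalty, reaches a fraction of the retrieved chunk's own relevance. The function name, `alpha`, and `penalty_weight` are all assumptions introduced for illustration.

```python
def should_expand(chunk_relevance, neighbor_relevance, continuity_penalty,
                  alpha=0.8, penalty_weight=0.5):
    """Hypothetical relative-threshold rule (not the paper's formula).

    Expand to the neighbor when its penalized relevance is at least
    `alpha` times the relevance of the chunk that was actually retrieved.
    """
    penalized = neighbor_relevance - penalty_weight * continuity_penalty
    return penalized >= alpha * chunk_relevance


# A neighbor that is nearly as relevant as the retrieved chunk is included...
print(should_expand(0.80, 0.70, continuity_penalty=0.10))
# ...while a clearly weaker neighbor is skipped.
print(should_expand(0.80, 0.50, continuity_penalty=0.10))
# With no penalty, rescaling all similarities by 0.1 leaves the decision unchanged,
# which is the sense in which a relative threshold is scale-invariant.
print(should_expand(0.08, 0.07, continuity_penalty=0.0))
```

Tying the threshold to the retrieved chunk's own score, rather than to a fixed cutoff, is what would let such a rule carry over between embedding models whose similarity scales differ.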

11.
arXiv (CS.AI) 2026-06-19

Neural Additive and Basis Models with Feature Selection and Interactions

arXiv:2606.19850v1 Announce Type: cross Abstract: Deep neural networks (DNNs) exhibit attractive performance in various fields but often suffer from low interpretability. The neural additive model (NAM) and its variant called the neural basis model (NBM) use neural networks (NNs) as nonlinear shape functions in generalized additive models (GAMs). Both models are highly interpretable and exhibit good performance and flexibility for NN training. NAM and NBM can provide and visualize the contribution of each feature to the prediction owing to GAM-based architectures. However, when using two-input NNs to consider feature interactions or when applying them to high-dimensional datasets, training NAM and NBM becomes intractable due to the increase in the computational resources required. This paper proposes incorporating the feature selection mechanism into NAM and NBM to resolve computational bottlenecks. We introduce the feature selection layer in both models and update the selection weights during training. Our method is simple and can reduce computational costs and model sizes compared to vanilla NAM and NBM. In addition, it enables us to use two-input NNs even in high-dimensional datasets and capture feature interactions. We demonstrate that the proposed models are computationally efficient compared to vanilla NAM and NBM, and they exhibit better or comparable performance with state-of-the-art GAMs.

12.
arXiv (quant-ph) 2026-06-15

Nonadiabatic Self-Healing of Trotter Errors in Digitized Counterdiabatic Dynamics

arXiv:2512.22636v2 Announce Type: replace Abstract: Trotter errors in digitized quantum dynamics arise from approximating time-ordered evolution under noncommuting Hamiltonian terms with a product formula. In the adiabatic regime, such errors are known to exhibit long-time self-healing [Phys. Rev. Lett. 131, 060602 (2023)], where discretization effects are effectively suppressed. Here we show that self-healing persists at finite evolution times once nonadiabatic errors induced by finite-speed ramps are compensated. Using counterdiabatic driving to cancel diabatic transitions and isolate discretization effects, we study both noninteracting and interacting spin models and characterize the finite-time scaling with the Trotter steps and the total evolution time. In the instantaneous eigenbasis of the driven Hamiltonian, the leading digital error maps to an effective harmonic perturbation whose dominant Fourier component yields an analytic upper bound on the finite-time Trotter error and reveals the phase-cancellation mechanism underlying self-healing. Our results establish finite-time self-healing as a generic feature of digitized counterdiabatic protocols, clarify its mechanism beyond the long-time adiabatic limit, and provide practical guidance for high-fidelity state preparation on gate-based quantum processors.
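To make the origin of Trotter error concrete, the sketch below applies a generic first-order product formula to the single-qubit Hamiltonian H = X + Z, whose two terms do not commute, and compares it with the exact evolution. This toy model, the step counts, and the helper names are assumptions for illustration; it does not implement the counterdiabatic protocol or the self-healing analysis of the paper.

```python
import math

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def exp_pauli(p, theta):
    """exp(-i * theta * P) = cos(theta) I - i sin(theta) P, valid when P^2 = I."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * I2[i][j] - 1j * s * p[i][j] for j in range(2)] for i in range(2)]


def exact_evolution(t):
    """exp(-i t (X + Z)); (X + Z) / sqrt(2) squares to the identity."""
    n = [[(X[i][j] + Z[i][j]) / math.sqrt(2) for j in range(2)] for i in range(2)]
    return exp_pauli(n, math.sqrt(2) * t)


def trotter_evolution(t, steps):
    """First-order product formula: (exp(-i dt X) exp(-i dt Z))^steps."""
    dt = t / steps
    step = matmul(exp_pauli(X, dt), exp_pauli(Z, dt))
    u = I2
    for _ in range(steps):
        u = matmul(step, u)
    return u


def frobenius_distance(a, b):
    return math.sqrt(sum(abs(a[i][j] - b[i][j]) ** 2 for i in range(2) for j in range(2)))


errors = {n: frobenius_distance(trotter_evolution(1.0, n), exact_evolution(1.0))
          for n in (10, 100, 1000)}
for n, err in errors.items():
    print(f"steps={n:5d}  error={err:.5f}")
```

The error shrinks as the number of Trotter steps grows at fixed total time, which is the baseline discretization behaviour that the finite-time self-healing result refines.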

13.
arXiv (CS.AI) 2026-06-12

Modern analog computing for solving differential and matrix equations

arXiv:2606.13179v1 Announce Type: cross Abstract: In recent years, driven by the computational demands of data-intensive applications such as artificial intelligence and scientific computing, analog computing has gained renewed interest. Given the diversity of computational tasks and recent advancements in analog CMOS circuits and resistive memory technologies, we refer to the evolving landscape as modern analog computing. In this context, we identify three core computational primitives: solving differential equations, solving matrix equations, and performing matrix-vector multiplications, and we explore the connections among them. We also examine various hardware implementations of these analog computing operators, including those built with discrete components, integrated circuits, and resistive memory devices. Among these, resistive memory arrays emerge as particularly promising due to their implementation efficiency. The paper then surveys recent progress in leveraging modern analog computing to solve differential and matrix equations using both advanced analog CMOS circuits and resistive memory arrays. Finally, we discuss the applications of these circuits, the precision and scalability issues and their potential solutions, the relationship with in-memory computing, and the unique computational complexity of analog computing. This paper provides a unified perspective on analog computing, highlighting its strengths, current developments, and challenges, and positioning it as a pivotal enabler of next-generation computational frontiers.

14.
medRxiv (Medicine) 2026-06-22

Repeat expansions in Parkinson's disease and parkinsonism across ancestries: insights from a global genetic cohort

Expanded short tandem repeats contribute to a broad spectrum of neurodegenerative diseases, yet their roles in Parkinson's disease (PD) and parkinsonism remain incompletely characterized, especially across diverse ancestries. We analyzed short-read whole-genome (WGS) and clinical exome sequencing (CES) data from 38,365 individuals (28,861 WGS; 9,504 CES), encompassing 23,242 patients with PD, 4,729 patients with atypical parkinsonism and 10,394 healthy controls from 11 genetic ancestries. To determine carrier frequencies and characterize repeat structures across diverse ancestries, we genotyped 12 established pathogenic loci where normal, intermediate, and pathogenic alleles can be reliably differentiated using short-read sequencing data. Additionally, we conducted threshold-based associations to determine the minimum threshold associated with increased PD risk in 15,995 individuals (8,591 PD, 7,404 controls) of European ancestry. Pathogenic repeat expansions were detected in 62 patients (56 PD and 6 atypical parkinsonism) and 5 controls across seven loci (AR, ATXN1, ATXN2, ATXN3, CACNA1A, HTT and THAP11), spanning seven ancestries. Among these, ATXN2 expansions were the most frequently observed in PD and were present in African, East Asian, European and Middle Eastern ancestries. Additionally, intermediate ATXN2 repeat expansions exhibited a strong, length-dependent association with PD risk in the European population, with individuals with ≥32 repeats having a more than four-fold increased risk (odds ratio 4.25, 95% confidence interval 1.80-12.05). Overall, >92% of expanded alleles harbor CAA interruptions within the CAG tract. Pathogenic expansions at other loci, such as ATXN3 and THAP11, showed more ancestry-specific distributions. Clinically, individuals with pathogenic ATXN2 and ATXN3 expansions most often presented with typical PD features but frequently showed earlier disease onset and a strong family history of PD.
This large-scale, multi-ancestry study comprehensively maps the genetic landscape of pathogenic and intermediate repeat expansions in PD. Our findings confirm a length- and structure-dependent risk association for ATXN2 with PD in the European population, and highlight the pleiotropic effects of repeat expansions across the parkinsonian spectrum.

15.
medRxiv (Medicine) 2026-06-18

Avidity of anti-pertussis toxin antibodies is associated with symptomatic Bordetella pertussis infection in a novel controlled human infection model

Background: The association between functional antibody responses following Bordetella pertussis infection and symptomatic disease remains unclear. We characterized the maturation of anti-pertussis toxin (PT) IgG avidity after human challenge with B. pertussis and determined its association with symptomatic infection. Methods: Healthy adults were intranasally inoculated with live B. pertussis organisms in a controlled human infection model and monitored for development of pertussis symptoms (NCT05136599). Serum samples were collected one day before inoculation and at 14, 28, 56, 180, and 365 days post-challenge. Anti-PT IgG avidity was tested using a titration of ammonium isothiocyanate (the bond-breaking agent) to quantify a wide range of antibody avidities from low to very high. Associations between covariates and avidity were examined using linear regression models, and high-dimensional analyses were used to integrate all data. Findings: Anti-PT IgG avidity increased in both symptomatic (n=20) and asymptomatic (n=10) participants after the challenge, reached maximum levels at day 56, and then declined through day 365. Symptomatic participants developed significantly higher levels of high- and very-high-avidity anti-PT antibodies at 28, 56, 180, and 365 days post-challenge compared with those who remained asymptomatic. In multivariate analyses, symptomatic infection was associated with higher levels of high- and very-high-avidity anti-PT IgG at days 180 and 365 after challenge. Distinct avidity profiles in symptomatic vs asymptomatic participants emerged from day 28 onwards, with the former group having higher levels of antibodies with higher avidities. However, levels of medium-high-, high-, and very-high-avidity antibodies in symptomatic participants were lower at day 365 after challenge compared to their peak levels. Interpretation: Anti-PT IgG avidity was associated with symptomatic B. pertussis infection and thus may serve as a surrogate of clinical disease outcome.
These results highlight that antibody avidity provides an additional functional assay besides antibody quantitation to dissect immune responses to pertussis. Further investigation of anti PT IgG avidity should be pursued in natural pertussis outbreaks to determine whether it might be used to differentiate symptomatic from asymptomatic infections for epidemiologic purposes.

16.
arXiv (CS.LG) 2026-06-15

MUFFLe: Efficient Model Update Compression via Generalized Deduplication for Federated Learning

arXiv:2606.14354v1 Announce Type: new Abstract: Federated learning is well suited to edge environments but is often limited by the uplink cost of transmitting model updates. This Work-in-Progress paper presents MUFFLe, a communication-efficient update compression scheme that integrates generalized deduplication (GD) into the FedAvg pipeline. MUFFLe deduplicates repeated patterns across the update vector, yielding a fixed-rate, variable-count compression scheme. Preliminary experiments on IID MNIST with 20 clients show that MUFFLe reaches the target accuracy of 92.93% with 38 MB cumulative uplink communication, compared with 75 MB for 8-bit quantization, 86 MB for Top-k sparsification, and 310 MB for uncompressed FedAvg. These results demonstrate the feasibility of applying GD to communication-efficient federated learning.
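The core idea of generalized deduplication, splitting each value into a shared base and a small deviation and storing each distinct base once, can be sketched as follows. This is a simplified illustration on non-negative quantized integers; the bit split, the function names, and the toy data are assumptions, and MUFFLe's actual chunking and encoding are not described in the abstract.

```python
def gd_compress(values, deviation_bits=2):
    """Split each non-negative integer into (base, deviation) and deduplicate bases."""
    mask = (1 << deviation_bits) - 1
    base_ids = {}      # base value -> index into `bases`
    bases = []         # each distinct base stored once
    encoded = []       # per-value (base index, deviation) pairs
    for v in values:
        base, deviation = v >> deviation_bits, v & mask
        if base not in base_ids:
            base_ids[base] = len(bases)
            bases.append(base)
        encoded.append((base_ids[base], deviation))
    return bases, encoded


def gd_decompress(bases, encoded, deviation_bits=2):
    """Rebuild the original values exactly from bases and deviations."""
    return [(bases[i] << deviation_bits) | deviation for i, deviation in encoded]


# Toy 8-bit quantized update values that cluster around two levels.
update = [100, 101, 102, 103, 100, 101, 200, 201, 203, 100]
bases, encoded = gd_compress(update)
restored = gd_decompress(bases, encoded)
print(len(bases), "distinct bases for", len(update), "values")
```

When many values share their high-order bits, only a few bases need to be transmitted alongside short deviations, and the reconstruction remains lossless.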

17.
arXiv (quant-ph) 2026-06-17

Closest Accessible Symmetry reduction: a tool for Hamiltonian interpolation analysis

arXiv:2606.18161v1 Announce Type: new Abstract: We introduce a framework for analysing the spectrum of Hamiltonian interpolations without heavily relying on discretising the interpolation parameter. The method is based on the concept of accessible symmetries: a problem-class-dependent family of certifiable reflections that induce bipartitions of the Hilbert space. At each step, the interpolation Hamiltonian is projected onto the sectors of the accessible symmetry that is closest to being satisfied, yielding a hierarchy of weakly coupled pseudo-eigenspaces together with explicit residual couplings between them. We show that this representation captures qualitative signatures of quantum phase transitions, provides estimates of their location, and offers insights into their nature. The quality of the approximation is controlled by the compatibility between the accessible symmetry family and the problem instance. Although motivated in spirit by adiabatic quantum computation, our approach applies more broadly to the study of Hamiltonian phase diagrams, providing a new perspective on the spectral reorganisation of many-body quantum systems.

18.
arXiv (CS.CL) 2026-06-11

ProHiFlo: Hierarchical Flow Matching with Functional Guidance for De Novo Protein Generation

De novo protein generation has transformative potential in therapeutic design, enzyme engineering, and synthetic biology. While diffusion-based and flow matching approaches have achieved progress, they typically operate at single resolution and lack mechanisms for incorporating functional constraints. We introduce ProHiFlo, a hierarchical flow matching framework with three innovations: (1) coarse-to-fine generation that models backbone geometry before refining to all-atom coordinates, reducing computational cost while maintaining accuracy; (2) functional guidance leveraging pretrained predictors to steer generation toward desired properties without retraining; (3) adaptive SE(3)-equivariant architecture for efficient multi-scale processing. Experiments on unconditional generation, motif scaffolding, and functional design demonstrate state-of-the-art performance while requiring 4 fewer sampling steps. On enzyme active site scaffolding, ProHiFlo achieves 58.9% success rate compared to 41.2% for RFDiffusion.

19.
arXiv (CS.LG) 2026-06-18

Fisher Width: A Geometric Measure of Complexity on Statistical Manifolds

arXiv:2606.18306v1 Announce Type: new Abstract: Gaussian width is a central geometric complexity measure in high-dimensional probability, compressed sensing, convex optimization, and learning theory. It quantifies the average extent of a set along random directions, thereby capturing the effective dimension of constraint sets, hypothesis classes, and descent cones. However, this notion is intrinsically Euclidean. Statistical models instead carry a natural Riemannian geometry induced by the Fisher information metric, where directions are scaled according to statistical distinguishability rather than ambient Euclidean length. We introduce Fisher width, a Fisher-geometric analogue of Gaussian width for statistical manifolds. At a parameter point $\theta$, Fisher width replaces the Euclidean identity by the local metric tensor $G(\theta)^{1/2}$, measuring the Gaussian width of the Fisher-rescaled set. This makes the resulting quantity sensitive to local statistical curvature and invariant under smooth reparameterizations. We develop the basic theory of Fisher width, showing that it retains key structural features of Gaussian width, including concentration, metric perturbation stability, and spectral comparison bounds with the Euclidean baseline, while also capturing anisotropic geometric effects invisible to Euclidean measures. As an application, we prove a generalization bound for Fisher-Lipschitz hypothesis classes and propose computable estimators, which we evaluate empirically on MNIST across three model classes. Fisher width is to statistical manifolds what Gaussian width is to Euclidean convex bodies. This work lays the foundation for studying complexity and learning on curved statistical manifolds.
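Reading the description above literally, the Fisher width at a point is the Gaussian width of the set after rescaling by $G(\theta)^{1/2}$. The sketch below estimates it by Monte Carlo for a finite set and a diagonal metric, which avoids a general matrix square root. The diagonal restriction, the sample count, and the function names are assumptions for the example and are not the estimators proposed in the paper.

```python
import math
import random


def gaussian_width(points, num_samples=2000, seed=0):
    """Monte Carlo estimate of E[max_t <g, t>] over a finite set of points."""
    rng = random.Random(seed)
    dim = len(points[0])
    total = 0.0
    for _ in range(num_samples):
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        total += max(sum(gi * pi for gi, pi in zip(g, p)) for p in points)
    return total / num_samples


def fisher_width_diag(points, fisher_diag, num_samples=2000, seed=0):
    """Gaussian width of the set rescaled by G^{1/2}, for a diagonal metric G."""
    scale = [math.sqrt(v) for v in fisher_diag]
    rescaled = [[s * pi for s, pi in zip(scale, p)] for p in points]
    return gaussian_width(rescaled, num_samples, seed)


# Vertices of the 2-D cross-polytope as a toy set.
cross = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
euclid = gaussian_width(cross)
fisher = fisher_width_diag(cross, fisher_diag=[4.0, 1.0])   # first direction is more distinguishable
print(f"Euclidean width: {euclid:.3f}  Fisher width: {fisher:.3f}")
```

With an identity metric the two quantities coincide, while a larger Fisher information along one direction stretches the set there and increases the width, which is the anisotropy a purely Euclidean measure would miss.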

20.
arXiv (CS.AI) 2026-06-19

A Neuromorphic Reinforcement Learning Framework for Efficient Pathfinding in Robotic Mobile Fulfillment Systems

arXiv:2606.20031v1 Announce Type: cross Abstract: Dynamic environmental changes, confined workspaces, and stringent real-time constraints make pathfinding in Robotic Mobile Fulfillment Systems (RMFS) a challenging problem for conventional search- and rule-based methods, which typically suffer from high computational complexity and long decision latency. While reinforcement learning (RL) has emerged as a powerful alternative, deploying learned policies with extreme energy efficiency on resource-constrained hardware remains an open challenge. We present SDQN-RMFS, an end-to-end framework that achieves high-fidelity deployment of an RL-trained policy from a full-precision artificial neural network (ANN) through to a neuromorphic chip. By computing only when triggered by sparse events, this framework unlocks ultra-low-power RMFS pathfinding. Our full-stack pipeline operates as follows: an ANN policy is first efficiently trained via a collision-allowing strategy to densify informative trajectories, and then converted into a spiking neural network (SNN) via a hard-label knowledge distillation approach. This effectively addresses the output distribution mismatch, preserving policy capability across the ANN-to-SNN pipeline while substantially reducing inference latency. Hardware experiments demonstrate up to 11,281$\times$ energy savings and a nearly two-fold reduction in latency compared to a high-performance GPU baseline, while maintaining decision quality on par with the original trained policy. These results establish physical neuromorphic inference as a practical and energy-sustainable pathway for large-scale RMFS operations.

21.
arXiv (CS.CL) 2026-06-15

ADORE: Iterative Query Expansion with Retrieval-Grounded Relevance Feedback

LLM-based query expansion improves retrieval by enriching the original query with additional context. Yet most methods remain generation-driven, producing plausible pseudo-documents or expansions without checking how the target corpus responds. This can introduce retrieval drift, amplify misleading vocabulary, or miss terms that distinguish relevant from non-relevant documents. We argue that effective expansion requires retrieval-grounded feedback, not just single-pass generation or unverified iteration. We introduce ADORE (ADapt, Observe, Relevance Evaluate), an iterative framework that turns retrieval outcomes into feedback for the next expansion. At each round, an LLM generates pseudo-passages, a retriever exposes the corpus response, and a relevance assessor evaluates retrieved documents against the original query. These judgments identify what to reinforce, what remains undercovered, and what to suppress. Across TREC Deep Learning, BEIR, and BRIGHT, ADORE consistently outperforms strong query expansion baselines with notable improvements across nearly all evaluation settings, improving average nDCG@10 by 24.5% over BM25 and 3.6% over the strongest prior query expansion method on BEIR, and by 122.9% over BM25 and 9.2% over the best query expansion baseline on BRIGHT. Our code and data are publicly available.

22.
arXiv (CS.CV) 2026-06-12

ViPER: Vision-based Packing-Aware Encoder for Robust Malware Detection

Visualization-based malware detection maps raw binary bytes to grayscale images and applies learned visual classifiers, providing an evasion-resistant and disassembly-free alternative to conventional analysis pipelines. However, executable packing remains a critical failure mode: packed binaries produce high-entropy images that obscure the structural patterns these models rely on. Because packing is also prevalent in benign software (e.g., for compression or copy protection), packing state alone is not a reliable indicator of maliciousness, and existing approaches do not address this challenge within a unified supervised framework. We present ViPER, a Vision-based Packing-Aware Encoder for Robust malware detection. ViPER builds on a LoRA-adapted ViT-B/14 backbone with a dual-head architecture that jointly learns malware classification and packing detection. A packing-aware gating mechanism conditions malware predictions on the inferred packing state, enabling distinct decision boundaries for packed and unpacked inputs. To address packing label skew during training, we employ frequency-weighted losses with stratified sampling over joint class-packing strata. Evaluated on 200,000 Windows PE byteplot images, ViPER achieves a balanced accuracy of 0.8521, ROC-AUC of 0.9260, and AUPR of 0.9279, outperforming representative state-of-the-art baselines across all primary metrics, while attaining a packing detection AUC of 0.9949.

23.
arXiv (CS.CV) 2026-06-16

Timestep Rescheduling in Diffusion Inversion

Diffusion inversion, which maps images back to the Gaussian latent space of a diffusion model, is a critical task for image reconstruction and editing. While DDIM enables fast deterministic inversion, it inherently introduces deviations that accumulate into noticeable inversion errors. Existing methods often address this by solving a fixed-point problem but largely overlook how the selection of the diffusion timestep in the noise scheduler influences inversion fidelity. In this work, we reveal that the deviation scale in diffusion inversion depends strongly on the timestep size and exhibits a parabolic trend, with larger errors concentrated at both small and large timesteps. Based on this finding, we propose a simple yet effective nonuniform timestep scheduler that integrates a global rescaling with a local dynamic-programming-based rescheduling, enabling a strategic allocation of computational effort that minimizes the overall inversion error and preserves higher inversion accuracy. Our method serves as an off-the-shelf enhancement for existing inversion techniques and requires no extra parameters or computational overhead. Through extensive experiments, we verify that integrating our scheduler consistently boosts the performance of existing inversion methods, achieving superior results in image reconstruction and editing.
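
The abstract does not spell out the dynamic program, so the following is only a generic sketch of how a nonuniform schedule can be chosen by dynamic programming: pick K steps from timestep 0 to T so that a summed per-step cost is minimal. The function `schedule_timesteps` and the cost functions are hypothetical; in particular, the quadratic cost is a placeholder and not the paper's deviation estimate.

```python
from typing import Callable, List


def schedule_timesteps(T: int, K: int, cost: Callable[[int, int], float]) -> List[int]:
    """Return K+1 increasing timesteps from 0 to T minimising the sum of cost(s, t)."""
    if not 1 <= K <= T:
        raise ValueError("need 1 <= K <= T")
    inf = float("inf")
    # best[k][t]: minimal total cost of reaching timestep t from 0 in exactly k steps.
    best = [[inf] * (T + 1) for _ in range(K + 1)]
    prev = [[-1] * (T + 1) for _ in range(K + 1)]
    best[0][0] = 0.0
    for k in range(1, K + 1):
        for t in range(k, T + 1):
            for s in range(k - 1, t):
                if best[k - 1][s] == inf:
                    continue
                candidate = best[k - 1][s] + cost(s, t)
                if candidate < best[k][t]:
                    best[k][t] = candidate
                    prev[k][t] = s
    # Walk the back-pointers from T to 0 to recover the schedule.
    path = [T]
    t = T
    for k in range(K, 0, -1):
        t = prev[k][t]
        path.append(t)
    return path[::-1]


# With a cost that depends only on step size, the optimum is the uniform schedule.
uniform = schedule_timesteps(6, 3, lambda s, t: (t - s) ** 2)  # [0, 2, 4, 6]


# A hypothetical position-dependent cost: steps near either end are penalised more.
def end_weighted(s: int, t: int, T: int = 20) -> float:
    mid = (s + t) / 2
    weight = 1.0 + ((mid - T / 2) / (T / 2)) ** 2
    return weight * (t - s) ** 2


nonuniform = schedule_timesteps(20, 5, end_weighted)
```

The table has (K+1)(T+1) entries and each is filled in O(T) time, so the search is cheap relative to a single denoising pass and only has to run once per schedule.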

24.
arXiv (CS.AI) 2026-06-16

Continuous Cross-Domain Traffic State Prediction via Memory-Augmented Graph Liquid Time-Constant Networks

arXiv:2606.15807v1 Announce Type: cross Abstract: Traffic state prediction is a fundamental task in intelligent transportation systems. In practical applications, some regions suffer from limited traffic observations due to insufficient sensing infrastructure, making cross-domain knowledge transfer an important solution for data-scarce traffic prediction. However, existing cross-domain traffic prediction methods still face several limitations, including coarse-grained source-target adaptation, limited capability in handling unseen target-domain patterns, and insufficient modeling of continuous traffic dynamics under irregular or heterogeneous temporal conditions. To address these issues, this paper proposes a continuous cross-domain traffic prediction framework, termed Memory-Augmented Graph Liquid Time-Constant Network (MA-GLTC). Specifically, we first construct spatio-temporal units (STUs) to decompose traffic networks into transferable local units, enabling fine-grained knowledge alignment across domains. Then, a graph liquid time-constant network (GLTC) is developed to model graph-coupled traffic evolution in continuous time. Unlike generic graph neural ODE-based models, GLTC introduces graph-coupled recurrent conductance into liquid time-constant dynamics, allowing node states to evolve with leakage, adaptive time constants, and neighborhood-aware feedback. Furthermore, a Memory-based Transfer Storage (MTS) mechanism is designed to preserve source-domain knowledge, retrieve matched traffic patterns, and update reliable target-domain patterns when unseen states emerge. Experiments on five public traffic datasets demonstrate that MA-GLTC consistently outperforms representative inner-domain and cross-domain baselines in both short-term and long-term prediction tasks. Compared with the second-best method, MA-GLTC reduces the average prediction errors by 3.02%, 0.33%, 8.92%, 10.09%, and 2.11%, respectively.
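
For readers unfamiliar with liquid time-constant dynamics, the sketch below takes one explicit Euler step of the commonly cited LTC form, dx/dt = -(1/tau + f) x + f A, where the gate f is driven here by each node's own state and the mean state of its neighbours. The neighbour-averaging gate, the function name `ltc_graph_step`, and all parameter values are assumptions made for illustration; the paper's GLTC equations are not given in the abstract and may differ.

```python
import math
from typing import Dict, List


def ltc_graph_step(
    x: List[float],
    neighbors: Dict[int, List[int]],
    dt: float,
    tau: float = 1.0,    # base time constant (assumed scalar)
    A: float = 1.0,      # target level the gate pulls the state toward
    w_self: float = 0.5,  # hypothetical gate weights
    w_nbr: float = 0.5,
    bias: float = 0.0,
) -> List[float]:
    """One explicit Euler step of LTC-style dynamics with a neighbour-aware gate."""
    new_x = []
    for i, xi in enumerate(x):
        nbrs = neighbors.get(i, [])
        nbr_mean = sum(x[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        # Sigmoid gate: acts as a state-dependent conductance.
        gate = 1.0 / (1.0 + math.exp(-(w_self * xi + w_nbr * nbr_mean + bias)))
        # Leakage term -(1/tau) x plus gate-controlled decay and drive.
        dx = -(1.0 / tau + gate) * xi + gate * A
        new_x.append(xi + dt * dx)
    return new_x


# Two connected nodes starting at rest; dt may differ between calls,
# which is how irregular sampling intervals would enter.
state = ltc_graph_step([0.0, 0.0], {0: [1], 1: [0]}, dt=0.1)
state = ltc_graph_step(state, {0: [1], 1: [0]}, dt=0.25)
```

Because the gate multiplies the state itself, the effective time constant 1 / (1/tau + f) changes with the input, which is the "adaptive time constant" the name refers to.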

25.
arXiv (CS.CL) 2026-06-19

CREDENCE: Claim Reduction for Decomposition & Enhanced Credibility – Semantic Metrics and Convergence Analysis

Decomposing compound sentences into atomic, verifiable claims is a prerequisite for reliable automated fact-checking. Prior work has relied on token-overlap (Jaccard) metrics that systematically underestimate decomposition quality for paraphrastic claims, and has lacked formal termination analysis for the repair loop. We present Credence, a revised claim decomposition and evaluation framework addressing both shortcomings. Our contributions are: (1) Semantic-F1: we use a BGE-large cosine-similarity fidelity metric that resolves Jaccard's penalisation and improves downstream fact-checking accuracy; (2) Convergence theorems: we formally characterise four properties of the repair pipeline, establishing that rule-based repair is monotone and finitely terminating under an oracle parser assumption, whereas LLM-based self-repair is provably non-monotone and requires an early-exit guard; (3) Three evaluation benchmarks spanning social-media, encyclopaedic, and news domains for cross-domain generalisation measurement; (4) Multi-model benchmarking across four decomposer models (3.8B-12B) and a closed API model. Experiments on SocialClaimSplit, WikiSplitBench, and ClaimDecompBench show that Semantic-F1 outperforms Jaccard-F1 by +15-32pp. EPR ranges from 0.94 to 1.00 on SocialClaimSplit and WikiSplitBench, while ClaimDecompBench includes lower base EPR cases (down to 0.824) due to harder news-domain constructions, and rule-repair reduces the Atomicity Violation Rate (AVR) by 47-100% relative to the base model without degrading fidelity.
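
The gap between token overlap and embedding similarity can be seen in a few lines. The sketch below implements Jaccard similarity, cosine similarity, and a simple threshold-matched F1 over claim lists. The matching rule, the threshold, and the function names are assumptions for illustration; the abstract does not define Semantic-F1 precisely, and a real run would supply sentence embeddings (for example from BGE-large) rather than the hand-written vectors a caller would pass here.

```python
import math
from typing import Callable, List, Sequence


def jaccard(a: str, b: str) -> float:
    """Token-set overlap: |A intersect B| / |A union B| on lowercased whitespace tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def matched_f1(
    pred: List[str],
    gold: List[str],
    sim: Callable[[str, str], float],
    threshold: float,
) -> float:
    """F1 where an item counts as matched if its best similarity reaches the threshold."""
    if not pred or not gold:
        return 0.0
    precision = sum(1 for p in pred if max(sim(p, g) for g in gold) >= threshold) / len(pred)
    recall = sum(1 for g in gold if max(sim(p, g) for p in pred) >= threshold) / len(gold)
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# A paraphrase with no shared tokens scores zero under Jaccard.
paraphrase_overlap = jaccard("the car stopped", "an automobile halted")  # 0.0
partial_overlap = jaccard("the cat sat", "the cat slept")                # 0.5
```

Passing a `sim` built on `cosine` over sentence embeddings into `matched_f1` instead of `jaccard` is what lets a paraphrased claim count as a match.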