Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.LG) 2026-06-15

Where Black-box Drug-Target Interaction Prediction Models Look: Cross-Method Explainability

arXiv:2606.14245v1 Announce Type: new Abstract: Drug-target interaction (DTI) and affinity (DTA) predictors increasingly achieve strong benchmark scores, yet their internal use of sequence, fingerprint, and graph features often remains opaque. We present an interpretability audit of BridgeDPI architecture on three different datasets including Gao, Human, and C.elegans. This study combines gradient-based attributions – integrated gradients, saliency, layer-wise relevance propagation, SmoothGrad, and SmoothGrad-IG – with feature-wise occlusion ablation and strict intersection consensus across methods to reduce single-explainer bias. We summarize sensitivity and signed effects at raw inputs, at the bridge similarity scaffold, and through the graph convolution, including edge-level sensitivities and targeted edge removals. The results show that explainability is most informative when treated as model criticism: it reveals modality dominance, padding and special-token artifacts, dataset-dependent cooperative versus suppressive effects across layers, and chemistry-consistent fragment and composition motifs where methods agree. These analyses do not substitute for structural or experimental ground truth, yet they can provide testable hypotheses for downstream validation in computational drug discovery pipelines. More broadly, applying modern XAI to contemporary DTI/DTA models is still an early pass over the rich structure implicit in trained weights and data – yet even this first layer of scrutiny already helps researchers relate predictions to drug- and target-side representations and to prioritize external validation.

02.
arXiv (CS.LG) 2026-06-19

Weighted Bayesian Conformal Prediction

arXiv:2604.06464v2 Announce Type: replace Abstract: Conformal prediction provides distribution-free prediction intervals with finite-sample coverage guarantees, and recent work by Snell \& Griffiths reframes it as Bayesian Quadrature (BQ-CP), yielding powerful data-conditional guarantees via Dirichlet posteriors over thresholds. However, BQ-CP fundamentally requires the i.i.d. assumption. Meanwhile, weighted conformal prediction handles distribution shift via importance weights but remains frequentist, producing only point-estimate thresholds. We propose Weighted Bayesian Conformal Prediction (WBCP), which generalizes BQ-CP to arbitrary importance-weighted settings by replacing the uniform Dirichlet $\Dir(1,\ldots,1)$ with a weighted Dirichlet $\Dir(\neff \cdot \tilde{w}_1, \ldots, \neff \cdot \tilde{w}_n)$, where $\neff$ is Kish's effective sample size. We prove four theoretical results: (1)~$\neff$ is the unique concentration parameter matching frequentist and Bayesian variances; (2)~posterior standard deviation decays as $O(1/\sqrt{\neff})$; (3)~BQ-CP's stochastic dominance guarantee extends to per-weight-profile data-conditional guarantees; (4)~the HPD threshold provides $O(1/\sqrt{\neff})$ improvement in conditional coverage. We instantiate WBCP for spatial prediction as Geographical BQ-CP, where kernel-based spatial weights yield per-location posteriors with interpretable diagnostics. Experiments on synthetic and real-world spatial datasets demonstrate that WBCP maintains coverage guarantees while providing substantially richer uncertainty information.

03.
arXiv (CS.CV) 2026-06-18

Posterior Continuation with Noise-Conditioned Frequency Exposure for Diffusion Inverse Problems

Diffusion posterior sampling solves inverse problems by combining a pretrained diffusion prior with measurement-consistency guidance. However, full-band guidance can be unreliable at high noise levels, where clean estimates contain score-induced errors and high-frequency measurement directions are weakly identifiable. We argue that posterior guidance should expose measurement frequencies according to the instantaneous diffusion noise level. Based on this principle, we propose a posterior continuation framework that constructs a family of intermediate posteriors whose likelihood emphasizes currently reliable frequency bands and gradually returns to full-band consistency. We instantiate this framework with a stabilized sampler that combines a diffusion predictor, frequency-limited likelihood refinement, and a Haar-domain commitment rule that commits reliable coarse corrections while deferring weakly identifiable details. Across super-resolution, inpainting, and deblurring, our method achieves competitive-to-state-of-the-art restoration performance, including up to 5 dB PSNR improvement on motion deblurring over strong baselines in evaluations on FFHQ and ImageNet.

04.
arXiv (CS.AI) 2026-06-16

ArtBoost: Synthetic Articulatory Data Augmentation for Acoustic-to-Articulatory Inversion

arXiv:2606.16327v1 Announce Type: cross Abstract: Recent acoustic-to-articulatory inversion (AAI) models rely on electromagnetic articulography (EMA) data, which are costly and limited in scale. To address this limitation, we propose ArtBoost, a novel data augmentation strategy that leverages large-scale speech–mesh datasets originally developed for speech-driven 3D facial animation to improve AAI under limited EMA supervision. ArtBoost extracts pseudo articulatory trajectories from visible facial anchors and uses them for pre-training before fine-tuning on real EMA data. Experiments show consistent improvements in PCC and RMSE. Trajectory analyses confirm that the pseudo articulatory signals reflect physically meaningful visible articulatory dynamics. Additional evaluations across different AAI architectures demonstrate stable performance gains, indicating that ArtBoost can be integrated into diverse AAI models. These results suggest that speech–mesh data provide an effective and scalable source of articulatory supervision for AAI. Project page: https://cau-irislab.github.io/Interspeech26-ArtBoost/

05.
arXiv (quant-ph) 2026-06-17

Quantum mechanics in configuration space in context

arXiv:2606.17622v1 Announce Type: new Abstract: To enhance the way in which wave-particle duality is implemented in the modelling of quantum mechanical systems, Bukhari et al. [New J. Phys. 27, 084501 (2025)] recently introduced an alternative approach to quantum mechanics, namely quantum mechanics in configuration space. This formalism is based on a physically motivated quantisation of Newtonian mechanics and promotes the classical position-velocity states (x,v) to pairwise distinguishable quantum states. The resulting |x,v> states form the basis of the Hilbert space of individual quantum mechanical particles and evolve along classical trajectories. In this paper, we consider the modelling of a mechanical particle in free space and put quantum mechanics in configuration space into context. It is shown that this formalism increases the continuity between quantum and classical mechanics by avoiding a conceptual inconsistency associated with the definition of momentum in canonical quantisation. In addition, we emphasise that standard quantum mechanics and quantum mechanics in configuration space are based on two distinct formulations of classical mechanics.

06.
arXiv (CS.AI) 2026-06-12

Once-for-All: Scalable Simultaneous Forecasting via Equilibrium State Estimation

arXiv:2606.13285v1 Announce Type: cross Abstract: We introduce Equilibrium State Estimation (ESE), a novel paradigm for simultaneous prediction, where multiple interacting systems require separate yet coordinated forecasts. Such scenarios often arise in real-world settings such as economics and healthcare modeling. Unlike existing approaches that predict one system at a time, ESE forecasts all systems in a single pass. It first estimates the equilibrium state across systems, then generates holistic forecasts based on the difference between the current state and the estimated equilibrium. Extensive experiments on synthetic and real-world datasets, including currency exchange and COVID-19 spread modeling, demonstrate that ESE is at least as accurate as state-of-the-art (SOTA) methods while being significantly faster. In addition, ESE integrates seamlessly with conventional predictors, combining their accuracy with its exceptional efficiency and delivering a 10-70x speedup. With linear-time complexity, ESE scales far better than SOTA methods as the number of systems increases. Moreover, it remains accurate under diverse perturbations, establishing ESE as a fast, generalizable, robust, and scalable multi-prediction method.

07.
arXiv (CS.AI) 2026-06-16

Haiku to Opus in Just 10 bits: LLMs Unlock Large Compression Gains

arXiv:2604.02343v2 Announce Type: replace-cross Abstract: We study the compression of LLM-generated text across lossless and lossy regimes, characterizing a compression-compute frontier where more compression is possible at the cost of more compute. For lossless compression, domain-adapted LoRA adapters can improve LLM-based arithmetic coding by 2x over compression with the base LLM alone. For lossy compression, prompting a model for a succinct rewrite then applying arithmetic coding can achieve compression ratios of approximately 0.03, a 2x improvement over compressing the original response. We further introduce Question-Asking compression (QA), an interactive lossy protocol inspired by the game 'Twenty Questions'. A small model iteratively refines its response by asking yes/no questions to a stronger model, transferring exactly one bit per answer. On 8 benchmarks spanning math, science, and code, 10 binary questions recover 23% to 72% of the capability gap between a small and large model on standard benchmarks and 7% to 38% on harder benchmarks, achieving compression ratios of 0.0006 to 0.004. This is over 100x smaller than prior LLM-based compression (Deletang et al., 2024), suggesting that interactive protocols can transfer knowledge far more efficiently than transmitting full responses.

08.
arXiv (quant-ph) 2026-06-16

Quantum coherence and Leggett-Garg inequality

arXiv:2606.15717v1 Announce Type: new Abstract: In this paper, we attempt to establish the relationship between quantum coherence and the violation of the Leggett-Garg inequality. In particular, employing the Lindblad equation, we obtain the pseudo-density matrix for a damping system to study the effect of environment interaction on the violation of this inequality in a two-state quantum system. It is shown that the violation of the Leggett-Garg inequality can be observed as long as temporal evolution does not induce decoherence. This statement is independent of the initial state of the system. Furthermore, similar to the Horodecki criterion for the CHSH inequality (R. Horodecki et al. Phys. Lett. {\bf A200}, 340), we study necessary and sufficient conditions for violating the Leggett-Garg inequality. Hereby, under the circumstance that the inequality violation occurs, an upper bound for the time interval between consecutive measurements with respect to the time scale of interaction with the environment (the relaxation time) is obtained.

09.
arXiv (CS.CV) 2026-06-16

Object Tokens as a Bridge Between Segmentation and Visual Question Answering in Robotic Surgery

Visual Question Answering (VQA) in robotic surgery, referred to as surgical VQA, requires high-level understanding of complex surgical scenes and the integration of visual perception with language reasoning, with the potential to support surgical training and intraoperative decision-making. Recent Vision-Language Models (VLMs) have shown promising performance through parameter-efficient fine-tuning; however, most existing approaches rely on coarse visual grounding, typically limited to bounding boxes, which fails to capture the fine-grained spatial structure of surgical objects. In this work, we propose a unified framework that jointly performs pixel-level segmentation and visual question answering within a single framework. Our approach integrates a VLM with a Segment Anything Model (SAM)-based decoder and represents scene elements as object tokens generated by the VLM. These object tokens guide answer prediction and are further projected to the SAM-based decoder to produce segmentation masks. By optimizing the object token embeddings through both segmentation and question answering objectives, the model learns spatially grounded representations that enhance visual reasoning while providing explicit pixel-level grounding. We evaluate the proposed method on the private RAMIE (Robot-Assisted Minimally Invasive Esophagectomy) dataset and the public EndoVis18 dataset, where it consistently outperforms baseline methods for surgical VQA. These results demonstrate that incorporating context-aware object tokens into vision-language models improves fine-grained surgical scene understanding.

10.
arXiv (CS.AI) 2026-06-19

Configurable Clinical Information Extraction with Agentic RAG: What Works, What Breaks, and Why

arXiv:2606.19602v1 Announce Type: new Abstract: Patient contexts span hundreds of heterogeneous documents and thousands of structured data points, yet the document-level metadata that AI systems need for retrieval and triage is absent or incomplete. Standard retrieval-augmented generation fails on this data, mishandling temporal reasoning, cross-document dependencies, and missing metadata. We deploy ACIE (Agentic Clinical Information Extraction) at University Medicine Essen: an on-premise agentic RAG pipeline that reasons over complete patient contexts and grounds every answer in source passages for clinician verification. We quantify the metadata gap, trace the architectural decisions it shaped, and evaluate extraction alongside an independent retrospective lymphoma registry study, in which nuclear-medicine physicians verify every extracted value against its cited sources. Across 7,326 judgments, clinicians accepted 96.5\% of extractions, with per-type acceptance ranging from 80\% to 99\%.

11.
arXiv (quant-ph) 2026-06-19

Accelerated Rydberg electromagnetically induced transparency quantum memory via shortcuts to adiabaticity

arXiv:2603.18399v2 Announce Type: replace Abstract: Electromagnetically induced transparency (EIT) enables coherent light-matter storage, forming the basis of photonic quantum memories that are essential for scalable quantum networks and distributed quantum computing. However, accelerating the storage process violates the adiabatic condition, resulting in the excitation of the lossy intermediate state and a reduction in writing efficiency. We propose and numerically investigate a high-speed, high-fidelity quantum storage scheme by incorporating a shortcut-to-adiabaticity (STA) technique based on counter-diabatic (CD) driving. By introducing a precisely engineered auxiliary field into a conventional EIT system, our protocol significantly shortens the writing time beyond the conventional adiabatic limit while effectively suppressing the transient population of the lossy intermediate state. Furthermore, our scheme demonstrates strong flexibility in pulse design, remaining effective across different temporal profiles of both the control and signal fields. It also exhibits robustness against imperfections in the CD drive. Even with imperfect single-photon writing and non-ideal Rydberg blockade, the scheme retains clear advantages, maintaining high storage performance and overcoming the intrinsic speed-fidelity trade-off of traditional EIT protocols. These features pave the way for fast and robust quantum devices suitable for high-throughput quantum repeaters and advanced quantum information processing.

12.
arXiv (CS.AI) 2026-06-15

FAConformer: Frequency-Aware Convolutional Transformer for Auditory Attention Decoding

arXiv:2606.14120v1 Announce Type: cross Abstract: Auditory attention decoding (AAD) aims to infer the attended speaker from neural responses in multi-speaker acoustic environments and is a key problem for neuro-steered hearing systems. Although recent studies have achieved encouraging progress, existing AAD models still do not fully exploit frequency domain electroencephalography (EEG) information. In particular, most approaches introduce multi-band information through handcrafted feature extraction or direct cross-band feature concatenation, which mainly exploit frequency information at a shallow level and may overlook band-specific patterns and cross-band interactions. To address these limitations, this paper proposes FAConformer, a frequency-aware CNN-Transformer framework for AAD that explicitly integrates band-specific encoding and adaptive cross-band interaction. Specifically, FAConformer first decomposes EEG signals into multiple frequency bands and assigns each band to an independent CNN-Transformer encoder for band-specific modeling. The resulting band-wise features are then adaptively fused by a carefully designed frequency-aware attention (FAA) module that models cross-band dependencies by treating band-wise features as tokens. Further, band-wise auxiliary supervision (BAS) is introduced to prevent weakly contributing branches from being under-optimized during joint training. In this way, FAConformer performs frequency-aware modeling that more effectively exploits frequency domain information. Extensive experiments on two public AAD datasets with three decision-window lengths demonstrated that FAConformer consistently outperformed 12 competitive baselines, surpassing the current state-of-the-art model by 4.9%. Further analyses of band importance, ablation, and parameter sensitivity verify the effectiveness, robustness, and interpretability of the proposed framework. Code is available at https://github.com/wzwvv/FAConformer.

13.
arXiv (CS.CV) 2026-06-19

GenTrack2: An Improved Hybrid Approach for Multi-Object Tracking

This paper proposes a visual multi-object tracking method that jointly employs stochastic and deterministic mechanisms to ensure identifier consistency for unknown and time-varying target numbers under nonlinear dynamics. A stochastic particle filter addresses nonlinear dynamics and non-Gaussian noise, with support from particle swarm optimization (PSO) to guide particles toward state distribution modes and mitigate divergence through proposed fitness measures incorporating motion consistency, appearance similarity, and social-interaction cues with neighboring targets. Deterministic association further enforces identifier consistency via a proposed cost matrix incorporating spatial consistency between particles and current detections, detection confidences, and track penalties. Subsequently, a novel scheme is proposed for the smooth updating of target states while preserving their identities, particularly for weak tracks during interactions with other targets and prolonged occlusions. Moreover, velocity regression over past states provides trend-seed velocities, enhancing particle sampling and state updates. The proposed tracker is designed to operate flexibly for both pre-recorded videos and camera live streams, where future frames are unavailable. Experimental results confirm superior performance compared to state-of-the-art trackers. The source-code reference implementations of both the proposed method and compared-trackers are provided on GitHub: https://github.com/SDU-VelKoTek/GenTrack2

14.
arXiv (CS.AI) 2026-06-12

OCOO-T : A Simple and Scalable Virtual Cell Model for Transcriptional Perturbation Response Prediction

arXiv:2606.12838v1 Announce Type: cross Abstract: Predicting single-cell transcriptional responses to genetic, chemical and cytokine perturbations is a fundamental challenge in computational biology and AI Virtual Cell (AIVC) modeling, with direct implications for drug discovery and the elucidation of gene regulatory networks. Existing approaches often rely on auxiliary cell-state encoders, hierarchical variational autoencoders, dedicated Transformer encoder-decoder modules, or gene-interaction priors to compress high-dimensional expression profiles into latent representations. While effective, these designs increase architectural complexity and may limit scalability and generalizability. This paper introduces OCOO-T, a minimalist flow-matching-based AIVC model for transcriptional perturbation response prediction. OCOO-T utilizes a vanilla Transformer stack that operates directly on continuous gene expression profiles and formulates perturbation response prediction as a continuous-time denoising process. Perturbation embeddings, dosage information, and cell-line/cell-type specificity are integrated through adaptive layer normalization and in-context tokens. Comprehensive evaluations on Tahoe100M, Replogle, and PBMC benchmarks demonstrate that OCOO-T achieves state-of-the-art performance across diverse perturbations and cell types while effectively scaling to long transcriptional profiles through patching and depatching of cellular contexts. By leveraging the simplicity of Transformer-based denoising for single-cell omics, OCOO-T provides an effective and scalable framework for in-silico cellular simulation.

15.
medRxiv (Medicine) 2026-06-22

Development of a Novel Risk Prediction Model for Rheumatoid Arthritis-Associated Interstitial Lung Disease (RA-ILD): A Longitudinal Study

Background: Interstitial lung disease (ILD) is one of the most common and potentially most devastating extra-articular complication of rheumatoid arthritis (RA) and is associated with substantial morbidity and mortality. However, reliable tools for the early identification of ILD in patients with RA remain limited. This study aimed to identify plasma protein biomarkers of RA-ILD and develop an interpretable machine learning model for risk prediction using data from the UK Biobank. Methods: We first evaluated the association between baseline RA and the risk of incident ILD in the UK Biobank using Cox proportional hazards models. Mendelian randomization analysis was then performed to investigate the potential causal relationship between RA and ILD. Finally, we analyzed 2,920 plasma proteins measured using the Olink platform in 781 eligible RA patients. Proteins associated with ILD risk were identified using Cox proportional hazards models and subsequently used to construct eight machine learning models. Model performance was assessed using the receiver operating characteristic curve (ROC) and decision curve analysis. The best-performing model was further interpreted using Shapley additive explanations (SHAP) to evaluate feature importance. Results: Compared with participants without RA, Patients with baseline RA had a significantly higher risk of developing ILD (Hazard ratio: 4.425, 95% CI: 3.549,5.518). The MR supported a potential causal association between RA and ILD (Odds ratio: 1.227, 95% CI: 1.121,1.343). Among the eight machine learning models, the CatBoost model showed the best performance, achieving an area under the curve (AUC) of 0.884 (95% CI: 0.773,0.996). The SHAP analysis identified LAG3, NPC2, and LAMP3 are the three most important plasma protein predictors of ILD development in patients with RA. Conclusion: Plasma proteomics combined with machine learning may provide a promising approach for identifying biomarkers and predicting ILD risk in patients with RA. LAG3, NPC2, and LAMP3 may serve as candidate biomarkers for RA-ILD and warrant further validation. Keywords: Rheumatoid arthritis, Interstitial lung disease, Mendelian randomization, Machine learning, Plasma proteins.

16.
arXiv (CS.LG) 2026-06-18

OpenAnt: LLM-Powered Vulnerability Discovery Through Code Decomposition, Adversarial Verification, and Dynamic Testing

arXiv:2606.19149v1 Announce Type: cross Abstract: Automated vulnerability discovery in large codebases remains challenging: traditional static analysis produces high false-positive rates, while dynamic approaches such as fuzzing require substantial infrastructure and often target narrow classes of bugs. Recent advances in large language models (LLMs) enable semantic reasoning about program behavior, but applying LLMs to repository-scale security analysis introduces challenges related to context management, cost, and verification. We present OpenAnt, an open-source vulnerability discovery system that integrates static program analysis with LLM-based reasoning in a multi-stage pipeline. OpenAnt introduces three key techniques. First, codebases are decomposed into self-contained analysis units filtered by reachability from external entry points, reducing the analysis surface by up to 97% while preserving attack-relevant code. Second, candidate vulnerabilities undergo adversarial verification through constrained attacker simulation, where the model evaluates exploitability under realistic attacker capabilities. Third, findings are validated through dynamic verification, in which exploit environments are generated automatically, executed in sandboxed containers, and discarded after use. Evaluation on widely used open-source projects including OpenSSL, WordPress, and Flowise shows that this architecture can identify previously unknown vulnerabilities while maintaining manageable analysis cost and substantially reducing false positives. Our results suggest that closed-loop vulnerability discovery pipelines, combining semantic reasoning with exploit validation, provide a practical path toward scalable automated security analysis. OpenAnt is released as open source under the Apache 2.0 license at https://github.com/knostic/OpenAnt.

17.
arXiv (CS.AI) 2026-06-18

HeRo-Q: A General Framework for Stable Low Bit Quantization via Hessian Conditioning

arXiv:2601.21626v2 Announce Type: replace-cross Abstract: Post Training Quantization (PTQ), a mainstream model compression technique, often leads to the paradoxical 'low error, high loss' phenomenon because it focuses solely on minimizing quantization error. The root cause lies in the Hessian matrix of the LLM loss landscape: a few high curvature directions are extremely sensitive to perturbations. To address this, we propose the Hessian Robust Quantization (HeRo Q) algorithm, which applies a lightweight, learnable rotation-compression matrix to the weight space prior to quantization. This joint framework reshapes the loss landscape by reducing the largest Hessian eigenvalue and reducing its max eigenvalue, thereby significantly enhancing robustness to quantization noise. HeRo-Q requires no architectural modifications, incurs negligible computational overhead, and integrates seamlessly into existing PTQ pipelines. Experiments on Llama and Qwen models show that HeRo Q consistently outperforms state of the art methods including GPTQ, AWQ, and SpinQuant not only achieving superior performance under standard W4A8 settings, but also excelling in the highly challenging W3A16 ultra low bit regime, where it boosts GSM8K accuracy on Llama3 8B to 70.15\% and effectively avoids the logical collapse commonly seen in aggressive quantization.

18.
medRxiv (Medicine) 2026-06-22

The direct economic impact of surgical non-response in orthopaedic hip, knee, and spine surgery for osteoarthritis: a cost-utility analysis

Background Annually, nearly 2 million hip, knee, and spinal inpatient surgeries are performed in Canada and the US for osteoarthritis (OA), costing over $37 billion in hospital expenditures. However, 15-30% of patients experience limited or no improvement, resulting in poor value for money. This study evaluated the one-year cost-utility of joint and spine procedures for OA by comparing non-responders to responders, considering various responder definitions. Methods Individual micro-costing data were collected for 1,175 elective hip, knee, and spine patients enrolled in the Longitudinal Evaluation in the Arthritis Program - Osteoarthritis (LEAP-OA) between 2014 and 2018. Quality-adjusted life years (QALYs) were derived using the SF-6D utility index. One-year incremental cost-utility ratios (ICURs) were calculated from the hospital perspective. Results Responder rates varied by definition, ranging from 78%-94% for hip replacements, 64%-90% for knee replacements, 60%-64% for spine fusions, and 50%-68% for spine decompressions. Corresponding ICURs were: $45,956-$51,773/QALY for responders versus $108,593-$485,762/QALY for non-responders for hip replacements; $54,831-$71,151/QALY for responders versus $200,486-$1,203,596/QALY for non-responders for knee replacements; $65,980-$74,422/QALY for responders versus $262,039-$729,686/QALY for non-responders for spine fusions; and $29,947-$42,168/QALY for responders versus $63,195-$662,586/QALY for non-responders for spine decompressions. Conclusions While surgical response rates were highly dependent on the responder definition, ICURs for non-responders were significantly higher than those for responders across all definitions. Beyond the negative impact on patients, there is a compelling economic argument for investment in improved pre-operative identification of patients at risk of surgical non-response. Such efforts could enable more personalized, value-based care pathways and reduce the provision of low-value surgical interventions.

19.
arXiv (CS.AI) 2026-06-19

Frequency-Aware Flow Matching for Continuous and Consistent Robotic Action Generation

arXiv:2606.20135v1 Announce Type: cross Abstract: Flow matching has emerged as a standard paradigm for robotic manipulation owing to its strong expressive power for modelling complex, multimodal action distributions, alongside similar approaches like diffusion policy. However, existing methods rely on discretized action chunks, making them brittle to demonstrations collected at heterogeneous control frequencies and prone to temporally inconsistent actions that degrade control stability. In this paper, we propose Frequency-Aware Flow Matching (FAFM), which outputs continuous, temporally consistent actions. To handle heterogeneous frequency input, we transform discrete action sequences into the frequency domain with the discrete cosine transform (DCT), perform flow matching over the resulting coefficients, and reconstruct continuous actions via cosine basis expansion. To generate temporally consistent actions, we regularize the first-order temporal derivative to promote smooth actions. This corresponds to a Sobolev-type constraint that suppresses high-frequency errors and discourages abrupt action changes. Our FAFM is simple, introduces no additional network parameters and applies to standalone flow-matching policies and vision-language action models. Across synthetic toy benchmark, obstacle avoidance, LapGym, and LIBERO, FAFM improves success rates, multimodal expressivity, motion smoothness, convergence speed, robustness to mechanical bias and mixed-frequency input. These gains are consistent when deployed on a real-world Franka robot. Code available at https://anonymous.4open.science/r/FAFM.

21.
arXiv (CS.AI) 2026-06-18

CaVe-VLM-CoT: An Interpretable Vision-Language Model Framework

arXiv:2606.18385v1 Announce Type: new Abstract: Vision-Language Models (VLMs) remain prone to hallucinations, producing fluent but visually unfaithful outputs. Existing chain-of-thought and retrieval-augmented methods only partially address this, as they neither enforce step-level citation grounding nor route verification failures back to retrieval for correction. We present CaVe-VLM-CoT, a modular reflection-based agentic-RAG framework that enforces evidence-grounded reasoning through a five-stage closed-loop pipeline: Extractor, Retriever, Solver, Citation Injector, and Verifier, in which detected ungrounded claims trigger structured feedback to the Extractor for targeted re-retrieval. Since no existing framework jointly measures retrieval quality, step-wise citation faithfulness, and cross-modal grounding, we propose a suite of 23 component-wise metrics across all stages, anchored by CaVeScore, a composite metric weighting accuracy, citation precision and recall, attribution, and evidence grounding. Without any architectural or prompt modifications, CaVe-VLM-CoT achieves 87.1\% accuracy and 56.6\% CaVeScore on ScienceQA , and 55.2\% accuracy and 35.7\% CaVeScore on MMMU (30 subjects).

22.
arXiv (math.PR) 2026-06-16

Convergence to the Brownian CRT for critical branching Markov processe

arXiv:2601.05906v2 Announce Type: replace Abstract: We prove an invariance principle for a general class of continuous time critical branching processes with finite variance (non-local) branching mechanism. We show that the genealogical trees, viewed as random compact metric measure spaces, converge under rescaling to the Brownian continuum random tree in the Gromov-Hausdorff-weak topology, establishing a universal scaling limit for critical finite variance branching processes.

23.
bioRxiv (Bioinfo) 2026-06-16

Accelerating String Comparison in RLZ Compressed Sequences via LCE Jumps

Relative Lempel-Ziv (RLZ) is an effective compression method for large, repetitive collections; however, the fundamental primitives required to elevate it from a passive archival format to a tractable representation for compressed construction have yet to be fully established. In this paper, we introduce an algorithmic framework for structurally comparing and lexicographically sorting sequences of RLZ factors. We characterize when direct factor comparisons are necessary and when they can be bypassed using RLZ specific shortcuts. We further introduce a method for extending truncated factors into right-maximal matches, enabling the recovery of matching statistics from the RLZ parse. Experimentally, RLZ sorting achieved speedups of up to 3.93x over character-based sorting. Together, these results advance the use of the RLZ format as a foundation for compressed construction.

24.
arXiv (CS.CV) 2026-06-19

Through the PRISM: Preference Representation in Intermediate States of Video Diffusion Models

Evaluating video generation with clean, pixel-based reward models disconnects evaluation from the noisy diffusion process and incurs massive VAE decoding costs. In this paper, we challenge this paradigm by asking a fundamental question: Can a powerful video generator inherently discriminate preferences directly from noisy latents? To answer this, we introduce PRISM (Preference Representation in Intermediate States of Diffusion Models). PRISM employs a lightweight Query-based Aggregation head with a frozen video diffusion backbone to decode preference signals from noisy latents. Surprisingly, PRISM not only achieves SOTA preference accuracy but also unlocks strong noise-robustness, which enables early-stage Best-of-$N$ sampling. This allows for filtering suboptimal candidates at the very beginning of denoising, drastically reducing computation while boosting video quality. We also reveal a strong positive correlation between a backbone's generative performance and its inherent evaluative power, enabling self-improving video backbones.

25.
arXiv (CS.CL) 2026-06-11

Improving Cross-Format Robustness in Language Models with Multi-Format Training

Large language models often remain sensitive to answer format: a question solved correctly in one form may fail in another semantically equivalent form. To study this gap, we define cross-format robustness as the extent to which a model answers the same underlying question consistently across formats. We then compare full-format training with FormatMix, which expands only a subset of training items into multiple equivalent formats using either random or targeted selection. Across GLM4 and Llama-3.1, multi-format supervision consistently improves both task performance and cross-format robustness, whereas Multiple-choice question (MCQ)-only supervision alone brings little benefit and can even reduce robustness. We further find that expanding only about 30% of the training set into multiple formats often recovers most of the gain from full-format training, and this effect appears across the model families and sizes we study. These results suggest that format diversity, rather than additional supervision alone, is the key driver of robustness. That lightweight multi-format augmentation is a practical way to make LLMs less sensitive to answer format without changing the base model.