Academic Intelligence · Curated Daily

Explore the Frontiers of Global Academic Research

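AcademicHub gathers real-time literature from top journals and preprint platforms. Customize your own research radar, and use large language models to automatically generate literature analysis briefs across interdisciplinary fields.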

01.
arXiv (CS.CL) 2026-06-15

Learning to Hear Hesitation: Continual Learning for Disfluency-Aware ASR

Despite advances in large-scale Automatic Speech Recognition (ASR), disfluent speech remains challenging, as state-of-the-art systems are often optimized to omit disfluencies, leading to information loss and hallucinations. Prior work has focused on verbatim transcription and the integration of disfluency markers, but adapting models on limited datasets can lead to catastrophic forgetting of general-domain knowledge. We address this gap by leveraging continual learning (CL) with explicit disfluency tokens. We first introduce these tokens into a pretrained ASR model to establish stable token mechanisms, and then continue training on additional datasets with varying disfluency distributions. Through a detailed analysis of model dynamics during training, we identify a trade-off between marker learning and ASR performance, and a consistent cross-attention head mechanism shared across CL methods.

03.
Nature Biotechnology 2026-06-11

Large-scale, spatially resolved panoramic CRISPR screening in native tissue environments using Perturb-DBiT

Spatially resolved CRISPR screening in vivo has been limited to small perturbation panels and subsets of protein-coding RNAs. We present Perturb-DBiT, a method for co-sequencing of spatial total RNA whole transcriptomes and single guide RNAs (sgRNAs) on the same tissue section in situ. In a human cancer metastatic colonization model, we applied large (80,000+) sgRNA panels across tumor colonies in multiple consecutive tissue sections alongside their corresponding total RNA transcriptomes. We linked perturbations affecting long noncoding RNA covariation, microRNA–mRNA interactions and distinct amino acid-specific tRNA alterations to tumor migration and growth. By integrating transcriptional pseudotime trajectories, we further observed the impact of perturbations on clonal dynamics and cooperation. In an immune-competent syngeneic mouse model, investigation of the tumor immune microenvironment indicated distinct, synergistic effects on immune infiltration and suppression. Perturb-DBiT provides a spatially resolved comprehensive view of perturbation responses in complex tissues, including small and large RNA regulation, tumor proliferation, migration, metastasis and immune interactions. In vivo CRISPR genetic perturbations are spatially mapped at scale.

04.
arXiv (CS.AI) 2026-06-24

Low-power analogue neural networks with trainable nonlinear connections for continuous control

arXiv:2606.23742v1 Announce Type: cross Abstract: Physical neural networks promise low-power machine learning by computing directly with analogue device physics, but most architectures force nonlinear device responses to act as scalar weights. Inspired by Kolmogorov-Arnold networks, we place trainable nonlinear functions on the connections, making each physical connection a learnable computational element. Realising these functions as analogue band-pass filters on field-programmable analogue arrays, we find that the benefit is task-dependent and follows from the smoothness of the physical basis: the networks represent smooth, continuously valued targets, including robotic kinematics, continuous control, and photovoltaic maximum-power-point tracking, with far fewer nodes and connections than multilayer perceptrons, but offer no parameter-efficiency advantage on classification-like decision boundaries. Trained networks transfer to hardware across approximately 35,000 connections with quantified fidelity, and a dedicated CMOS implementation is projected to operate at approximately 30 microwatts. A memristive realisation reproduces the same behaviour in simulation, indicating that the advantage comes from placing trainable nonlinearity on connections, rather than from a particular device.

05.
arXiv (CS.LG) 2026-06-25

Onsager-Machlup Posterior Transport for Deep Gaussian Processes

arXiv:2605.23434v2 Announce Type: replace Abstract: Approximate inference over inducing variables is the central computational bottleneck of Deep Gaussian Processes (DGPs). Existing methods either fit an explicit density $q_\phi(\mathbf{U})$ by an ELBO (DSVI, IPVI, DDVI, DBVI) or sample by MCMC (SGHMC). We instead frame DGP inference as posterior transport: learn a deterministic sampler that maps a tractable reference measure to posterior-relevant inducing variables, regularised by a path prior derived from the Doob-bridged reference diffusion. Our realisation, OM-Path (formally FBVI-bridge-Path), uses Song's probability-flow ODE applied to DBVI's Doob-bridged forward SDE; the reference drift is closed-form from the bridge marginal coefficients (no score matching) and the path regulariser is the Onsager–Machlup action. At the finite-$\epsilon$ value used at training, the objective is the negative log unnormalised density of a tempered Doob-bridge path posterior, and Theorem 1 identifies it with the same posterior's small-noise MAP path via the Freidlin–Wentzell LDP. Two strict path-space ELBO variants on the same bridge backbone (FFJORD log-det; OM-regularised CNF) are derived as ablations. Under a matched-seed paired Wilcoxon test against DBVI on seven UCI regression benchmarks, OM-Path delivers statistically significant wins on the two largest datasets (power: $p = 0.014$, NLL $\mathbf{0.012}$ matching the DSVI baseline of $0.017$; protein: $p = 0.002$, RMSE $\mathbf{0.716}$ vs. $0.764$, NLL $\mathbf{1.086}$ vs. $1.149$), statistical ties on yacht / qsar, and concedes boston / energy / concrete to DBVI on small-$N$ noisy data. The strict-ELBO variants do not clear DBVI on any UCI metric: in this regime, reducing the variance of the path objective dominates exact-density tracking.

06.
arXiv (CS.CV) 2026-06-16

MIRAGE: Runtime Scheduling for Multi-Vector Image Retrieval with Hierarchical Decomposition

To effectively leverage user-specific data, retrieval augmented generation (RAG) is employed in multimodal large language model (MLLM) applications. However, conventional retrieval approaches often suffer from limited retrieval accuracy. Recent advances in multi-vector retrieval (MVR) improve accuracy by decomposing queries and matching against segmented images. They still suffer from sub-optimal accuracy and efficiency, because they overlook both the alignment between the query and varying image objects and the redundancy of fine-grained image segments. In this work, we present MIRAGE, an efficient scheduling framework for image retrieval. First, we introduce a novel hierarchical paradigm, employing multiple intermediate granularities for varying image objects to enhance alignment. Second, we reduce redundancy in retrieval by leveraging cross-hierarchy similarity consistency and hierarchy sparsity to avoid unnecessary matching computation. Furthermore, we configure parameters for each dataset automatically for practicality across diverse scenarios. Our empirical study shows that MIRAGE not only achieves substantial accuracy improvements but also reduces computation by up to 3.5 times over the existing MVR system.

07.
arXiv (quant-ph) 2026-06-25

Maximal global device-independent randomness from projective measurements in every dimension

arXiv:2606.21369v2 Announce Type: replace Abstract: Device-independent random number generation (DIQRNG) is the most secure form of generating private randomness using quantum physical processes. Its strength lies in producing numbers that are impossible to predict by any eavesdropper restricted by the laws of quantum theory. Moreover, security is proven solely from observed measurement statistics, without the need to characterise or trust the devices used in random number generation. Implementing DIQRNG is, however, costly, as it requires high-quality entangled systems. It is therefore important to make the best use of available resources. In this work, we show that using projective measurements – which are most readily implementable experimentally – one can certify $2\log(d)$ bits of device-independent randomness from a bipartite system of local dimension $d$ for every $d \ge 2$, thus reaching the theoretically maximum possible rate of DIQRNG. We provide explicit protocols reaching $2\log(d)$ bits based on mutually unbiased bases. Furthermore, we compute numerical bounds on the rate for the case of imperfect implementations, showing that our protocols are robust to experimental noise.
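
As a worked illustration of the $2\log(d)$ rate quoted above, the lines below evaluate it for small dimensions. Two assumptions are made here that the abstract does not spell out: the logarithm is taken base 2 (since the rate is stated in bits), and each party's projective measurement has at most $d$ outcomes, so a joint outcome takes at most $d^2$ values.

```latex
\[
H_{\max} \;=\; \log_2\!\left(d \cdot d\right) \;=\; 2\log_2 d ,
\qquad
d = 2:\ 2\ \text{bits}, \quad
d = 3:\ 2\log_2 3 \approx 3.17\ \text{bits}, \quad
d = 4:\ 4\ \text{bits}.
\]
```

Under these assumptions, a uniformly distributed pair of outcomes is the most any projective scheme of local dimension $d$ could yield, which is why the abstract describes $2\log(d)$ as the maximum rate.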

08.
arXiv (CS.CV) 2026-06-24

DLTPose: 6DoF Pose Estimation From Accurate Dense Surface Point Estimates

We propose DLTPose, a novel method for 6DoF object pose estimation from RGBD images that combines the accuracy of sparse keypoint methods with the robustness of dense pixel-wise predictions. DLTPose predicts per-pixel radial distances to a set of at least four keypoints, which are then fed into our novel Direct Linear Transform (DLT) formulation to produce accurate 3D object frame surface estimates, leading to better 6DoF pose estimation. Additionally, we introduce a novel symmetry-aware keypoint ordering approach, designed to handle object symmetries that otherwise cause inconsistencies in keypoint assignments. Previous keypoint-based methods relied on fixed keypoint orderings, which failed to account for the multiple valid configurations exhibited by symmetric objects; our ordering approach exploits these configurations to enhance the model's ability to learn stable keypoint representations. Extensive experiments on the benchmark LINEMOD, Occlusion LINEMOD and YCB-Video datasets show that DLTPose outperforms existing methods, especially for symmetric and occluded objects. The code is available at https://anonymous.4open.science/r/DLTPose_/ .
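
To make the idea of recovering a 3D surface point from radial distances to four keypoints concrete, here is a minimal sketch of textbook linearized trilateration. It is not the authors' DLT formulation, which the abstract does not specify; the function name `point_from_radial_distances` and the helper names are hypothetical. Subtracting the sphere equation of the first keypoint from the other three removes the quadratic term and leaves a 3x3 linear system.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def _sq_norm(v: Sequence[float]) -> float:
    """Squared Euclidean norm."""
    return sum(c * c for c in v)


def point_from_radial_distances(keypoints: Sequence[Vec3],
                                radii: Sequence[float]) -> Vec3:
    """Recover p from |p - k_i| = r_i for four non-coplanar keypoints k_i.

    Hypothetical helper: a standard linearization, not the paper's method.
    """
    if len(keypoints) != 4 or len(radii) != 4:
        raise ValueError("this sketch expects exactly four keypoints")
    k0, r0 = keypoints[0], radii[0]
    rows: List[List[float]] = []
    rhs: List[float] = []
    for k, r in zip(keypoints[1:], radii[1:]):
        # 2 (k_i - k_0) . p = |k_i|^2 - |k_0|^2 - r_i^2 + r_0^2
        rows.append([2.0 * (k[j] - k0[j]) for j in range(3)])
        rhs.append(_sq_norm(k) - _sq_norm(k0) - r * r + r0 * r0)
    det = _det3(rows)
    if abs(det) < 1e-12:
        raise ValueError("keypoints are (nearly) coplanar")
    # Solve the 3x3 system with Cramer's rule.
    solution = []
    for col in range(3):
        replaced = [list(row) for row in rows]
        for i in range(3):
            replaced[i][col] = rhs[i]
        solution.append(_det3(replaced) / det)
    return (solution[0], solution[1], solution[2])
```

With more than four keypoints the same linear equations could be stacked and solved by least squares, which is one reason a minimum of four is the natural lower bound.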

09.
arXiv (CS.CL) 2026-06-11

VIA-SD: Verification via Intra-Model Routing for Speculative Decoding

Speculative decoding (SD) addresses the high inference costs of LLMs by having lightweight drafters generate candidates for large verifiers to validate in parallel. Existing draft-verify methods use binary decisions: accept or fully recompute. Yet we find that many rejected tokens can be verified correctly by a slim submodel derived from the full verifier via intra-model routing, instead of the full verifier. This motivates our slim-verifier to handle tokens requiring moderate verification resources, reducing expensive large-model calls. We propose Verification via Intra-Model Routing for Speculative Decoding (VIA-SD), a multi-tier framework using a routed slim-verifier. Draft tokens are processed hierarchically: direct acceptance for high-confidence cases, slim-verifier regeneration for medium-confidence cases, and full-model verification for uncertain cases. Across four representative tasks and multiple model families, VIA-SD reduces rejection rates by 0.10-0.22 and delivers 10-20% speedups over strong SD baselines, while achieving 2.5-3x acceleration over non-drafting decoding. Moreover, VIA-SD is compatible with existing SD frameworks without modifying their training procedures. Our results suggest multi-tier SD as a general paradigm for scalable and efficient LLM inference. Project page: https://zju-xyc.github.io/VIA-SD-Project-Page/
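
The three-tier handling of draft tokens described above can be sketched as a simple router. This is only an illustration under stated assumptions: the abstract does not say which confidence signal is used or where the cut-offs lie, so the scalar `confidence`, the two thresholds, and the names `Tier` and `route_token` are all hypothetical.

```python
from enum import Enum
from typing import List, Sequence


class Tier(Enum):
    ACCEPT = "accept"          # high confidence: keep the draft token
    SLIM = "slim_verifier"     # medium confidence: routed slim sub-model
    FULL = "full_verifier"     # uncertain: full verifier model


def route_token(confidence: float,
                accept_threshold: float = 0.9,
                slim_threshold: float = 0.5) -> Tier:
    """Pick a verification tier for one draft token.

    Both thresholds are illustrative placeholders, not values from the paper.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if slim_threshold > accept_threshold:
        raise ValueError("slim_threshold must not exceed accept_threshold")
    if confidence >= accept_threshold:
        return Tier.ACCEPT
    if confidence >= slim_threshold:
        return Tier.SLIM
    return Tier.FULL


def route_draft(confidences: Sequence[float]) -> List[Tier]:
    """Route every token of a draft with the default thresholds."""
    return [route_token(c) for c in confidences]
```

The point of the middle tier is that only tokens falling below `slim_threshold` trigger a call to the expensive full verifier.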

10.
arXiv (math.PR) 2026-06-19

Finite-Sample Bounds for Expected Signature Estimation under Weak Dependence

arXiv:2605.20541v2 Announce Type: replace-cross Abstract: The expected signature uniquely determines the law of a random rough path under a moment-growth condition, yet finite-sample bounds for estimating its truncations from a single long dependent trajectory remain unavailable. We study a strictly stationary stochastic process equipped with a geometric rough-path lift, observed in non-overlapping blocks of equally-spaced samples, and prove a non-asymptotic mean-squared error (MSE) bound for the block-averaging estimator of its truncated expected signature. Under moment and stationarity assumptions together with a direct covariance-decay condition on block signatures – strictly weaker than $\alpha$-mixing and applicable to long-range-dependent processes – the error separates into a discretization term and a fluctuation term, with rates determined respectively by path regularity and dependence strength. A levelwise rough-factorial variance analysis keeps finite-truncation constants explicit and yields an optimal allocation rule under a fixed observation budget. We verify the assumptions for independent-coordinate fractional Ornstein–Uhlenbeck processes in three regimes: short-range (Hurst $1/4 < H < 1/2$) [...] $H > 1/2$. Monte Carlo experiments show empirical slopes steeper than the guaranteed upper-bound rates.

11.
arXiv (CS.CL) 2026-06-16

Speaking the Language of Science: Toward a General-Purpose Generative Foundation Model for the Natural Sciences

In this report, we present LOGOS (Language Of Generative Objects in Science), a scientific generative language model that unifies heterogeneous tasks across the natural sciences within a single autoregressive framework based on a shared scientific grammar. It encodes diverse scientific objects and their spatial interactions as token sequences over a common vocabulary. By representing spatial contact and constraint patterns as discrete tokens, the model captures complex structural interactions in a purely sequential manner, without relying on explicit coordinates or geometric neural networks. This unified representation enables a wide range of downstream tasks to be formulated consistently as next-token prediction in the same grammar space, creating strong alignment between continued multi-domain pre-training and downstream objectives. Across diverse tasks, LOGOS consistently matches or outperforms domain-specific baselines, providing preliminary evidence for the feasibility of "one model fits all" in the natural sciences. We train LOGOS models at different scales (1B, 3B, and 8B parameters) and find a consistent positive correlation between model size and performance. This suggests that the future of AI for Science (AI4S) may not lie in building an independent technical stack that is separated from large language models (LLMs). Instead, it may depend on deeply aligning scientific foundation models with LLMs through shared architectures, shared training paradigms, and shared inference infrastructure, so that LLMs can truly become a new entry point for AI4S. We release the model weights and associated resources to facilitate further research.

12.
medRxiv (Medicine) 2026-06-15

Scalable estimation of temporal clustering in accelerometry: a kernel-independent dispersion index grounded in the Hawkes process

Background. Self-exciting (Hawkes) point processes are a natural model for the temporal clustering of human physical activity (PA) recorded by accelerometers, yet they have seldom been used in this setting, in part because the usual maximum-likelihood fitting is challenging due to potential estimation bias and convergence failures on these data. A moment-based alternative, estimating the Hawkes branching ratio from the dispersion index (the variance-to-mean ratio of event counts), is kernel-independent and computationally trivial, but it has not been evaluated for accelerometry or adapted to the intensity-marked recordings accelerometers provide. Methods. Treating each minute above a sedentary threshold as an event, we estimated the Hawkes branching ratio $n$ by maximum likelihood and, as a kernel-independent and far cheaper alternative, from the dispersion index. We compared four dispersion-based estimators (event-count-based, intensity-mark-weighted using the mark-moment ratio, and time-of-day (TOD) adjusted variants of each) against the marked and unmarked maximum-likelihood estimates. Estimators were evaluated for mutual agreement, goodness of fit, and finite-window results in two National Health and Nutrition Examination Survey (NHANES) accelerometry cohorts (hip-worn, $n=2{,}560$; wrist-worn, $n=3{,}132$). We related the resulting temporal clustering measures to all-cause mortality using survey-weighted Cox models, adjusting for PA frequency, Peak30 (the average of the 30 highest PA values), and demographic covariates. Results. Event-count-based dispersion estimates agreed strongly with maximum-likelihood branching ratios ($r \approx 0.74$ in both cohorts); the intensity-marked variant incorporating PA intensity variability agreed less well. Marked and unmarked Hawkes models yielded similar excitation and decay parameters, suggesting PA intensity added little clustering information beyond event timing. In the survival analysis, temporal clustering was associated with all-cause mortality independently of PA frequency and Peak30; the direction of association differed between the hip- and wrist-worn cohorts. Conclusions. A scalable dispersion-index estimator recovers the Hawkes branching ratio and matches maximum-likelihood estimates without requiring kernel specification or iterative optimization. It offers a practical tool for quantifying temporal clustering in accelerometry, enabling decomposition of temporal PA patterns into their exogenous initiation and endogenous persistence. Such temporal patterns carry health-relevant information beyond PA intensity and volume. Keywords: dispersion index; Hawkes process; branching ratio; temporal clustering; point process estimation; accelerometry; mortality
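
The moment-based route from dispersion index to branching ratio can be sketched in a few lines. The sketch assumes the standard large-window result for a stationary Hawkes process, in which the variance-to-mean ratio of counts tends to $1/(1-n)^2$, so that $n = 1 - 1/\sqrt{D}$. The paper's estimators (mark weighting, time-of-day adjustment, finite-window handling) are not reproduced here, and the function names are hypothetical.

```python
import math
from statistics import mean, pvariance
from typing import Sequence


def dispersion_index(counts: Sequence[float]) -> float:
    """Variance-to-mean ratio of event counts per window.

    Uses the population variance; a sample variance would also be reasonable.
    """
    m = mean(counts)
    if m <= 0:
        raise ValueError("mean count must be positive")
    return pvariance(counts) / m


def branching_ratio_from_dispersion(counts: Sequence[float]) -> float:
    """Estimate the Hawkes branching ratio n from windowed counts.

    Assumes the asymptotic relation D = 1 / (1 - n)^2, which holds only for
    windows long relative to the excitation kernel.
    """
    d = dispersion_index(counts)
    if d <= 1.0:
        # No over-dispersion relative to Poisson: report no self-excitation.
        return 0.0
    return 1.0 - 1.0 / math.sqrt(d)
```

For example, counts of `[0, 8, 0, 8]` have mean 4 and population variance 16, giving a dispersion index of 4 and an estimated branching ratio of 0.5. No kernel is fitted and nothing is iterated, which is the sense in which the estimator is kernel-independent and cheap.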

13.
arXiv (CS.LG) 2026-06-12

Contrastive Geometric Learning Unlocks Unified Structure- and Ligand-Based Drug Design

arXiv:2601.09693v3 Announce Type: replace Abstract: Structure-based and ligand-based computational drug design have traditionally relied on disjoint data sources and modeling assumptions, limiting their joint use at scale. In this work, we introduce Contrastive Geometric Learning for Unified Computational Drug Design (ConGLUDe), a single contrastive geometric model that unifies structure- and ligand-based training. ConGLUDe couples a geometric protein encoder that produces whole-protein representations and implicit embeddings of predicted binding sites with a fast ligand encoder, removing the need for predefined pockets. By aligning ligands with both global protein representations and multiple candidate binding sites through contrastive learning, ConGLUDe supports ligand-conditioned pocket prediction in addition to virtual screening and target fishing, while being trained jointly on protein-ligand complexes and large-scale bioactivity data. Across diverse benchmarks, ConGLUDe achieves competitive zero-shot virtual screening performance, substantially outperforms existing methods on a challenging target fishing task, and demonstrates state-of-the-art ligand-conditioned pocket selection. These results highlight the advantages of unified structure-ligand training and position ConGLUDe as a step toward general-purpose foundation models for drug discovery.

14.
arXiv (CS.CL) 2026-06-11

VietMed-MCQ: A Consistency-Filtered Data Synthesis Framework for Vietnamese Traditional Medicine Evaluation

Large Language Models (LLMs) have demonstrated remarkable proficiency in general medical domains. However, their performance significantly degrades in specialized, culturally specific domains such as Vietnamese Traditional Medicine (VTM), primarily due to the scarcity of high-quality, structured benchmarks. In this paper, we introduce VietMed-MCQ, a novel multiple-choice question dataset generated via a Retrieval-Augmented Generation (RAG) pipeline with an automated consistency check mechanism. Unlike previous synthetic datasets, our framework incorporates a dual-model validation approach to ensure reasoning consistency through independent answer verification, though the substring-based evidence checking has known limitations. The complete dataset of 3,190 questions spans three difficulty levels and underwent validation by one medical expert and four students, achieving 94.2 percent approval with substantial inter-rater agreement (Fleiss' kappa = 0.82). We benchmark seven open-source models on VietMed-MCQ. Results reveal that general-purpose models with strong Chinese priors outperform Vietnamese-centric models, highlighting cross-lingual conceptual transfer, while all models still struggle with complex diagnostic reasoning. Our code and dataset are publicly available to foster research in low-resource medical domains.

15.
arXiv (quant-ph) 2026-06-15

Efimov Effect in Ultracold Microwave-Shielded Polar Molecules

arXiv:2602.21433v2 Announce Type: replace-cross Abstract: A quantum-mechanical description is presented for the three-body physics of shielded dipolar molecules, including a prediction of observable Efimov physics. Despite the anisotropic and long-range nature of the interaction, shielding enables a regime in which universality emerges already at the two-body level and extends to the three-body sector, where Efimov physics emerges. On the negative side of the scattering-length resonance, computed trimer binding energies display the characteristic scaling expected for Efimov resonances. Finally, the sudden approximation can be used to create trimer bound states, starting from positive energy trap states as a way to create or detect these molecular trimers. Moreover, the three-body parameter expressed in dipolar units is found to be universal.

16.
arXiv (CS.LG) 2026-06-16

Reinforcement Learning for LLM-based Event Forecasting

arXiv:2606.15917v1 Announce Type: new Abstract: We use Group Relative Policy Optimization (GRPO), a recently devised sample and memory efficient reinforcement learning method, to finetune pretrained LLMs in the range of 1.5B to 14B parameters equipped with the ability to get current information through the use of a Wikipedia revisions tool, or news summaries, to forecast real events beyond the knowledge cutoff of the LLM, as well as problems made to simulate different aspects of the dynamics of that training. We use the results of these experiments to comment on the scaling capability of LLMs for forecasting, as well as classify how judgmental forecasting fits into the verifiable/unverifiable domain taxonomy, considering the impact of the inherent aleatoric uncertainty when forecasting future events (e.g. the roll of a die). As a result of the GRPO training, we manage to bring a 1.5B parameter transformer (Qwen 2.5 1.5B) to forecasting performance superior to Claude Sonnet 3.5 over the same dataset as measured by cross entropy from the market agreed probabilities. We also discuss various dead ends on the path to this result.
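
Two quantities mentioned above are easy to illustrate: the group-relative advantage that gives GRPO its name, and a cross-entropy score against market probabilities. The sketch below assumes the commonly described GRPO normalization (reward minus group mean, divided by group standard deviation) and a binary-outcome cross entropy; the paper's exact reward and evaluation code are not given in the abstract, and the function names are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float],
                              eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one group of sampled completions.

    Each completion is scored relative to its siblings for the same prompt,
    so no separate value network is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def cross_entropy_vs_market(market_prob: float,
                            forecast_prob: float,
                            eps: float = 1e-12) -> float:
    """Cross entropy of a binary forecast against a market probability.

    Lower is better; it is minimized when the forecast equals the market.
    """
    q = min(max(forecast_prob, eps), 1.0 - eps)
    return -(market_prob * math.log(q)
             + (1.0 - market_prob) * math.log(1.0 - q))
```

Because the target is a probability rather than a resolved outcome, a forecast is rewarded for matching the market's uncertainty rather than for being confidently right, which is one way to handle the aleatoric uncertainty the abstract refers to.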

17.
arXiv (CS.CV) 2026-06-16

An Empirical Analysis of Optimization Dynamics and Sparsity Boundaries in Large-Scale Pedestrian Attribute Recognition

Pedestrian Attribute Recognition (PAR) is critical for video surveillance, enabling forensic search and re-identification systems. Extreme class imbalance remains a fundamental obstacle when merging PETA and PA-100K into a 109,000-image composite corpus, where minority attributes have positive sample fractions below 1%. This causes standard BCE optimization to suppress rare traits, a phenomenon we term the majority negative class cheating trap. We present a systematic ablation of Multi-Label Focal Loss hyperparameters (alpha and gamma) on a ResNet-18 backbone. A calibrated configuration (alpha=0.50, gamma=2.0) achieves a Macro F1-score of 62.32%, matching the BCE baseline while preserving superior hard-example mining and convergence dynamics. Our approach uses pure loss-function engineering with zero computational overhead for edge deployment. We identify the Sparsity Wall, a hard boundary where positive sample fractions below 0.1% make global loss reweighting ineffective, requiring instance-level intervention.
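
For readers unfamiliar with the loss being ablated, the following is a minimal sketch of a multi-label focal loss in its widely used form, with the alpha and gamma values quoted above as defaults. It is a plain-Python illustration over per-attribute probabilities, not the authors' training code, and the function name is hypothetical.

```python
import math
from typing import Sequence


def multilabel_focal_loss(probs: Sequence[float],
                          targets: Sequence[int],
                          alpha: float = 0.50,
                          gamma: float = 2.0,
                          eps: float = 1e-12) -> float:
    """Mean focal loss over independent binary attributes.

    probs   : predicted probability that each attribute is present
    targets : 1 if the attribute is present, else 0
    alpha   : weight on positives (1 - alpha goes to negatives)
    gamma   : focusing exponent; gamma = 0 recovers alpha-weighted BCE
    """
    if len(probs) != len(targets) or not probs:
        raise ValueError("probs and targets must be non-empty and equal length")
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        if y == 1:
            total += -alpha * (1.0 - p) ** gamma * math.log(p)
        else:
            total += -(1.0 - alpha) * p ** gamma * math.log(1.0 - p)
    return total / len(probs)
```

The modulating factor shrinks the contribution of confidently correct predictions, which are mostly easy negatives in an imbalanced corpus. The loss is still a global reweighting, so it cannot create signal for attributes with almost no positives, which is consistent with the Sparsity Wall described above.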

18.
Nature (Science) 2026-06-10

Measurement of reactor neutrino oscillation with the first JUNO data

Neutrino oscillations (see refs. 1,2 and references therein), a quantum effect manifesting at macroscopic scales, are governed by lepton flavour mixing angles and neutrino mass-squared differences (ref. 3) that are fundamental parameters of particle physics, representing phenomena beyond the Standard Model. Precision measurements of these parameters are essential for testing the completeness of the three-flavour framework, determining the mass ordering of neutrinos and probing possible new physics. The Jiangmen Underground Neutrino Observatory (JUNO; ref. 4) is a 20-ktonne liquid-scintillator detector located 52.5 km from multiple reactor cores, designed to resolve the interference pattern of reactor neutrinos with sub-percent precision (refs. 5,6). Here we report, using the first 59.1 days of data collected since detector completion in August 2025, the first simultaneous high-precision determination of two neutrino oscillation parameters, $\sin^{2}\theta_{12} = 0.3092 \pm 0.0087$ and $\Delta m_{21}^{2} = (7.50 \pm 0.12) \times 10^{-5}\,\mathrm{eV}^{2}$ for the normal mass ordering scenario, improving the precision by a factor of 1.6 relative to the combination of all previous measurements. These results advance the basic understanding of neutrinos, validate the design of the detector and indicate the readiness of JUNO for resolving the neutrino mass ordering with a larger dataset. The rapid achievement with a short exposure highlights the potential of JUNO to push the frontiers of precision neutrino physics and paves the way for its broad scientific programme. The first data of the Jiangmen Underground Neutrino Observatory deliver high-precision neutrino oscillation parameters, improving measurements and demonstrating readiness to determine neutrino mass ordering.

19.
arXiv (quant-ph) 2026-06-25

Quantum metrology of electric and magnetic dipole moments: ultimate limits and optimal regimes

arXiv:2606.25510v1 Announce Type: new Abstract: The characterization of electric and magnetic dipole moments (EDM and MDM) in quantum systems is central to fundamental physics and quantum sensing. While EDM searches provide powerful probes of CP violation within and beyond the Standard Model, precise MDM estimation is crucial for high-precision magnetometry and the development of quantum sensors. In this work, we address the ultimate precision limits for separate and simultaneous estimation of both dipole moments in a generic two-level system coupled to electromagnetic fields. We analyze three classes of quantum probes/strategies: unitary and depolarizing dynamics, and thermal equilibrium states. For each, we derive the quantum Fisher information (matrix), identify optimal probes, and determine the ideal operating conditions, such as evolution times and temperatures, that maximize estimation precision. We further assess the compatibility and sloppiness of the statistical models, showing that orthogonal dipole moments configurations enable joint estimation of EDM and MDM, whereas parallel configurations are intrinsically sloppy, permitting only the estimation of a single parameter combination. Our results provide a unified metrological framework for estimation schemes ranging from neutron EDM searches to molecular magnetometry, and highlight the distinct roles of coherence, noise, and thermalization in multiparameter quantum sensing of dipole moments.

20.
arXiv (math.PR) 2026-06-16

Malliavin Calculus for the stochastic Cahn-Hilliard equation driven by fractional noise

arXiv:2601.10490v2 Announce Type: replace Abstract: The stochastic partial differential equation analyzed in this work is the Cahn-Hilliard equation perturbed by an additive fractional white noise (fractional in time and white in space). We work in the case of one spatial dimension and apply Malliavin calculus to investigate the existence of a density for the stochastic solution $u$. In particular, we show that $u$ admits continuous paths almost surely and construct a localizing sequence through which we prove that its Malliavin derivative exists locally, and that its law is absolutely continuous with respect to the Lebesgue measure on $\bf R$, establishing thus that a density exists. A key contribution of this work is the analysis of the stochastic integral appearing in the mild formulation: we derive sharp estimates for the expectation of the $p$-th power ($p \geq 2$) of the $L^{\infty}(D)$-norm of this stochastic integral as well as for the integral involving the $L^{\infty}(D)$-norm of the operator associated with the kernel appearing in the integral representation of the fractional noise, all of which are essential for this study.

21.
arXiv (CS.CV) 2026-06-11

Metadata-Aware Multi-Prompt Reasoning for Zero-Shot Accident Understanding

In this paper, we address the problem of zero-shot understanding of accidents from surveillance videos by identifying when an impact event occurs, what type of impact it is, and where in the frame it occurs using natural language. We propose a three-stage pipeline that decomposes accident understanding into when, what, and where. The first stage extracts a short temporal window around the impact using vision-language similarity. In the second stage, we perform metadata-driven multi-prompt reasoning with five complementary views (baseline, motion, geometry, contrast, and tiebreaker) and resolve disagreement via an entropy-gated pairwise adjudicator. Finally, we localize the impact using an open-vocabulary detector queried on the predicted accident type and scene layout, and aggregate detections across keyframes using a score-weighted centroid. Our pipeline achieves a substantial improvement in the harmonic-mean score over a centre-of-frame baseline on the zero-shot ACCIDENT @ CVPR benchmark. We show that decomposing zero-shot video understanding into temporal localization, semantic classification, and spatial grounding enables more reliable reasoning with vision-language models than direct prompting alone.
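
The final aggregation step, a score-weighted centroid over detections from several keyframes, can be sketched as follows. The detection format (box centre plus confidence score) and the function name are assumptions made for illustration; the abstract does not describe the detector's output format.

```python
from typing import Sequence, Tuple

# (x_centre, y_centre, score), with coordinates in any consistent unit
Detection = Tuple[float, float, float]


def score_weighted_centroid(detections: Sequence[Detection]) -> Tuple[float, float]:
    """Average detection centres, weighting each by its confidence score."""
    total = sum(score for _, _, score in detections)
    if total <= 0:
        raise ValueError("need at least one detection with positive score")
    x = sum(cx * score for cx, _, score in detections) / total
    y = sum(cy * score for _, cy, score in detections) / total
    return (x, y)
```

With detections `(0, 0, 1)` and `(10, 10, 3)` the centroid is `(7.5, 7.5)`: the higher-scoring detection pulls the estimate toward itself, so a single low-confidence box in one keyframe has limited influence on the final location.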

22.
medRxiv (Medicine) 2026-06-15

Natural Language Processing Based Solution for Labeling Brain Metastasis Identified in Radiology Reports

Purpose: Brain metastases (BM) far exceed primary CNS tumours and constitute the majority of the workload for neuro-oncology care providers. Currently, cancer registries only capture synchronous BMs, which are only a small proportion of all BMs. We aim to develop and validate a natural language processing (NLP) algorithm that identifies brain metastases in radiology reports, enabling scalable surveillance of asynchronous BMs. Methods: Using population-based cancer registry data in Alberta, Canada, we identified a cancer cohort diagnosed between 2012 and 2019 with follow-up to 2022. All brain/head radiology reports at and post-cancer diagnosis were identified. Reports were sampled through a multi-phase approach and manually labeled for BM presence. We trained two Bio_ClinicalBERT models on the "Findings" and "Impressions" sections, respectively, and took the maximum predicted probability as the report-level prediction. Internal and external validation used reports from the Canadian provinces of Alberta, Ontario, and British Columbia. Results: The models were trained on 1,879 samples. For internal validation, 1,833 reports from 357 patients were tested. At a probability threshold of 0.4, the model achieved a sensitivity of 0.888 and precision of 0.499. The ensemble substantially outperformed single-section models, which achieved sensitivities of only 67.8% (Findings) and 74.2% (Impressions). On external validation, sensitivity was 0.918 in Ontario and 0.726 in British Columbia, demonstrating robustness across diverse data distributions. Conclusions: An NLP-based pipeline processing both Findings and Impressions sections has been developed and validated in three Canadian provinces. It meets cancer registry operational requirements and is to be implemented into the surveillance workflow in Alberta and British Columbia, providing a foundation for population-level BM surveillance.
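
The report-level rule described in the Methods (take the maximum of the two section-level probabilities, then apply a threshold of 0.4) is sketched below, together with the two metrics reported. Whether the comparison at the threshold is inclusive is an assumption, and the function names are hypothetical; the section-level probabilities would come from the two fine-tuned models, which are not reproduced here.

```python
from typing import Sequence, Tuple


def report_is_positive(p_findings: float,
                       p_impressions: float,
                       threshold: float = 0.4) -> bool:
    """Flag a report when either section-level model is confident enough.

    Assumes an inclusive comparison (>=); the abstract does not say.
    """
    return max(p_findings, p_impressions) >= threshold


def sensitivity_and_precision(y_true: Sequence[int],
                              y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); precision = TP / (TP + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tp + fp == 0:
        raise ValueError("metrics undefined without positives and predictions")
    return tp / (tp + fn), tp / (tp + fp)
```

Taking the maximum can only add positive calls relative to either section alone, which fits the reported pattern of higher sensitivity for the ensemble alongside a modest precision of about 0.5.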

23.
bioRxiv (Bioinfo) 2026-06-17

In silico characterization of lysis and host-recognition modules in Staphylococcus aureus bacteriophage genomes

Background/aim: Antimicrobial resistance in methicillin-resistant Staphylococcus aureus (MRSA) requires precision non-antibiotic therapeutics, yet phage lytic efficacy is poorly predicted by phenotypic assays, as shown by paradoxical biofilm responses. This study characterized the genomic architecture of lytic S. aureus bacteriophages, focusing on the conservation of the lysis module and the variability of host-recognition modules, to provide a rational basis for phage candidate selection. Materials and methods: Twenty-two complete S. aureus phage genomes were retrieved from NCBI GenBank. Genomic features were extracted with custom Biopython scripts. Lysis (endolysin, holin) and host-recognition (tail fiber/receptor-binding protein) modules were annotated and validated by InterPro domain analysis, with disrupted endolysins resolved by tBLASTn. Phylogeny was reconstructed from large terminase subunit (TerL) sequences using maximum likelihood. Results: Genome size spanned three classes, from 17.5 to 148.6 kb. The LysK-type endolysin (CHAP, Amidase, SH3b) was highly conserved, whereas tail fiber/RBP genes were detected in only 14 of 22 phages. Domain analysis reclassified two proteins annotated as endolysins as virion-associated peptidoglycan hydrolases, and identified two independent mechanisms, HNH endonuclease insertion and intron splitting, that interrupt lysis-module genes and confound automated annotation. Maximum likelihood analysis recovered a strongly supported, highly conserved core clade with EW and SA13 as divergent lineages. Conclusion: Lysis modules are conserved whereas host-recognition modules are variable, indicating that host recognition rather than the lytic enzyme is the principal determinant of host range and the more rational target for phage selection and engineering.

24.
arXiv (CS.AI) 2026-06-25

Exploring Information Seeking Agent Consolidation

arXiv:2602.00585v2. Information-seeking agents have emerged as a powerful paradigm for knowledge-intensive tasks, yet today's systems remain specialized for the open web, documents, or local knowledge bases, hindering scalable and cross-domain deployment. We present the first systematic empirical study of consolidating these information-seeking agents into a single foundation agentic model. We compare two paradigms – data-level mixing, which trains a unified model on a mixture of datasets, and parameter-level merging, which merges independently trained experts in parameter space – across 3 training scenarios, evaluating 26 representative parameter-level methods on 10 benchmarks. To compare across heterogeneous benchmarks, we introduce a geometric Composite Score and an Imbalance Score that describe overall performance and task skew. Our analysis shows that (i) well-designed parameter-level merging attains parity with data mixing at a fraction of its training cost and is order-agnostic; (ii) parameter-level merging structurally preserves out-of-domain capabilities that data mixing universally forgets; and (iii) cross-scenario stability is strongly tied to consolidation quality. We distil our observations into a method-selection guide and design principles for next-generation merging operators.
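
The abstract does not define the Composite Score beyond calling it geometric. As a rough illustration only, the sketch below assumes it behaves like a plain geometric mean of per-benchmark scores; the function name and the example scores are hypothetical, and the paper's actual formula (and its Imbalance Score) may differ.

```python
import math
from typing import Sequence


def geometric_composite(scores: Sequence[float]) -> float:
    """Geometric mean of per-benchmark scores (assumed form, not the paper's definition)."""
    if not scores:
        raise ValueError("at least one score is required")
    if any(s <= 0 for s in scores):
        raise ValueError("scores must be strictly positive")
    return math.prod(scores) ** (1.0 / len(scores))


# Two hypothetical models with the same arithmetic mean (0.6) across three benchmarks.
balanced = [0.6, 0.6, 0.6]
skewed = [0.9, 0.8, 0.1]

print(geometric_composite(balanced), sum(balanced) / len(balanced))
print(geometric_composite(skewed), sum(skewed) / len(skewed))
```

Under this assumption, the skewed model scores lower than the balanced one even though their arithmetic means match, which is the usual reason for preferring a geometric aggregate when task skew matters.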

25.
arXiv (CS.CV) 2026-06-11

CFCamo: A Counterfactual Detect-or-Abstain Framework for Camouflaged Object Detection

Vision-language reinforcement learning has recently shown strong target-present localization for camouflaged object detection (COD). Yet localization is only one side of the decision: when the agent faces an ordinary image with no camouflaged target, will it still claim that a camouflaged object exists? Standard COD training and evaluation data are positive-only, so agents optimized under this setting can acquire an over-detect bias, a task-specific form of object hallucination that standard COD evaluation leaves unmeasured. To quantify this target-absent behavior, we construct Counterfactual COD (CF-COD), a paired benchmark that removes the camouflaged target from each held-out COD evaluation image while preserving a plausible background. CF-COD evaluates whether a model detects the target on the original image and abstains on the target-absent counterfactual, summarized by Pair Accuracy (PA). We further introduce CFCamo, a paired counterfactual framework for COD with abstention. For training, CFCamo optimizes a Qwen3-VL-4B-Instruct agent with Counterfactual Sequence Policy Optimization (CSPO), which samples paired original-counterfactual rollouts and uses a Counterfactual Paired Reward (CPR) to couple original-image detection with counterfactual abstention. On CAMO-test, CFCamo improves S_alpha by +3.7 pp over the prior RL-based COD baseline; across CF-COD, it reaches 80.0-90.8% PA. Ablations show that removing counterfactual coupling reduces PA to 1.4-5.2% despite strong target-present COD scores, showing that target-present evaluation alone does not characterize detect-or-abstain behavior. Overall, these results indicate that CFCamo improves COD agents by coupling target-present detection with target-absent abstention, rather than merely strengthening target-present localization. Code and data are available at https://github.com/suhang2000/CFCamo.
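
Pair Accuracy, as the abstract describes it, credits a pair only when the model detects the target on the original image and abstains on its target-absent counterfactual. A minimal Python sketch of that reading follows; the function name, the boolean input format, and the toy data are assumptions for illustration and are not taken from the CFCamo code base.

```python
from typing import Iterable, Tuple


def pair_accuracy(pairs: Iterable[Tuple[bool, bool]]) -> float:
    """Fraction of pairs that are jointly correct.

    Each pair is (detected_on_original, detected_on_counterfactual).
    A pair counts as correct only if the model detects on the original
    image AND abstains (no detection) on the counterfactual.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one pair is required")
    correct = sum(1 for on_orig, on_cf in pairs if on_orig and not on_cf)
    return correct / len(pairs)


# Hypothetical over-detecting model: always claims a camouflaged object exists.
over_detector = [(True, True)] * 4

# Hypothetical model that abstains on most counterfactuals.
coupled = [(True, False), (True, False), (False, False), (True, True)]

print(pair_accuracy(over_detector))  # 0.0
print(pair_accuracy(coupled))        # 0.5
```

The over-detecting model is right on every original image yet has a Pair Accuracy of zero. This mirrors the ablation finding that strong target-present scores can coexist with very low PA.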