Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.LG) 2026-06-19

Enhancing Graph Neural Networks Using Proximity Graphs for Dust Source Emission Forecasting

arXiv:2606.19825v1 Announce Type: new Abstract: Accurate prediction of dust source emissions is critical for mitigating the significant environmental and health hazards posed by dust storms. Traditional forecasting methods often struggle to capture the complex spatiotemporal dynamics of these phenomena. In this paper, we demonstrate that proximity graphs enable Graph Neural Networks (GNNs) to effectively model the intricate spatial and temporal relationships between data points. Specifically, we use proximity graphs–such as Delaunay triangulation, Gabriel graph, k-Nearest Neighbor graph, and Yao graph–as the input for GNNs (including GraphSAGE, Graph Convolutional Networks, and Graph Attention Networks) to perform message passing. Our approach highlights the effectiveness of integrating proximity graphs with GNNs for robust and accurate dust source forecasting. To emphasize the importance of proximity graph representations, we compare our method against GNNs using random graphs for message passing. The results show that GNNs with proximity graphs significantly outperform those with random graphs and are also far superior to Long Short-Term Memory (LSTM) model in dust source emission forecasting.

02.
arXiv (CS.AI) 2026-06-11

Using Explainability as a Training-Time Reliability Signal for Efficient ECG Classification

arXiv:2606.12252v1 Announce Type: cross Abstract: Training deep neural networks for clinical time-series analysis is computationally demanding, yet many healthcare settings lack the resources required for repeated model development and deployment. This challenge is particularly evident in electrocardiogram classification, where large datasets and long training schedules make efficiency practically important. Progressive Data Dropout reduces training cost by excluding samples from gradient updates once they are learned, but it relies on model confidence and may retain samples that are difficult due to noise or ambiguity rather than useful signal. In this work, we introduce ERTS, an explainability-based reliability training signal for efficient ECG classification. ERTS uses explanation quality during training to distinguish between informative and unreliable uncertainty. Building on progressive data selection, we compute Grad-CAM attention maps for candidate samples and derive a focus score that measures whether model predictions are supported by coherent and localised patterns. Samples with low focus are filtered out, while those with meaningful attention are prioritised for gradient updates. We evaluate ERTS across three ECG datasets and multiple backbone architectures, showing consistent improvements in macro-F1 alongside reduced effective training cost. These results suggest that explanation quality can serve as a practical signal for improving both efficiency and reliability in clinical time-series learning. Code will be released.

03.
arXiv (CS.CL) 2026-06-19

Thermodynamic Signatures of Reasoning: Free-Energy and Spectral-Form-Factor Diagnostics for Hallucination Detection in Large Language Models

作者:

Hallucination detection in large language models (LLMs) is deployment-critical, and recent work shows that the spectrum of attention-derived graph Laplacians carries strong signal about reasoning quality. Prior spectral diagnostics, however, summarize the Laplacian spectrum by a handful of eigenvalues or hand-picked scalars, leaving most of its structure unused. We propose Free-Energy Signatures (Fes), a spectral descriptor that treats each layer's attention Laplacian as a Hamiltonian and extracts its thermodynamic potentials partition function, free energy, spectral entropy, heat capacity together with the random-matrix-theory (RMT) spectral form factor. We prove three results: (i)~Lipschitz stability of Fes under attention perturbation; (ii)~an expressiveness result showing that Fes enriches finite spectral summaries and approximates moment-derived spectral functionals under explicit regularity and grid-resolution assumptions; and (iii)~a finite-sample PAC bound on the AUROC of a training-free detector built from Fes. Empirically, across six open-weight LLMs and six benchmarks, a lightweight probe on Fes descriptors achieves the strongest aggregate AUROC among attention-spectral baselines, improving over LapEig by $+6.5$ AUROC points and over GoR-4 by $+2.4$ points on average, while requiring no update to the underlying LLM. In the fully unsupervised setting, an RMT-deviation score achieves mean AUROC $0.71$, providing a label-free but weaker detector. A complementary RMT analysis shows that correct generations exhibit more Wigner-Dyson like spectral statistics, whereas hallucinations exhibit more Poisson-like statistics. The anonymized code and config are provided in the supplementary material.

04.
arXiv (CS.CL) 2026-06-12

FENCE: A Financial and Multimodal Jailbreak Detection Dataset

Jailbreaking poses a significant risk to the deployment of Large Language Models (LLMs) and Vision Language Models (VLMs). VLMs are particularly vulnerable because they process both text and images, creating broader attack surfaces. However, available resources for jailbreak detection are scarce, particularly in finance. To address this gap, we present FENCE, a bilingual (Korean-English) multimodal dataset for training and evaluating jailbreak detectors in financial applications. FENCE emphasizes domain realism through finance-relevant queries paired with image-grounded threats. Experiments with commercial and open-source VLMs reveal consistent vulnerabilities, with GPT-4o showing measurable attack success rates and open-source models displaying greater exposure. A baseline detector trained on FENCE achieves 99 percent in-distribution accuracy and maintains strong performance on external benchmarks, underscoring the dataset's robustness for training reliable detection models. FENCE provides a focused resource for advancing multimodal jailbreak detection in finance and for supporting safer, more reliable AI systems in sensitive domains. Warning: This paper includes example data that may be offensive.

05.
bioRxiv (Bioinfo) 2026-06-16

Expanding gene regulatory networks from transcriptome data through graphical modeling with heterogeneous priors

Gene regulatory network inference is widely used to reconstruct large-scale networks and identify functional genes from transcriptome data. Meanwhile, in many biological fields, core regulatory genes have been extensively studied, leading to the establishment of small-scale gene regulatory networks, and novel genes connected to these networks remain to be identified. However, methods for expanding existing gene networks by identifying novel regulatory interactions, rather than reconstructing the entire network, are not well established. Here, we propose a method for gene network expansion that incorporates known regulatory relationships and evaluates each candidate gene individually to infer its regulatory connections to the existing network. Using simulated datasets from the DREAM4 benchmark and the PRECISE-1K experimental dataset, our method outperformed conventional methods by incorporating prior knowledge. In particular, it improved the ability to distinguish true regulatory interactions from indirect associations arising from strong correlations among genes in the existing network. The method also showed strong performance for interactions involving genes with high outdegree or centrality. Furthermore, it maintained stable performance as the size of the existing network increased and was robust to noise in prior information. These results demonstrate that our method provides an effective framework for expanding existing gene regulatory networks by leveraging prior knowledge.

06.
arXiv (CS.CV) 2026-06-17

GeoDisaster: Benchmarking Orchestrated Agents for Operational Disaster Geo-Intelligence

Remote-sensing vision-language models (RS-VLMs) have advanced Earth-observation analysis toward visual interpretation and instruction-following, yet fall short of operational geo-intelligence, which demands tool-grounded spatial reasoning and structured, evidence-backed decisions. We introduce GeoDisaster, an operational geospatial disaster reasoning benchmark with 2,921 verified instances across 43 question types and five task families: deforestation monitoring, multi-hazard analysis, building-damage assessment, flood-safe routing, and Sentinel-1 SAR flood monitoring. Instances integrate heterogeneous EO/GIS evidence-optical and SAR imagery, raster masks, vector geometries, road networks, and exposure layers-spanning hazard detection, damage assessment, exposure estimation, and diagnostic report generation. Ground-truth answers are grounded in executable geospatial workflows and deterministic consistency checks, removing the need for language-model annotation. We further propose an orchestrated multi-agent framework with 18 disaster-oriented tools, where role-specialized agents coordinate through explicit execution contracts, aligned via Role-Contract Expectation Alignment (RCEA): failure-aware supervised fine-tuning combined with contract-grounded reinforcement learning over dense step-level signals. Experiments show that GeoDisaster challenges existing RS-VLMs and agentic systems, while RCEA improves tool use, evidence grounding, state consistency, and decision generation.

07.
arXiv (CS.LG) 2026-06-12

Multi-Token Residual Prediction

arXiv:2605.18817v2 Announce Type: replace Abstract: Diffusion Language Models (DLMs) generate text by iteratively denoising masked token sequences, offering a tradeoff between parallelism and quality compared to autoregressive models. In current practice, the number of tokens decoded per step is controlled by a confidence threshold, and quality degrades monotonically as more tokens are denoised per step. We introduce Multi-token Residual Prediction (MRP), a lightweight module that enables dependency-aware multi-token denoising within a single backbone forward pass. MRP exploits a key property of the denoising process: the logit distributions at adjacent denoising steps are remarkably similar. Rather than running the backbone a second time to obtain the next-step logits, MRP predicts the residual between steps from the backbone's hidden states, effectively denoising more tokens per backbone forward at a fraction of the cost. We apply MRP across the two operating regimes of DLM decoding. In the high-quality-low-throughput static denoising regime, MRP serves as a drafter for speculative decoding: its proposals are verified against the backbone, yielding lossless acceleration of up to 1.4x in SGLang. In the low-quality-high-throughput dynamic denoising regime, MRP instead drives a remasking scheme that revokes over-eager reveals, recovering most of the accuracy lost to aggressive low-threshold decoding and improving accuracy by up to 22.6 points on code generation task HumanEval and 17.7 points on reasoning task GSM8K.

09.
arXiv (CS.LG) 2026-06-18

Beyond AHI: An Interpretable Causal-Discovery-Guided Framework for Sleep Recovery in Connected Health

arXiv:2606.18506v1 Announce Type: new Abstract: Objective sleep assessment relies on polysomnography (PSG), yet clinical impact is often better reflected in patient-reported outcomes (PROs) such as sleepiness and fatigue. Existing summary indices, including the Apnea-Hypopnea Index (AHI), provide limited insight into the multidomain physiology underlying functional recovery. We propose an interpretable, causal-discovery–guided framework for deriving a hierarchical Sleep Recovery Score (SRS) from multimodal PSG. Using two large population cohorts (MESA: n=1540; MrOS: n=825), we apply directed acyclic graph (DAG) learning to identify candidate physiological drivers spanning respiratory burden, hypoxic burden, sleep fragmentation, sleep architecture, and autonomic regulation. Although derived from clinical PSG, these domains map naturally to sensing streams increasingly available in connected health technologies, including wearable ECG, oximetry, and sleep-stage estimation devices. To preserve mechanistic plausibility, we introduce a two-stage screening process that combines physiology-based constraints with constrained LLM-assisted auditing to identify and remove structural confounders and construct-overlapping variables. Across cohorts, these five domains emerge as recurrent physiological domains associated with recovery, and the resulting SRS shows up to 2.5$\times$ stronger alignment with perceived recovery than AHI. By linking multimodal sleep physiology to patient-centered outcomes through an interpretable, bias-aware, and domain structured framework, this work provides a practical foundation for recovery modeling across both clinical sleep studies and emerging smart and connected health settings.

10.
arXiv (quant-ph) 2026-06-11

Quantum optimal control of the Dicke manifold in dipolar Rydberg atom arrays

arXiv:2606.02283v2 Announce Type: replace Abstract: The ability to engineer and control quantum states of many-body systems is a central challenge in quantum information science. For a register of $N$ qubits, the full Hilbert space dimension grows exponentially as $2^N$, rendering generic state preparation and control infeasible without exploiting structure or symmetry. A particularly important and physically motivated restriction is to the fully symmetric subspace, spanned by the Dicke states, which are simultaneous eigenstates of collective spin $J=N/2$. Ensembles of Rydberg atoms interacting via electric dipoles in two-dimensional tweezer arrays form a promising platform for achieving such control. However, the finite range of dipole-dipole interactions poses a challenge to generating and controlling the Dicke manifold because the Hamiltonian incurs leakage from the computational subspace. To counteract this leakage, we perform quantum optimal control algorithms on a truncated Hilbert space according to our newly developed method of ``irrep distillation'' (IRD), which captures the process by which the symmetric subspace couples to leakage error-spaces, using only linear-scaling Hilbert dimension. We implement gradient ascent pulse engineering (GrAPE) on control schemes with little or no local addressing, to generate resourceful states like Greenberger-Horne-Zeilinger, Dicke, and extremal quantum states. We benchmark each scheme of IRD-GrAPE for its quantum speed limit (QSL), as well as exactly testing pulse fidelities on small system sizes and predicting fidelities using higher-order IRD on larger systems.

11.
arXiv (CS.AI) 2026-06-19

Structuring and Tokenizing Distributed User Interest Context for Generative Recommendation

arXiv:2606.20554v1 Announce Type: cross Abstract: Generative recommendation is an emerging paradigm that has shown promise in industrial recommendation systems, aiming to predict users' next interactions from their historical behaviors. At the core of generative recommendation lies item tokenization, which bridges item semantics and recommendation models. However, existing methods often struggle to effectively organize and inject complex user-behavioral and item-semantic contexts into recommendation models simultaneously. On the one hand, existing graph-based integration methods, such as graph serialization and graph neural networks, either suffer from scalability issues or exploit only local graph information. On the other hand, existing semantic tokenization methods typically rely on heuristics and lack explicit supervision signals, which may lead to inaccurate or suboptimal semantic representations. To address these limitations in user interest context modeling, we propose G2Rec, a scalable framework that unifies holistic graph-based user co-engagement modeling with semantic tokenization for industrial-scale generative recommendation. Overall, G2Rec enables recommendation models to capture holistic and semantically grounded user interest prototypes without requiring ground-truth user interests, thereby providing more comprehensive and accurate modeling of user behavior contexts in industrial sequential recommendation. Online deployment across product surfaces and extensive experiments on public datasets demonstrate the superiority of G2Rec over existing methods.

12.
arXiv (quant-ph) 2026-06-17

Photon anti-bunching in high harmonic generation

arXiv:2606.17620v1 Announce Type: new Abstract: Photon anti-bunching is the direct evidence for the existence of photons without having a classical counterpart. Unlike bunching of photons, which can have a semi-classical description, the effect of photon anti-bunching can only be understood with quantized electromagnetic fields. However, for the process of high harmonic generation (HHG), where many photons of the driving field are upconverted to a single photon of higher energy, there is yet no clear evidence for the presence of individual photon emission. The key result of this work is the prediction of photon anti-bunching in the process of HHG, marking it the first theoretical discovery of non-classicality in the temporal correlations of HHG photons. While other non-classical signatures in HHG, such as sub-Poissonian statistics or squeezing, have been discussed for an ensemble of photons, the anti-bunching signature reported here is a signature of a single photon. This is achieved by using the recently developed Heisenberg picture approach for quantum optical HHG, revealing clear anti-bunching signatures in the intensity correlation function across the entire harmonic spectrum.

13.
arXiv (quant-ph) 2026-06-16

Quantifying Coherence-to-Entanglement Conversion Efficiency under Noisy Operations

arXiv:2606.16916v1 Announce Type: new Abstract: We investigate the noise-limited conversion of local quantum coherence into bipartite entanglement in a minimal two-qubit protocol comprising a coherent single-qubit input, an incoherent ancilla, an ideal CNOT operation, and subsequent environmental noise. Employing the $l_1$-norm of coherence and the entanglement negativity as resource quantifiers, we establish an exact closed-form correspondence between local single-qubit input coherence and the two-qubit entanglement generated in the noiseless limit, showing that the output negativity is precisely one half of the initial $l_1$-coherence. We then derive analytic expressions for the surviving entanglement and the associated coherence-to-entanglement conversion efficiency under two representative noise mechanisms: independent phase damping and global two-qubit depolarizing noise. The two channels exhibit qualitatively distinct degradation behavior. Phase damping induces a universal multiplicative suppression of the generated entanglement, yielding a coherence-independent conversion efficiency and no finite-noise entanglement sudden death. In contrast, global depolarization introduces an isotropic mixing contribution that shifts the partial-transpose spectrum, producing coherence-dependent degradation and a finite sudden-death threshold. We show that maximally coherent inputs not only maximize the entanglement generated by the CNOT protocol but also optimize its robustness against depolarizing noise. Direct density-matrix simulations validate the analytic results to numerical precision. These findings provide a compact analytic benchmark for assessing how different noise mechanisms constrain coherence-to-entanglement conversion in elementary quantum-information protocols and near-term quantum devices.

14.
arXiv (CS.LG) 2026-06-19

Integrating national forest inventory, airborne lidar, and satellite imagery for wall-to-wall mapping of forest structure with computer vision

arXiv:2606.20291v1 Announce Type: new Abstract: Remote sensing is increasingly relied upon to deliver actionable science for forest and wildfire risk management across large landscapes. Wall-to-wall, annually updated maps are a persistent need for effective forest management. Many planning systems and data collections combine disparate data sources with different purposes, vintages, and prediction quality, which leads to confounding behavior in operational planning systems. We introduce the VibrantForests framework, developed and applied to map forest attributes and provide a coherent foundation for effective forest and wildfire planning. VibrantForests includes a satellite-based forest structure model trained on lidar-derived samples and applied across the contiguous United States to concurrently generate estimates of canopy cover, canopy height, aboveground live tree biomass, basal area, and quadratic mean diameter at 10-meter resolution. We demonstrate predictive capability spanning the full spectrum of forest conditions ranging from sparse-canopy/low-biomass to dense-canopy/high-biomass. Results show that our model extends the range at which saturation is commonly encountered in comparable passive-sensor models, and reduces regression-to-mean behavior that commonly produces overestimation of forest attributes in small/sparse conditions and underestimation in large/dense conditions. The VibrantForests framework addresses a key limitation in large-area forest and wildfire planning by delivering coherent wall-to-wall estimates of management-relevant attributes at annual cadence and 10m resolution.

15.
arXiv (math.PR) 2026-06-18

First to reach $n$ game

arXiv:2506.08782v4 Announce Type: replace Abstract: We consider a game with two players, consisting of a number of rounds, where the first player to win $n$ rounds becomes the overall winner. Who wins each individual round is governed by a certain urn having two types of balls (type 1 and type 2). At each round, we randomly pick a ball from the urn, and its type determines which of the two players wins. We study the game under three regimes. In the first and the third regimes, a ball is taken without replacement, whilst in the second regime, it is returned to the urn with one more ball of the same colour. We study the properties of the random variables equal to the properly defined overall net profits of the players, and the results are drastically different in all three regimes.

16.
arXiv (CS.AI) 2026-06-16

Relational Structural Causal Models

arXiv:2606.14892v1 Announce Type: new Abstract: An artificial intelligence must have a model of its environment that is causal, supporting reasoning about interventions and counterfactuals, and also combinatorial, supporting generalization to unseen combinations of objects. In this work, we formally study when and how such a model can be learned. We develop relational structural causal models, extending structural causal models (Pearl 2009) to settings where objects and their relations vary. First, we show how answers to not only causal but also observational queries about unseen combinations of objects can not be identified without further assumptions. To enable such identification–including in the presence of unobserved confounding–we define relational causal graphs and derive symbolic identification criteria. Finally, we propose relational neural causal models, a provably correct approach that outperforms non-relational baselines on simulated traffic scenes with varying cars, signals, and pedestrians.

17.
arXiv (quant-ph) 2026-06-19

Quantum Batteries as Work Sources for Phase-Locked Parametric Amplification

arXiv:2606.20306v1 Announce Type: new Abstract: Quantum batteries have been proposed as locally precharged work sources for superconducting quantum technologies, suggesting a route to reduce continuously supplied microwave drives. Here we ask whether the pump tone of a quantum-limited parametric amplifier can be replaced, or strongly duty-cycled, by a finite bosonic quantum battery. Quantizing the pump of a nondegenerate parametric amplifier exposes a resource distinction hidden in the classical description: stored pump energy can generate signal-idler photons, but pump phase coherence is required to generate a phase-locked amplifier field. In a closed trilinear model, coherent and phase-randomized coherent pumps with the same photon-number distribution produce comparable pair numbers, yet only the coherent pump produces anomalous two-mode coherence and an EPR-squeezed interference dip. Including leakage, we collect the emitted fields into cascaded temporal modes. At matched collector bandwidth, the coherent pump gives \(I_{\min}^{(f)}=0.553\), whereas the phase-randomized pump gives \(I_{\min}^{(f)}=1.94\) at nearly identical collected energy. Weak amplitude squeezing slightly improves the dip by reducing finite-pump number fluctuations while preserving the coherent displacement. Thus battery-powered parametric amplification requires phase-coherent stored energy, possibly assisted by number-noise reduction, rather than stored energy alone.

18.
Nature (Science) 2026-06-15

Daily briefing: Iron-Age human bones were made into tools before interment

作者:

Newly uncovered bones hint at how Iron Age Britons treated their dead. Plus, AI models have failed to beat human mathematicians at research-level problems and the everyday items that make great scientific tools. Newly uncovered bones hint at how Iron Age Britons treated their dead. Plus, AI models have failed to beat human mathematicians at research-level problems and the everyday items that make great scientific tools.

19.
arXiv (quant-ph) 2026-06-11

Entanglement preservation and Clauser-Horne nonlocality in electromagnetically induced transparency quantum memories

arXiv:2507.15453v4 Announce Type: replace Abstract: Entanglement preservation in noisy quantum memories represents a central challenge in quantum information science. While experiments have shown that electromagnetically induced transparency (EIT) memories can store entangled photons, a quantitative theoretical analysis of whether nonlocal quantum correlations can survive storage loss induced by ground-state decoherence remains limited. Here we combine the dark-state polariton formalism with a reduced density-operator treatment to derive an EIT-specific effective pure-loss description for the retrieved photonic state in the ground-state-decoherence-limited regime. The analysis reveals that decoherence transforms an initially pure Bell state into a mixed state with a vacuum component and predicts a protocol-dependent storage-efficiency benchmark of 89.7% for violating the chosen unconditional Clauser-Horne (CH) inequality. Above this benchmark, the retrieved photonic state violates the CH inequality without post-selection, whereas below it, this unconditional CH violation is no longer obtained. This framework provides a quantitative theoretical description of entanglement retention, retrieved photonic density operators, and protocol-dependent Bell-test benchmarks in EIT quantum memories.

20.
arXiv (CS.LG) 2026-06-19

DF-ExpEnse: Diffusion Filtered Exploration for Sample Efficient Finetuning

arXiv:2606.19656v1 Announce Type: cross Abstract: A natural recipe for intelligent robotic decision-making is initializing from pretrained generative control policies, which have summarized offline experience, and adapting them to self-collected online experience. We present DF-ExpEnse, an exploration technique that improves the quality of online experience collection, thus increasing finetuning sample-efficiency. DF-ExpEnse leverages the multimodal modeling capabilities of the generative control policy to create an expressive and tractably evaluatable candidate set. It then utilizes an ensemble of critics to identify the action that best balances quality with high exploration interest. In fleet settings, DF-ExpEnse further enables cross-agent communication to facilitate collaborative exploration as a group. DF-ExpEnse can be seamlessly integrated with existing strategies that finetune pretrained generative control policies via reinforcement learning. We experimentally validate consistent sample-efficiency benefits through DF-ExpEnse across a variety of manipulation and locomotion tasks, compared to default finetuning and alternative action selection schemes. Project can be found at https://df-expense.github.io.

21.
arXiv (CS.AI) 2026-06-16

AQ4SViT: An Automated Quantization Framework with Search Gating Policy for Compressing Spiking Vision Transformers

arXiv:2606.15523v1 Announce Type: cross Abstract: Spiking Vision Transformers (SViTs) have emerged as alternative low-power ViT models, but their large sizes hinder their deployments on resource-constrained embedded AI systems. To address this, state-of-the-art works proposed quantization techniques to compress SViT models, but their manual, human-guided approach needs a huge design time and power/energy consumption to find the appropriate quantization setting for each given network, making this approach not scalable for quantizing multiple networks. Toward this, we propose AQ4SViT, a novel automated quantization framework for SViTs that can provide quick quantization settings with good trade-offs between accuracy and memory. To achieve this, AQ4SViT employs the following key ideas: quantization search strategy that evaluates the quantization setting candidates while considering the accuracy constraint; and search gating policy that quickly evaluates and selects promising quantization candidates by leveraging membrane potential drift as a performance proxy. In the search gating policy, AQSViT employs two search algorithm variants to provide trade-off options: Greedy search, which performs fast but may lead to local optima; and Beam search, which performs slower but has better performance in finding global optima selection due to a wider search space. Experimental results show that AQ4SViT-Greedy quickly finds the appropriate quantization settings, achieving up to 6.6x faster search time and up to 82.5% memory saving compared to the state-of-the-art; while AQ4SViT-Beam further reduces the memory footprint by up to 90% compared to the state-of-the-art, but with 4.5x longer search time; all these results are obtained while maintaining high accuracy within 1.5% from the original/non-quantized models on the ImageNet dataset. These results highlight that AQ4SViT framework offers advancements toward SViT deployments on embedded AI systems.

22.
arXiv (CS.AI) 2026-06-11

Are Frontier LLMs Ready for Cybersecurity? Evidence for Vertical Foundation Models from Dual-Mode Vulnerability Benchmarks

arXiv:2605.23243v2 Announce Type: replace-cross Abstract: We evaluate whether frontier LLMs are ready for cybersecurity through a dual-mode benchmark: white-box function-level vulnerability detection (VulnLLM-R, across C/Java/Python) and black-box web application security testing (five production-style applications with 118 ground-truth vulnerabilities across 20+ CWE families, which we will open-source). We test six frontier models (GPT-5.4, Codex~5.3, Claude Opus~4.6, Sonnet~4.6, Gemini~3.1~Pro and Gemini~3~Flash) and two domain-specialized models across four testing paradigms. Our findings are sobering: (1)~every frontier model produces 10-50% false positive rates in white-box detection, systematically over-predicting vulnerabilities; (2)~in black-box testing, frontier models achieve only 4-8% ground-truth coverage, improving to just 10-19% even with external security tools (Playwright MCP, Burp Suite MCP); (3)~structured penetration-testing methodology encoded in domain-specialized agents raises per-family detection above 50%, demonstrating that methodology, not scale, is the primary lever; and (4)~a domain-specialized defense model achieves the highest precision (0.904) and lowest false positive rate (9.7%) among all models, on a single GPU. We identify the absence of structured security testing traces end-to-end request/response sequences, failure-heavy data, and multi-step attack chains as the fundamental training data bottleneck, and propose self-play security testing as a data generation strategy. Our results make the case for vertical foundation models purpose-built for cybersecurity.

23.
arXiv (CS.CV) 2026-06-11

Spatially Selective Self-Training for Unsupervised Building Change Detection

Unsupervised building change detection aims to learn building-change masks from unlabeled bi-temporal remote sensing images. Existing label-free methods often follow a discrepancy-to-mask paradigm, directly using temporal differences, frozen foundation-model responses, prompt-based outputs, or post-processing results as final change maps. Although these strategies provide annotation-free cues, they do not learn a task-specific building-change detector and remain vulnerable to the gap between generic temporal discrepancies and building-defined structural changes. In practice, such discrepancies are often noisy and task-irrelevant, as appearance shifts, registration errors, and non-building modifications can produce strong but misleading responses. To address this problem, we propose SST-CD, a spatially selective self-training framework that reformulates fully label-free building change detection as end-to-end detector learning under noisy pseudo supervision. SST-CD uses temporal discrepancies as candidate pseudo labels and trains the detector only on spatially reliable pixels, whose reliability is estimated by a local consistency criterion that filters inconsistent regions from supervision. To further stabilize noisy self-training, a lightweight feature adapter recalibrates bi-temporal features, while a prototype-based decoder produces compact change and no-change representations. Experiments on LEVIR-CD, WHU-CD, and DSIFN-CD show that SST-CD achieves F1 scores of 83.08%, 91.69%, and 86.60%, respectively, outperforming existing unsupervised and label-free baselines.

24.
PLOS Computational Biology 2026-06-15

A multilevel hierarchical framework for quantification of experimental heterogeneity in population snapshot data

by David J. Warne, Xiangrun Zhu, Thomas P. Steele, Stuart T. Johnston, Scott A. Sisson, Matthew Faria, Ryan J. Murphy, Alexander P. Browning Biological systems exhibit substantial heterogeneity: that is, variation in specific characteristics of individuals within a population. As a result, it is of critical importance to appropriately account for biological heterogeneity when calibrating mathematical models to infer cellular processes and predict behaviour. Recent approaches consider ordinary differential equations with random parameters to quantify heterogeneity in dynamical processes of cells. In this setting, statistical inference is performed to characterise the distribution of these random parameters within a cell population. One significant limitation of this approach is the tacit assumption that there are no substantial deviations in these distributions across experimental replicates. In this work, we propose a flexible Bayesian hierarchical differential equation modelling framework that quantifies and distinguishes both inter-experimental heterogeneity (heterogeneity between experimental replicates) and intra-experimental heterogeneity (biological heterogeneity within replicate populations). We consider two recent studies that employ mathematical models to interpret flow cytometry snap-shot data and quantify heterogeneity in nano-particle cell interactions and cell internalisation processes. Using simulation data, we demonstrate that substantial inaccuracy in the inferred dynamics can arise when experimental heterogeneity is not accounted for. By contrast, our hierarchical approach is robust to variability in inter-experimental and intra-experimental heterogeneity and our method simplifies to previous methods when inter-experimental heterogeneity is negligible. Our approach is flexible and widely applicable to applications involving replicate populations and snapshot data. We provide open-source implementations of our methods on GitHub.

25.
arXiv (CS.LG) 2026-06-17

Deep Reinforcement Learning for Minimum Zero-Forcing Sets

arXiv:2606.18106v1 Announce Type: new Abstract: This paper explores the problem of finding the minimum zero-forcing set on undirected graphs and proposes an adapted machine-learning framework to solve the problem. The minimum zero-forcing set problem is a graph coloring problem where the color of an initial set of nodes propagates throughout a network. The set of nodes is zero-forcing if it forces all uncolored nodes to change color under the constraint of the color-change rule. There are several applications to this problem across different domains such as network science, network control, and designing logical circuits. Finding the minimum zero-forcing set is shown to be NP-hard. We propose a reinforcement learning framework, SD-ZFS, that adapts the S2V-DQN architecture to the ZFS problem. We train several models on this adapted framework and analyze the performance across graph datasets that have varying structures. We evaluate how the models trained on the framework generalize, scale, and transfer to different network types. The results demonstrate the effectiveness of the framework when compared against the optimal solution and greedy heuristic. We provide further insight into how the ZFS problem can be solved through machine-learning and the influence of network structure on the problem.