Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.LG) 2026-06-25

Bridging Spherical Black-Box Optimizers

arXiv:2606.25761v1 Announce Type: new Abstract: When gradient information is unavailable, black-box optimization (BBO) methods provide a practical alternative. While Evolution Strategies (ES), Consensus-Based Optimization (CBO), Optimization via Integration (OVI), and related methods have each been studied independently, their connections remain underexplored. We unify these approaches within a common theoretical framework, revealing that they differ primarily in two design choices: fitness aggregation (controlling sharpness preference) and consensus scope (controlling modality). Leveraging these insights, we introduce hybrid optimizers that interpolate between existing methods. Our ES-OVI hybrid allows explicit control over the preference for flat minima, enabling a trade-off between performance and robustness in continuous control tasks. Our CBO-OVI hybrids combine the higher-dimensional efficiency of parametric methods with the multimodal capabilities of particle-based approaches, achieving competitive results on language model merging under limited evaluation budgets. We validate our methods on standard BBO benchmarks and higher-dimensional locomotion tasks, demonstrating that the hybrid methods can outperform their constituent algorithms.

02.
arXiv (CS.CV) 2026-06-18

Data-Forcing Distillation: Restoring Diversity and Fidelity in Few-Step Video Generation

Recent progress has shown promise in distilling multi-step video diffusion models into efficient few-step students. Among them, Distribution Matching Distillation (DMD) and its successor DMD2 achieved strong generation quality and fast convergence. However, due to the nature of the reverse Kullback–Leibler (KL) objective, these methods exhibit two persistent failure modes: a substantial drop in sample diversity, and visibly over-saturated outputs that deviate from real-video appearance. In this work, we propose Data-Forcing Distillation (DFD), a simple post-training framework that restores diversity and fidelity in DMD with only a single-line of code change. At its core is the teacher score discrepancy to guide the student toward the real-data distribution, pulling it to missing modes (mitigating mode collapse) and away from problematic modes absent in real data (avoiding over-saturation). We provide an in-depth theoretical analysis of our framework and validate our approach on text-to-video, image-to-video, and autoregressive video generation. With only 100–300 steps of finetuning, DFD effectively restores diversity and fidelity on both Wan2.1-1.3B and Cosmos-Predict2.5-2B model, resolving the over-saturation artifacts with significantly better video dynamics and appearance, and even outperforms the teacher model.

03.
arXiv (CS.CV) 2026-06-18

Quantification of Uncertainty with Adversarial Models in Medical Image Segmentation

Reliable pixel-level uncertainty quantification holds the potential to transform clinical workflows by enabling high-fidelity longitudinal monitoring and distinguishing true pathological changes from artifacts. Ideally, these models provide the stability required for critical treatment planning and surgical intervention. However, standard deep learning models often suffer from miscalibration, yielding overconfident predictions that mask underlying vulnerabilities at subtle pathological boundaries. To address this, we propose QUAM-SM, a post-hoc framework using targeted adversarial search to identify "adversarially fragile" pixels. By actively seeking perturbations that expose predictive instability, our method highlights regions where decisions are most vulnerable to being flipped. Importantly, the framework disentangles epistemic uncertainty from aleatoric uncertainty. Experiments on two public datasets with multiple expert annotations demonstrate that QUAM-SM outperforms both standard and recent uncertainty estimation approaches in terms of reliability and boundary sensitivity. Code is available at https://github.com/HanaJebril/quam_sm

04.
arXiv (math.PR) 2026-06-12

Symmetric Cooperative Motion in Higher Dimensions

arXiv:2606.13459v1 Announce Type: new Abstract: We prove a distributional convergence result for a multidimensional version of symmetric cooperative motion which was introduced and studied in one dimension in [HRW, SCM1]. Our approach relies on framing the associated recursive distributional equation as a discretization of the porous medium equation. A major challenge is to analyze the behaviour of finite difference schemes which approximate weak solutions of the porous medium equation with unbounded initial data. In overcoming this difficulty, we perform a detailed analysis of the probability mass function of symmetric cooperative motion, in which we introduce several new comparison arguments for the discrete process. Consequently, along the way, we establish a novel multidimensional convergence result for a finite difference scheme approximating the ZKB/Barenblatt solution of the porous medium equation, which is of independent interest.

05.
Nature (Science) 2026-06-17

Cucurbituril-based anion-conducting membranes with supramolecular nanopores

作者:

Nanoporous anion-conducting membranes have gained considerable interest for their potential to reduce resistance in electrochemical devices1–4. Current pore-forming methods, such as backbone engineering through polymers of intrinsic microporosity5,6 or covalent organic and metal–organic frameworks7,8, however, suffer from limited structural control, mechanical fragility or demanding synthesis. Here we establish a supramolecular strategy that overcomes these limitations by constructing uniform, dynamic nanopores. Co-assembly of the rigid macrocyclic host cucurbit[7]uril with the cationic polymer guest quaternized poly(piperidinium-terphenyl) yields a robust network of nanometre-scale channels while simultaneously enhancing mechanical and chemical stability. The dynamic host–guest interactions allow the pore structure to fluctuate on picosecond and angstrom scales. This transient environment supports low-friction hydroxide migration through a Grotthuss mechanism, producing a marked enhancement in ionic conductivity. This bottom-up design principle provides a versatile new tool for molecularly engineering transport pathways and promises to advance electrochemical reactors with respect to energy efficiency, operational stability and the production of high-purity products. A supramolecular strategy, in which uniform, dynamic nanopores are constructed, overcomes the limitations of limited structural control, mechanical fragility or demanding synthesis in nanoporous anion-conducting membranes, providing a versatile tool for molecularly engineering transport pathways.

06.
arXiv (CS.CV) 2026-06-24

Segmentation and Classification of Pap Smear Images for Cervical Cancer Detection Using Deep Learning

Cervical cancer remains a significant global health concern and a leading cause of cancer-related deaths among women. Early detection through Pap smear tests is essential to reduce mortality rates; however, the manual examination is time consuming and prone to human error. This study proposes a deep learning framework that integrates U-Net for segmentation and a classification model to enhance diagnostic performance. The Herlev Pap Smear Dataset, a publicly available cervical cell dataset, was utilized for training and evaluation. The impact of segmentation on classification performance was evaluated by comparing the model trained on segmented images and another trained on non-segmented images. Experimental results showed that the use of segmented images marginally improved the model performance on precision (about 0.41 percent higher) and F1-score (about 1.30 percent higher), which suggests a slightly more balanced classification performance. While segmentation helps in feature extraction, the results showed that its impact on classification performance appears to be limited. The proposed framework offers a supplemental tool for clinical applications, which may aid pathologists in early diagnosis.

07.
arXiv (CS.AI) 2026-06-12

Hellinger Multimodal Variational Autoencoders

arXiv:2601.06572v4 Announce Type: replace-cross Abstract: Multimodal variational autoencoders (VAEs) are widely used for weakly supervised generative learning with multiple modalities. Predominant methods aggregate unimodal inference distributions using either a product of experts (PoE), a mixture of experts (MoE), or their combinations to approximate the joint posterior. In this work, we revisit multimodal inference through the lens of probabilistic opinion pooling, an optimization-based approach. We start from Hölder pooling with $\alpha=0.5$, which corresponds to the unique symmetric member of the $\alpha-divergence$ family, and derive a moment-matching approximation, termed Hellinger. We then leverage such an approximation to propose HELVAE, a multimodal VAE that avoids sub-sampling, yielding an efficient yet effective model that: (i) learns more expressive latent representations as additional modalities are observed; and (ii) empirically achieves better trade-offs between generative coherence and quality, outperforming state-of-the-art multimodal VAE models.

08.
arXiv (CS.CL) 2026-06-12

ProPlay: Procedural World Models for Self-Evolving LLM Agents

Self-evolving agents are expected to improve through interaction without external supervision, but this remains difficult in partially observable environments where agents must explore actively, learn from limited feedback, and decide when to trust prior experience. Existing LLM-agent methods often rely on memory or planning modules, yet they rarely close the loop between them to continually refine an internal understanding of environment dynamics. We introduce ProPlay, a procedural world model that supports procedure-level preplay, where agents can rehearse future procedural paths using the learned world knowledge. Rather than representing experience as isolated rules or low-level action constraints, ProPlay abstracts successful trajectories into procedures and organizes them in a procedure graph that captures causal transitions among task stages. Each transition is associated with a reliability record embedding to estimate its task-specific contribution from past outcomes. Before each episode, ProPlay simulates future procedural trajectories over known graph structures as structured soft guidance; after execution, it refines the graph using environment feedback. Experiments on public benchmarks show that ProPlay consistently improves environment understanding and self-evolution capability over strong baselines. Our code has been released in https://github.com/antman9914/proplay.

09.
arXiv (math.PR) 2026-06-19

The t-Split Two-Periodic Aztec Diamond Model

arXiv:2606.19507v1 Announce Type: new Abstract: In this work we consider an Aztec diamond model split into two unequal regions which are asymptotically fixed in size. Each region is weighted with a distinct two-periodic weighting. We refer to this model as the t-split two-periodic Aztec diamond, to signify its difference from the previous work title Split Two-Periodic Aztec Diamond, where the model was split into two equal regions. We derive an integral expression for the correlation kernel of the model and give a partial description of the scaling limit behavior, along with a conjecture for the remainder. We refer to the larger and smaller sides of the model as the dominant and non-dominant sides, and to the location of the weight change as the interface. The dominant side exhibits a limit shape that depends only on its own weighting and is identical to that of the two-periodic Aztec diamond, while the non-dominant side appears to have a novel limit shape that depends on both weightings and the location of the interface. Lastly, we consider the complete limit shape in the case where the dominant side two-periodic parameter goes to 0.

10.
arXiv (CS.LG) 2026-06-19

Does Text Actually Help? Uncovering and Resolving Text Collapse in Multimodal Time Series Forecasting

arXiv:2606.19413v1 Announce Type: new Abstract: Multimodal time series forecasting, which pairs numerical sequences with domain-relevant textual reports, promises to inject world knowledge into forecasting pipelines. However, we uncover a critical failure mode in existing frameworks that we term text collapse: the text branch converges to a content-independent transformation, contributing negligible discriminative signal regardless of the input description. We argue that text collapse is a consequence of a fundamental asymmetry in time series forecasting: the numerical input is strongly autocorrelated with the output, making the numerical backbone inherently dominant, while the text branch, despite carrying complementary and often critical information, is insufficiently utilized, leading to its systematic underexploitation. To address this, we propose REST-TS (Residual-Exclusive Supervision for Text in Time Series), which turns the asymmetry into a design principle: the numerical backbone produces its own independent numerical forecast, and the text branch is exclusively supervised to predict the structured components of the residual, the prediction gap that numbers cannot explain. Because no numerical pathway can reduce these losses, the text branch must extract genuine content from the input description. Evaluated across diverse real-world domains and backbone architectures, REST-TS achieves state-of-the-art performance and consistently demonstrates greater text-branch utilization than existing frameworks, providing strong empirical evidence that supervising the text branch on the residual compels it to extract genuine content from the input.

12.
arXiv (CS.AI) 2026-06-24

Integrated Sensing and Communications for Real-time Avatar Control in XR over 5G

arXiv:2606.23771v1 Announce Type: cross Abstract: Extended Reality (XR) presents a challenging use case for 5G and 6G networks, requiring high data-rates and lowlatency communication to deliver a truly immersive experience. Moreover, in order to seamlessly translate physical actions to the virtual world, accurate gesture recognition and pose estimation are required. Current XR interaction solutions based on handheld controllers and cameras cannot easily capture full-body poses, inhibit the free use of hands, and require good visibility and a clear line of sight. In this work, we propose a multimodal sensing architecture for XR that combines 5G MillimeterWave (mmWave) Integrated sensing and communication (ISAC) and surface electromyography (sEMG) signals. 5G mmWave ISAC cannot only be used to deliver content wirelessly to the Head-mounted display (HMD), but also the same communication signals can be used to derive coarse body-level gestures and poses of the user, to support real-time avatar control. For fine-grained finger-level gestures, our architecture leverages lightweight sEMG sensors that capture forearm muscle activity. To illustrate the need of both modalities, we present evaluations of both sensing technologies. At the body level (5G), our architecture relies on power-per-beam-pair (PPBP), which can be computed from standard beam management or beam sweeping procedures of the 5G NR standard. PPBP-based sensing achieves 82.2$\pm$5.9% average accuracy when evaluated on users not seen during training. For fine-grained finger-level interactions, we show that surface electromyography (sEMG) carries strong discriminative information achieving consistent promising performance across different movement settings. Thus, combining the two modalities enables multi-scale gesture recognition, at the body level via existing 5G signals and finger level via lightweight sEMG sensors, forming a complete XR framework.

13.
arXiv (CS.AI) 2026-06-11

SAGE: Scalable AI Governance & Evaluation

arXiv:2602.07840v4 Announce Type: replace-cross Abstract: Evaluating relevance in large-scale search systems is fundamentally constrained by the governance gap between nuanced, resource-constrained human oversight and the high-throughput requirements of production systems. While traditional approaches rely on engagement proxies or sparse manual review, these methods often fail to capture the full scope of high-impact relevance failures. We present SAGE (Scalable AI Governance \& Evaluation), a framework that operationalizes high-quality human product judgment as a scalable evaluation signal. At the core of SAGE is a bidirectional calibration loop where natural-language Policy, curated Precedent, and an LLM Surrogate Judge co-evolve. SAGE systematically resolves semantic ambiguities and misalignments, transforming subjective relevance judgment into an executable, multi-dimensional rubric with near human-level agreement. To bridge the gap between frontier model reasoning and industrial-scale inference, we apply teacher-student distillation to transfer high-fidelity judgments into compact student surrogates at 92$\times$ lower cost. Deployed within LinkedIn Search ecosystems, SAGE guided model iteration through simulation-driven development, distilling policy-aligned models for online serving and enabling rapid offline evaluation. In production, it powered policy oversight that measured ramped model variants and detected regressions invisible to engagement metrics. Collectively, these drove a 0.25\% lift in LinkedIn daily active users.

14.
arXiv (CS.AI) 2026-06-12

OCOO-T : A Simple and Scalable Virtual Cell Model for Transcriptional Perturbation Response Prediction

arXiv:2606.12838v1 Announce Type: cross Abstract: Predicting single-cell transcriptional responses to genetic, chemical and cytokine perturbations is a fundamental challenge in computational biology and AI Virtual Cell (AIVC) modeling, with direct implications for drug discovery and the elucidation of gene regulatory networks. Existing approaches often rely on auxiliary cell-state encoders, hierarchical variational autoencoders, dedicated Transformer encoder-decoder modules, or gene-interaction priors to compress high-dimensional expression profiles into latent representations. While effective, these designs increase architectural complexity and may limit scalability and generalizability. This paper introduces OCOO-T, a minimalist flow-matching-based AIVC model for transcriptional perturbation response prediction. OCOO-T utilizes a vanilla Transformer stack that operates directly on continuous gene expression profiles and formulates perturbation response prediction as a continuous-time denoising process. Perturbation embeddings, dosage information, and cell-line/cell-type specificity are integrated through adaptive layer normalization and in-context tokens. Comprehensive evaluations on Tahoe100M, Replogle, and PBMC benchmarks demonstrate that OCOO-T achieves state-of-the-art performance across diverse perturbations and cell types while effectively scaling to long transcriptional profiles through patching and depatching of cellular contexts. By leveraging the simplicity of Transformer-based denoising for single-cell omics, OCOO-T provides an effective and scalable framework for in-silico cellular simulation.

15.
arXiv (quant-ph) 2026-06-11

Fisher geometry reshapes the effect of incompatibility in multiparameter quantum estimation

arXiv:2606.11343v1 Announce Type: new Abstract: Multiparameter quantum estimation faces two fundamental obstacles: sloppiness, i.e., anisotropy of the quantum Fisher information matrix (QFIM) that renders some parameter directions insensitive, and incompatibility, the non-commutativity of optimal measurements for different parameters. The trade-off bound $C_T$ captures their joint impact on precision, but it has remained unclear how the distribution of incompatibility across parameter planes affects its overall cost. Here we separate the total amount of incompatibility from its location. We introduce a dimensionless quantity $G_n^{(F)}$ that measures the alignment between the incompatibility distribution and the eigenvalues of the QFIM, and show how the Frobenius scale of the incompatibility contribution factorizes. We obtain a bound and prove the incompatibility cost lies between this bound and a rank-dependent multiple thereof. We also prove that at fixed sloppiness, or equivalently fixed Fisher volume, concentrating incompatibility into a single parameter plane reduces the optimized trade-off cost because the Fisher geometry can then be reshaped to allocate more Fisher area to that plane. A qutrit $SU(2)$ encoding numerically confirms that states with larger incompatibility strength can nevertheless incur a smaller cost if the matching factor $G$ is sufficiently small. Our results establish that the distribution of incompatibility relative to the Fisher eigenbasis is a central diagnostic for multiparameter estimation, beyond the total incompatibility strength.

16.
arXiv (quant-ph) 2026-06-25

Detection of patterns in a discrete-outcome sensor network

arXiv:2606.25100v1 Announce Type: new Abstract: A discrete outcome quantum sensor network is one in which we are only interested in which detectors are activated. This can be studied in either the strong or weak interaction regime. If the detectors interact strongly with the environment, it is possible to definitely find which ones were activated. If the interaction is weaker, there is a possibility of making an error, and the object is to minimize the probability of this happening. Here we will be interested in this weaker interaction regime. We will also assume that only certain patterns of detectors will be activated, different patterns being translated versions of a fundamental one. Our object will be to find which pattern has been activated. We will look at both one and two-dimensional detector arrays and make use of techniques from minimum-error state discrimination.

17.
arXiv (CS.CV) 2026-06-17

LADBench: A Benchmark for Logical Fault Detection in Images

Large Vision Language Models (VLMs) excel at visual question answering and semantic grounding, but their capacity for autonomous logical reasoning remains underexplored. Existing anomaly benchmarks emphasize visual errors or direct prompting rather than the physical and social common sense needed for open-world deployment. To address this, we introduce LAD-bench, a benchmark of more than 1,000 curated synthetic images with logical anomalies across four domains: Residential, Urban, Collaborative, and Nature. We further propose a Tiered Prompting Protocol based on progressive disclosure, which measures how much explicit assistance a model needs to localize and reason about a logical fault. Evaluating leading foundation models reveals substantial weaknesses: even the best achieves only 70.11% overall accuracy, showing that implicit logical fault detection remains unsolved. Crucially, models often fail to identify anomalies even after receiving explicit hints in deeper tiers. By surfacing these limitations in sequential multimodal reasoning, LAD-Bench offers a rigorous framework for advancing the safety, reliability, and cognitive alignment of autonomous visual systems. Dataset and Code: https://huggingface.co/datasets/SahasraK/LADBench

18.
arXiv (CS.CL) 2026-06-16

Vocabulary Dropout for Curriculum Diversity in LLM Co-Evolution

Co-evolutionary self-play, where one language model generates problems and another solves them, promises autonomous curriculum learning without human supervision. In practice, the proposer quickly converges to a narrow distribution of problems that satisfy the reward function. This diversity collapse renders the curriculum uninformative for the solver, stalling the co-evolutionary loop. We introduce vocabulary dropout, a random mask applied to the proposer's output logits during both policy training and curriculum generation, as a lightweight mechanism to sustain diversity. The mask is hard and non-stationary, preventing the proposer from locking into fixed token sequences. Training Qwen3-4B and Qwen3-8B on mathematical reasoning via R-Zero, we find that vocabulary dropout sustains proposer diversity across lexical, semantic, and functional metrics throughout training. It also yields solver improvements averaging +4.4 points at 8B, with the largest gains on competition-level benchmarks. Our findings suggest that explicit action-space constraints, analogous to the structural role that game rules play in classical self-play, can help sustain productive co-evolution in language. Vocabulary dropout is one simple instantiation of this principle.

19.
arXiv (CS.LG) 2026-06-25

TRACER: Training-Free Closed-Loop Structured Inference for Traffic Accident Reconstruction

arXiv:2606.25002v1 Announce Type: new Abstract: Traffic accident reconstruction is a forensic inverse problem that requires recovering physically consistent motion from sparse and heterogeneous evidence. Existing learning-based approaches predominantly optimize for semantic plausibility or visual realism, rather than quantitative agreement with measurable geometry and dynamics. Here, we present TRACER, a training-free framework that formulates reconstruction as a closed-loop structured inference process. Instead of directly generating dense trajectories, our framework constructs and iteratively refines event-anchored motion hypotheses under geometric, kinematic, and interaction constraints, guided by structured case memory and consistency-driven diagnosis. This design enables incremental, interpretable corrections when evidence is insufficient, making the accident reconstruction process more aligned with the workflow of human experts. Experiments on real-world accident data show that TRACER achieves improved geometric fidelity, velocity consistency, and collision accuracy over both data-driven and physics-based baselines.

20.
arXiv (CS.AI) 2026-06-25

Position Spaces and Graphs

arXiv:2606.25719v1 Announce Type: new Abstract: In this paper, we introduce position graphs, a graph-based reasoning framework based on the formalization of position spaces. This framework utilizes two strict partial orders, representing horizontal and vertical alignment and precedence, to model the relative positions of discrete tokens. Unlike general qualitative spatial calculi, position graphs are constrained by a chain condition and compatibility requirements that focus on rows and columns. We provide a comprehensive theoretical analysis of this representation, beginning with a characterization of graph consistency. Conditions to ensure the consistency of position graphs are established. Furthermore, we investigate the computational complexity of structural pattern discovery, modeled as the induced subgraph isomorphism problem. We demonstrate that this problem remains NP-complete even within the restricted class of position graphs. While initially motivated by document processing, this work focuses on the underlying mathematical properties and algebraic consistency of position-based constraints, providing a formal logical layer that is independent of specific data extraction techniques.

21.
arXiv (CS.CV) 2026-06-12

Dual-Domain Equivariant Generative Adversarial Network for Multimodal CT-PET Synthesis

We present a Dual-Domain Equivariant Generative Adversarial Network (DDE-GAN) for multimodal CT-PET image synthesis. Traditional GAN-based approaches often operate solely in the spatial domain and ignore geometric consistency, resulting in limited structural fidelity. DDE-GAN addresses these challenges by jointly learning from both spatial and frequency (Fourier) domains, capturing complementary anatomical and spectral information. Furthermore, rotational equivariance embedded in the physics of the CT and PET measurements are integrated into the loss of both the generator and discriminator to ensure consistent responses under rotations, improving anatomical accuracy. A hierarchical dual-domain training strategy enforces intra- and inter-domain consistency through multi-stage loss functions. Evaluated on the HECKTOR 2022 CT-PET dataset, DDE-GAN achieves superior synthesis quality over baseline models for CT-PET image synthesis. The results demonstrate that combining dual-domain learning with geometric equivariance substantially enhances multimodal image synthesis accuracy and robustness, enabling practical applications in PET completion and data augmentation.

22.
bioRxiv (Bioinfo) 2026-06-21

ReSeT: a taxonomy-aware reference genome selection tool

Motivation: Reference genome composition determines which taxa a profiling pipeline can detect and distinguish, and becomes of critical importance for high-resolution profiling where taxonomic boundaries begin to blur. Existing selection tools optimize within-taxon representativeness but disregard discrimination across taxa, leaving open whether explicitly accounting for inter-taxon discrimination during selection improves profiling. Results: Here we present ReSeT, a facility-location-based reference genome selection tool that operates on arbitrary pairwise distance matrices, extended with a tunable inter-taxon discrimination term and per-genome selection cost, and solved by local search. We benchmark ReSeT against established selection methods on three viral datasets spanning varying degrees of taxonomic ambiguity. On the high-ambiguity SARS-CoV-2 datasets, appropriately tuned ReSeT selections matched or exceeded the strongest alternatives in terms of profiling accuracy, whereas on the low ambiguity IAV dataset VSEARCH remained dominant. Interestingly, we find that the novel inter-taxon discrimination term contributed weakly, indicating that ReSeT's facility-location formulation and selection cost drives ReSeT's performance. We further propose a novel taxonomic ambiguity index, computable from ReSeT's inputs, that summarizes the taxonomic ambiguity of reference genomes and aligns with where ReSeT improves over existing selection methods. Availability and implementation: ReSeT is implemented in Python ([≥]3.10) and is freely available under the MIT license. The source code is available on GitHub at https://github.com/JaspervB-tud/ReSeT and ReSeT can also be installed directly from the Python Package Index (PyPI) via pip install reset-bio.

23.
arXiv (CS.AI) 2026-06-24

Governed Shared Memory for Multi-Agent LLM Systems

arXiv:2606.24535v1 Announce Type: new Abstract: Multi-agent LLM environments require robust mechanisms for shared knowledge management. This paper formalizes the fleet-memory problem and identifies four foundational failure modes: unauthorized leakage, stale propagation, contradiction persistence, and provenance collapse. To address these, we define explicit systems-level primitives: scoped retrieval, temporal supersession, provenance tracking, and policy-governed memory propagation. These primitives are implemented in MemClaw, a production multi-tenant memory service, and evaluated via ArgusFleet, a reproducible harness testing four governance dimensions. Rather than a baseline comparison, this study measures a live production service, emphasizing real-world architectural insights and negative results. Key Evaluation Results Provenance: Successfully reconstructed 100% of depth-four derivation chains with correct writer identity at sub-second per-hop latency. Propagation: Demonstrated high intra-fleet visibility with zero cross-fleet leakage. Under strong write mode, write-to-visible latency was optimized to a single search round-trip. Production Architectural Issues Discovered Asymmetric Scope Enforcement: Tenant isolation held, but sub-tenant scope was initially bypassed on direct GET-by-id requests for agent-scoped credentials (disclosed and remediated during the study). Pipeline Ordering Conflict: While contradiction supersession works for admitted writes, a synchronous near-duplicate gate can prematurely reject contradictory writes before the asynchronous contradiction detector can evaluate them. Conclusion: Long-context retrieval alone is insufficient for production multi-agent memory. Governed shared memory demands explicit systems-level abstractions, and live evaluation is vital to expose enforcement and pipeline-ordering failures missed by design-only treatments.

24.
arXiv (CS.CV) 2026-06-11

Metadata-Aware Multi-Prompt Reasoning for Zero-Shot Accident Understanding

In this paper, we address the problem of zero-shot understanding of accidents from surveillance videos by identifying when an impact event occurs, what type of impact it is, and where in the frame it occurs using natural language. We propose a three-stage pipeline that decomposes the accident understanding into when, what, and where. The first stage extracts a short temporal window around the impact using vision-language similarity. In the second stage, we perform metadata-driven multi-prompt reasoning with five complementary views (baseline, motion, geometry, contrast, and tiebreaker) and resolve disagreement via an entropy-gated pairwise adjudicator. Finally, we localize the impact of an open-vocabulary detector queried on the predicted accident type and scene layout, and aggregate detections across keyframes using a score-weighted centroid. Our pipeline achieves a substantial improvement in the harmonic-mean score over a centre-of-frame baseline on the zero-shot ACCIDENT @ CVPR benchmark. We show that decomposing zero-shot video understanding into temporal localization, semantic classification, and spatial grounding enable more reliable reasoning with vision-language models than direct prompting alone.

25.
arXiv (CS.AI) 2026-06-11

Reinforcement Learning Disrupts Gradient-Based Adversarial Optimization

arXiv:2606.12251v1 Announce Type: cross Abstract: Gradient-based adversarial attacks remain a dominant threat to deep neural networks (DNNs), as they exploit gradient information to efficiently optimize adversarial perturbations. To address this, we investigate whether reinforcement learning (RL) training can disrupt the gradient structure used by attackers by training image classifiers with policy-gradient objectives and epsilon-greedy exploration. Through systematic experiments across CIFAR-10, CIFAR-100, and ImageNet-100 with multiple architectures, we find that RL-trained classifiers significantly disrupt gradient-based adversarial optimization. To explain this, we conduct a comprehensive mechanism analysis using loss landscape visualization, static and dynamic gradient indicators, and predictive entropy. Our analysis reveals that RL acts as an implicit regularizer, producing models with highly unstable gradient directions and smaller gradient magnitudes. This combination makes each PGD step both unreliable in direction and limited in magnitude, causing gradient-based attacks to fail within practical iteration budgets. We further show that combining RL with adversarial training (RL-adv) provides a dual-layer defense operating at two complementary levels: RL degrades gradient information available to attackers (gradient-level defense), while adversarial training strengthens decision boundaries (boundary-level defense). RL-adv achieves the highest robustness across all major attack types evaluated, including gradient-based (PGD, AutoAttack), transfer-based, and query-based attacks, outperforming SL-adv by a significant margin. These findings identify RL-induced gradient disruption as a complementary robustness mechanism and motivate future research on hybrid SL-RL training schedules that combine SL's efficiency with RL's gradient-regularization properties.