Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.LG) 2026-06-18

ActiTect: A Generalizable Machine Learning Pipeline for REM Sleep Behavior Disorder Screening through Standardized Actigraphy

arXiv:2511.05221v3 Announce Type: replace Abstract: Isolated rapid eye movement sleep behavior disorder (iRBD) is a major prodromal marker of $\alpha$-synucleinopathies, often preceding the clinical onset of Parkinson's disease, dementia with Lewy bodies, or multiple system atrophy. While wrist-worn actimeters hold significant potential for detecting RBD in large-scale screening efforts by capturing abnormal nocturnal movements, they become inoperable without a reliable and efficient analysis pipeline. This study presents ActiTect, a fully automated, open-source machine learning tool to identify RBD from actigraphy recordings. To ensure generalizability across heterogeneous acquisition settings, our pipeline includes robust preprocessing and automated sleep-wake detection to harmonize multi-device data and extract physiologically interpretable motion features characterizing activity patterns. Model development was conducted on a cohort of 78 individuals, yielding strong discrimination under nested cross-validation (AUROC = 0.95). Generalization was confirmed on a blinded local test set (n = 31, AUROC = 0.86) and on two independent external cohorts (n = 113, AUROC = 0.84; n = 57, AUROC = 0.94). To assess real-world robustness, leave-one-dataset-out cross-validation across the internal and external cohorts demonstrated consistent performance (AUROC range = 0.84-0.89). A complementary stability analysis showed that key predictive features remained reproducible across datasets, supporting the final pooled multi-center model as a robust pre-trained resource for broader deployment. By being open-source and easy to use, our tool promotes widespread adoption and facilitates independent validation and collaborative improvements, thereby advancing the field toward a unified and generalizable RBD detection model using wearable devices.

02.
arXiv (CS.CV) 2026-06-16

Near–Real-Time Conflict-Related Fire Detection in Sudan Using Unsupervised Deep Learning

Ongoing armed conflict in Sudan highlights the need for rapid monitoring of conflict-related fire-affected areas. Recent advances in deep learning and high-frequency satellite imagery enable near–real-time assessment of active fires and burn scars in war zones. This study presents a near–real-time monitoring approach using a lightweight Variational Auto-Encoder (VAE)–based model integrated with 4-band Planet Labs imagery at 3 m spatial resolution. We demonstrate that these impacted regions can be detected within approximately 24 to 30 hours under favorable observational conditions using accessible, commercially available satellite data. To achieve this, we adapt a VAE–based model, originally designed for 10-band imagery, to operate effectively on high-resolution 4-band inputs. The model is trained in an unsupervised manner to learn compact latent representations of nominal land-surface conditions and identify burn signatures by quantifying changes between temporally paired latent embeddings. Performance is evaluated across five case studies in Sudan and compared against cosine distance, CVA, and IR-MAD using precision, recall, F1-score, and the area under the precision-recall curve (AUPRC) computed between temporally paired image tiles. Results show that the proposed approach consistently outperforms the other methods, achieving higher recall and F1-scores while maintaining viable precision in highly imbalanced fire-detection scenarios. Experiments with 8-band imagery and temporal image sequences yield only marginal performance gains over single 4-band inputs, underscoring the effectiveness of the proposed lightweight approach for scalable, near–real-time conflict monitoring.

03.
Nature (Science) 2026-06-09

Daily briefing: Trial to ‘de-age’ cells treats first person

作者:

The gene-therapy trial aims to treat glaucoma by rejuvenating cells in the optic nerve. Plus, the mystery of how things freeze and encouragement to go out into the sunlight. The gene-therapy trial aims to treat glaucoma by rejuvenating cells in the optic nerve. Plus, the mystery of how things freeze and encouragement to go out into the sunlight.

04.
arXiv (CS.CL) 2026-06-18

Decoupling Search from Reasoning: A Vendor-Agnostic Grounding Architecture for LLM Agents

Production LLM agents increasingly depend on real-time search, yet native search grounding bundles retrieval policy, provider choice, evidence injection, cost, latency, and generation behavior behind a single model-provider boundary. This coupling makes grounding hard to inspect, tune, reuse, or port, and can trigger Search-Induced Verbosity that breaks strict output contracts. We present Decoupled Search Grounding (DSG), a vendor-agnostic boundary that moves grounding outside the reasoning model through an MCP-compatible gateway, exposing provider routing, source-aware context rendering, configured fallback, retrieval-depth control, and exact plus semantic caching as first-class controls. Across five frontier models on SimpleQA, FreshQA, and HotpotQA, native search leads on recency-sensitive FreshQA, but DSG exposes a stronger frontier when control matters: on SimpleQA it nearly matches native accuracy (86.1% vs. 87.7%) at 91% lower search cost, preserves concise answer contracts, and reaches a 99.4% warm-cache hit rate with 68% lower latency. Deployed as a shared production grounding layer for large-scale agentic workloads with interchangeable models, DSG matches or slightly exceeds native-search accuracy on an e-commerce query-understanding (QIU) workload while cutting search cost by over 98%. Real-time grounding is best treated as an optimizable interface boundary, not a fixed model feature.

05.
arXiv (CS.LG) 2026-06-16

A Compositional Framework for Open-ended Intelligence

arXiv:2606.15386v1 Announce Type: new Abstract: Open-ended intelligence is the capacity to adapt to novel problems and environments that are substantially different from those in training. We formalize open-ended intelligence as the closure induced by a finite primitive set \(P\) and a set of composition operators \(C\). We characterize properties of the induced closure \(\mathcal{L}(P,C)\) that support unbounded compositional generation across families of tasks and worlds. A mathematics of open-ended intelligence requires two pillars: a minimal set of representational primitives (e.g., states, actions) and algorithmic primitives (e.g., nearest neighbor), together with composition motifs (e.g., recursion, sequencing) that reflect an acquired compositional grammar. The closure of these two pillars enables the generation of infinite adaptive responses across a wide range of settings. The mathematics supports complementary research agendas, including evaluation metrics for explanation and interpretability, as well as building architectures where compositional generalization is native. We propose next primitive prediction as a novel architectural objective, where the training objective encourages the acquisition of reusable algorithmic primitives and their compositional grammar, such that new solutions are generated through recombination. Curriculum learning and self-play enable lifelong learning and expansion of the closure by discovering reusable primitives and transition motifs across families of tasks and worlds. We ground the framework through case studies in physics, evolution, and neuroscience.

06.
Nature (Science) 2026-06-12

Daily briefing: How Venus flytraps snap shut

作者:

Softening cells enable flytraps to shut with astonishing speed. Plus, the cutting-edge science happening at the World Cup and why scientists shouldn’t ignore the Pope’s AI message. Softening cells enable flytraps to shut with astonishing speed. Plus, the cutting-edge science happening at the World Cup and why scientists shouldn’t ignore the Pope’s AI message.

07.
arXiv (CS.CL) 2026-06-15

An Empirical Study of Automating Agent Evaluation

Agent evaluation requires assessing complex multi-step behaviors involving tool use and intermediate reasoning, making it costly and expertise-intensive. A natural question arises: can frontier coding assistants reliably automate this evaluation process? Our study shows that simply prompting coding assistants is insufficient for this task. Without domain-specific evaluation knowledge, frontier coding assistants achieve only a 30% execution success rate and produce over-engineered evaluations averaging 12+ metrics per agent, indicating that strong coding ability does not automatically translate to reliable agent evaluation. We introduce EvalAgent, an AI assistant that automates the end-to-end agent evaluation pipeline. EvalAgent encodes evaluation domain expertise as evaluation skills (procedural instructions, reusable code and templates, and dynamically retrieved API documentation) that compose into a trace-based pipeline producing complete evaluation artifacts including metrics, executable code, and reports. To systematically assess generated evaluations, we introduce a meta-evaluation framework alongside AgentEvalBench, a benchmark comprising 20 agents, each paired with evaluation requirements and test scenarios. We further propose the Eval@1 metric to measure whether generated evaluation code both executes and yields meaningful results on the first run. Our experiments show that EvalAgent produces focused evaluations, improving Eval@1 from 17.5% to 65%, and achieving 79.5% human expert preference over baseline approaches. Further ablation studies show that evaluation skills are critical for handling complex evaluation: removing them causes Eval@1 to drop significantly from 65% to 30%.

08.
medRxiv (Medicine) 2026-06-11

Malaria Risk among Internally Mobile Individuals and Heterogeneous Mobility Patterns in Two Hypoendemic Communities: Implications for Malaria Elimination in the Peruvian Amazon.

Background: Human mobility is increasingly recognized as a key factor influencing malaria transmission dynamics, particularly in low-transmission settings approaching elimination. This study aimed to assess mobility patterns and their association with malaria risk in two hypoendemic communities in the Peruvian Amazon. Method: A longitudinal study was conducted in the communities of Libertad and Urcomirano (Mazan River basin). Monthly population screenings were combined with weekly active and passive case detection. A total of 678 individuals were enrolled. Mobility patterns were assessed through structured questionnaires, and social network analysis was used to characterize travel connections. Log-binomial regression analysis was applied to identify risk factors associated with malaria infection. Result: Internally, mobile individuals in Libertad showed a higher malaria incidence (>32.47 cases per 1,000 person-months) than those in Urcomirano (

09.
arXiv (CS.LG) 2026-06-11

MobileFineTuner: A Mobile-Native Framework for On-Device LLM Fine-Tuning in Real-World Embedded AI Applications

arXiv:2512.08211v2 Announce Type: replace Abstract: Large language models (LLMs) are moving from cloud-centric services toward on-device embedded AI, where models interact with private, longitudinal signals sensed from users and their physical environments. Mobile phones are a natural platform for such applications because they are continuously carried by users, connected to wearable sensors, and deeply integrated with daily mobile applications. However, practical LLM fine-tuning on commodity phones remains difficult. Existing fine-tuning frameworks are largely Python-based and server-oriented, making them hard to deploy inside mobile applications. We present MobileFineTuner, a mobile-native open-source framework for end-to-end LLM fine-tuning on commodity mobile phones. MobileFineTuner is implemented in C++ and provides a reusable training stack. To make fine-tuning feasible under mobile resource constraints, MobileFineTuner integrates a resource-aware training runtime with memory-efficient attention, activation checkpointing, gradient accumulation, parameter sharding, and energy-aware scheduling. We evaluate MobileFineTuner on real mobile phones using GPT-2, Gemma 3, and Qwen2.5 models across multiple fine-tuning tasks. The results show that MobileFineTuner reproduces standard Full-FT and LoRA fine-tuning behavior, substantially reduces memory pressure and improves executability on memory-constrained phones. We further demonstrate MobileFineTuner through a private campus health-agent application, where a local LLM is fine-tuned on user-specific wearable-sensing records to provide more personalized responses while keeping raw records on the phone. These results establish MobileFineTuner as a practical toolkit for studying and building on-device LLM fine-tuning applications in embedded AI and sensing systems.

10.
arXiv (CS.LG) 2026-06-19

Minimal Filling Architectures of Polynomial Neural Networks: Counterexamples, Frontier Search, and Defects

arXiv:2605.09609v2 Announce Type: replace Abstract: We provide counterexamples to the unimodal minimal filling architecture conjecture for polynomial neural networks (PNNs) with power activation functions. Fixing the input and output widths, the conjecture states that any minimal filling architecture has unimodal widths for the hidden layers. We found counterexamples via a frontier search, recursive dimension bounds on neurovarieties, and symbolic computation. Notably, several subarchitectures of our main example exhibit large defect, in contrast with the predominantly small-defect behavior observed in prior literature.

11.
arXiv (math.PR) 2026-06-16

A Concavity Theorem for the Parisi PDE

作者:

arXiv:2606.15432v1 Announce Type: new Abstract: We prove that the map sending the diffusion profile to the solution of a time-changed Parisi PDE evaluated at time-space $(0,0)$ is concave. This result strengthens the raywise concavity result proven by Auffinger and Chen (2016). As an application, for the balanced multispecies Ising spin glasses, the lower bound of Bates and Sohn (2025) matches the Hopf-type upper bound given by the Hamilton–Jacobi framework developed by Mourrat, Chen and Xia.

12.
arXiv (CS.CV) 2026-06-19

VFACamou: View-Fused Adversarial Camouflage for Environment-Adaptive Physical Evasion

Adversarial camouflage in the physical world remains highly challenging, particularly under UAV reconnaissance where targets undergo continuous geometric changes and extreme illumination variations. Existing methods either optimize 2D digital perturbations that fail to generalize to dynamic viewpoints or produce visually unnatural textures that cannot be deployed in real scenarios. Therefore, we propose an end-to-end framework for adversarial camouflage generation that automatically produces wearable adversarial patterns and maintains stable attack performance in real physical environments with changing viewpoints, poses, and lighting conditions. Our method integrates UV-volume rendering with a diffusion-based texture generator, enabling consistent appearance under varying scales, poses, and lighting conditions. To ensure environmental realism, we propose an illumination color consistency estimator that extracts dominant background attributes and guides a natural texture loss to align the generated UV texture with the surrounding environment. A multi-scale dynamic training strategy further enhances robustness against viewpoint shifts and body deformation. Extensive experiments across multiple mainstream detectors demonstrate that our method achieves strong and stable physical attack performance while maintaining high perceptual naturalness, reducing human detection rates without introducing unnatural artifacts.

13.
arXiv (CS.LG) 2026-06-15

Stability of a Generalized Debiased Lasso with Applications to Resampling-Based Variable Selection

作者:

arXiv:2405.03063v3 Announce Type: replace-cross Abstract: We propose a generalized debiased Lasso estimator based on a stability principle. When a single column of the design matrix is perturbed, the estimator admits a simple update formula that can be computed from the original solution. Under sub-Gaussian designs with well-conditioned covariance, this approximation is asymptotically accurate for all but a vanishing fraction of coordinates in the proportional growth regime. The proof relies on concentration and anti-concentration arguments to control error terms and sign changes. In contrast, establishing comparable distributional limits (e.g., Gaussianity) under similar assumptions remains open. As an application, we show that the approximation significantly reduces the computational cost of resampling-based variable selection procedures, including the conditional randomization test and a local knockoff filter.

14.
arXiv (CS.CV) 2026-06-18

Physics in 2-Steps: Locking Motion Priors Before Visual Refinement Erases Them

Image-to-Video diffusion models leverage input images to generate visually stunning content, yet frequently produce motion that violates physical laws. We reveal a surprising finding: a 2-step generation often exhibits better physical consistency than a 50-step output from the same model. Through spectral analysis, we trace this to phase erosion during denoising; the phase degrades significantly (dropping by $\approx 18\%$ from step 2 to step 50), whereas the magnitude remains relatively stable. Building on this insight, we propose PhaseLock, a training-free framework that preserves the valid motion priors from few-step inference throughout the denoising trajectory. Rather than relying on full-step inference for physical consistency, PhaseLock extracts a motion prior from just 2 steps and enforces it onto high-fidelity generation via Latent Delta Guidance. Our approach effectively mitigates phase degradation, improving physical consistency by an average of 6.2 points across diverse models while largely maintaining visual fidelity, with negligible overhead ($1.06\times$ time, $1.02\times$ memory) and reduced reliance on expensive external guidance methods ($\sim5\times$ time). Project Page: https://dnwjddl.github.io/phaselock

15.
Nature (Science) 2026-06-17

Lethal plague outbreaks in Lake Baikal hunter-gatherers 5,500 years ago

Plague is among the most devastating diseases in human history1. However, early strains of the plague-causing bacterium Yersinia pestis lacked virulence factors that are required for the bubonic form until around 3,800 years ago2,3. Consequently, the morbidity and mortality of early plague strains remain unclear. Here we describe early plague strains that are associated with two phases of outbreaks among mid-Holocene hunter-gatherers near Lake Baikal in southeast Siberia, beginning from about 5,500 years ago. These outbreaks occur across four hunter-gatherer cemeteries, with a 39% detection rate for plague infection. By reconstructing kinship pedigrees, we show that small familial groups were affected, consistent with human-to-human spread of disease, and that the first outbreak occurred within a single generation. The infections appear to have resulted in acute mortality, especially among children (aged 8 to 11 years). We further note functional differences, including in the ypm superantigen locus, which is also present in present day Yersinia pseudotuberculosis. The new strains diverge ancestrally to known Y. pestis and constrain the timing of its emergence, indicating that this happened before approximately 5,700 years ago. These findings show that plague outbreaks happened earlier than previously thought and were indeed lethal. We contend that the occurrence of outbreaks among mid-Holocene hunter-gatherer communities well outside the sphere of Late Neolithic Europe challenges the notion that higher population densities and lifestyle changes during the Neolithic agricultural transition were prerequisites for plague epidemics. Analyses of ancient DNA from hunter-gatherers near Lake Baikal in southeast Siberia around 5,500 years ago indicate that highly virulent Yersinia pestis emerged earlier than previously estimated, far from the next known cases of infection in Late Neolithic Europe.

16.
arXiv (CS.CV) 2026-06-24

3DCarGen: Scalable 3D Car Generation via 3D-consistent Multi-view Synthesis

High-quality 3D vehicle assets are essential for autonomous driving simulation. Although multi-view diffusion-based paradigms enable controllable single-image reconstruction, they typically produce limited viewpoints and exhibit cross-view geometric inconsistencies, thereby reducing reconstruction fidelity in real-world scenarios. In this work, we introduce 3DCarGen, a scalable single-view 3D car generation framework designed for real-world images by synthesizing an arbitrary number of 3D-consistent multi-view images. Specifically, given a single image as input, we first synthesize a set of images from fixed viewpoints. These images are then fed into a feed-forward reconstruction model, resulting in a coarse 3D representation based on 3D Gaussian Splatting. Conditioned on this explicit 3D prior, our multi-view diffusion model generates 3D-consistent images from arbitrary camera viewpoints. We further extend a fast mesh reconstruction algorithm by incorporating color-normal joint optimization to recover detailed and coherent 3D vehicle models from the synthesized dense views. Extensive experiments on synthetic and real-world datasets demonstrate that our approach achieves robust geometric consistency and reconstruction fidelity compared to existing methods. Code and models will be released.

17.
arXiv (CS.AI) 2026-06-18

Hardware- and Vision-in-the-Loop Validation of Deep Monocular Pose Estimation for Autonomous Maritime UAV Flight

arXiv:2606.19176v1 Announce Type: cross Abstract: Autonomous UAV operations on ships require reliable vision-based relative pose estimation, yet at-sea validation is costly, weather-dependent, and risky. This paper presents a hardware-validated vision-in-the-loop framework that enables fully autonomous indoor flight while emulating photorealistic maritime environments. Rendered maritime views are processed onboard by a deep transformer-based monocular pose estimator. Delayed vision measurements are fused with high-rate IMU data using a delayed Kalman filter to provide consistent state estimates for geometric control. The system captures critical embedded effects, including perception latency, asynchronous updates, and computational constraints, that are absent in pure simulation. Autonomous takeoff, trajectory tracking, and landing experiments demonstrate stable closed-loop flight. The results establish a safe and hardware-realistic intermediate stage for developing maritime UAV autonomy prior to shipboard deployment.

18.
arXiv (CS.AI) 2026-06-16

Canonical Variates in Wasserstein Metric Space

arXiv:2405.15768v2 Announce Type: replace-cross Abstract: In this paper, we address the classification of instances represented by distributions on a vector space rather than single points. We consider classification algorithms based on pairwise distances, specifically, the Wasserstein metric between distributions. Central to our investigation is dimension reduction within the Wasserstein metric space to enhance classification accuracy. We introduce a novel approach grounded in the principle of maximizing Fisher's ratio, defined as the quotient of between-class variation to within-class variation. The directions in which this ratio is maximized are termed discriminant coordinates or canonical variates axes. In practice, both between-class and within-class variations are defined as the average squared Wasserstein distances between pairs of distributions, with the pairs either belonging to the same class or to different classes. This ratio optimization is achieved through an iterative algorithm, which alternates between optimal transport and maximization steps within the vector space. Empirical studies are conducted to assess the algorithm's convergence; and experimental results demonstrate that the dimension reduction technique substantially enhances classification performance. Moreover, the new method outperforms well-established algorithms that operate on vector representations derived from distributional data. It also exhibits robustness to variations in how instances are summarized by distributions, such as the number of components in a Gaussian mixture model (GMM) representation.

19.
arXiv (CS.CL) 2026-06-11

Agent Skill Evaluation and Evolution: Frameworks and Benchmarks

The growth of agent skills has transformed how agentic systems are built, evaluated, and deployed. As skill libraries continue to scale, rigorous evaluation becomes critical to ensuring their utility, quality, and safety in real-world applications. Consequently, the field is undergoing an emerging paradigm shift from isolated skill creation to automated, evaluation-driven skill evolution. In this survey, we systematically examine the landscape of skill evolution and evaluation beyond foundational skill creation. We categorize evolution into four distinct paradigms, spanning execution feedback, trajectory distillation, compression, and reinforcement learning, showing how each element contributes to improving skill utility and reliability. We also provide an analysis of six skill-centric benchmark categories, identifying structural gaps in benchmark coverage, trade-offs, and metric richness to advance skill research. Finally, we identify open directions for building skill ecosystems that are generalizable, efficient, and verifiably safe. The project URL is https://github.com/Cassie07/AgentSkill_Survey

20.
arXiv (CS.CL) 2026-06-19

Generative Engine Optimization at Scale: Measuring Brand Visibility Across AI Search Engines

People increasingly get answers straight from AI search engines like ChatGPT, Claude, Perplexity, and Gemini rather than scrolling search results. Brands that once focused on search engine optimization (SEO) must now optimize for how these engines represent, cite, and recommend them – a shift variously called Generative Engine Optimization (GEO), Answer Engine Optimization (AEO), and AI Search Visibility. We treat AEO and AI Visibility as part of GEO, and study how to measure brand visibility across AI engines: what they value when they cite a brand, which sources they rely on, and what content large language models surface. The hard case is everyone outside the already-authoritative top brands – SMEs, D2C brands, creators, and early-stage startups. We analyze 100K+ prompt responses across 100+ brands tracked on Ranqo between March and May 2026. First visibility runs form a clear three-tier brand-stature ladder: global household names (e.g., Stripe, Nike) appear in 73% of relevant AI answers on their first run; established mid-market and regional brands (e.g., Olipop, Klaviyo) in 44%; niche and small brands in just 11% – about 30 percentage points per step. When engines cite sources, about 78% go to corporate websites; among non-corporate sources YouTube leads, ahead of Reddit, editorial media, and Wikipedia. The highest-leverage page is the ranked "best-of" listicle, the most-cited content format at about 21% of all citations. Sentiment is the unstable signal: whether a brand is framed positively or negatively flips about 6.7 times more often than whether it is mentioned at all. These findings provide a first large-scale baseline for measuring GEO: AI brand visibility can be measured, differs by platform, and varies strongly by brand maturity. We close by proposing seven v1.1 protocols to test whether specific recommendations can causally improve AI visibility.

21.
arXiv (quant-ph) 2026-06-19

Topological Quantum Interferometry

arXiv:2606.19730v1 Announce Type: new Abstract: Structured light provides high-dimensional Hilbert spaces holding tremendous potential for fundamental quantum optics and quantum technologies. However, existing characterization methods, like Hong-Ou-Mandel (HOM) interference, typically assume perfectly tuned conditions, overlooking the geometric physics governing spatial mode evolution. Here, we establish topological quantum interferometry driven by an interaction-based geometric phase, the exchange Berry phase (BPX). Our formalism generalizes $q$-plate state generation and characterization to arbitrary topological charges and (de)tuning conditions, demonstrating that BPX acts as a geometric marker governing spatial interference. We show BPX serves as a deterministic control parameter, decomposing two-photon spatial patterns into geometry-dictated fundamental modes. This mapping reveals topological invariants and phase singularities that function as a non-tomographic witness for state dimensionality estimation, circumventing full-state reconstruction. Being device-independent and highly scalable, this approach enables scalable high-dimensional characterization and topologically protected state selection, with direct applicability to quantum metrology and high-capacity quantum networks.

22.
arXiv (CS.LG) 2026-06-16

A nonparametric two-sample test using a parametric integral probability metric

arXiv:2606.16941v1 Announce Type: cross Abstract: Detecting distributional differences between two independent samples is a fundamental problem in statistics and machine learning. Nonparametric two-sample testing provides a principled framework for determining whether two samples are drawn from the same underlying distribution, without assuming any specific parametric form for the distribution. In this study, we propose a new two-sample test statistic based on a newly introduced integral probability metric (IPM), using a specially designed parametric discriminator class with a single node of a neural network. We show that the resulting test statistic, called PReLU-IPM, is nonparametric and establish theoretical guarantees for the associated two-sample testing procedure, PReLU-TST, including its consistency and asymptotical equivalence to nonparametric IPM-based tests under regularity conditions. By analyzing multiple simulated and real benchmark datasets, we demonstrate that PReLU-TST achieves higher power across a range of alternatives or performs comparably to its competitors, for finite samples.

23.
arXiv (CS.AI) 2026-06-12

SAIGuard: Communication-State Simulation for Proactive Defense of LLM Multi-Agent Systems

arXiv:2606.12474v1 Announce Type: cross Abstract: LLM-based multi-agent systems (MAS) solve complex tasks through inter-agent collaboration, but their communication-driven nature also allows security risks to spread across agents and trigger system-wide failures. Existing MAS defenses mainly follow a reactive paradigm after execution by detecting and isolating harmful agents, which may cause irreversible damage and degrade collaborative utility. To address this, we propose a proactive defense framework for MAS security, namely a Simulation-aware Interception Guard (SAIGuard). SAIGuard performs communication-state simulation over the MAS interaction graph, estimates the impact of incoming messages on local agent states and the global MAS state, and detects risky messages via reconstruction deviations from benign communication patterns. Instead of isolating agents, SAIGuard sanitizes or regenerates suspicious messages before it propagation into system. Experiments across diverse topologies and attack scenarios show that SAIGuard reduces attack success rates while maintaining MAS utility, outperforming reactive defenses.

24.
arXiv (CS.LG) 2026-06-12

Learning on a Razor's Edge: Identifiability and Singularity of Polynomial Neural Networks

arXiv:2505.11846v3 Announce Type: replace Abstract: We study function spaces parametrized by neural networks, referred to as neuromanifolds. Specifically, we focus on deep Multi-Layer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs) with an activation function that is a sufficiently generic polynomial. First, we address the identifiability problem, showing that, for almost all functions in the neuromanifold of an MLP, there exist only finitely many parameter choices yielding that function. For CNNs, the parametrization is generically one-to-one. As a consequence, we compute the dimension of the neuromanifold. Second, we describe singular points of neuromanifolds. We characterize singularities completely for CNNs, and partially for MLPs. In both cases, they arise from sparse subnetworks. For MLPs, we prove that these singularities often correspond to critical points of the mean-squared error loss, which does not hold for CNNs. This provides a geometric explanation of the sparsity bias of MLPs. All of our results leverage tools from algebraic geometry.

25.
arXiv (CS.CV) 2026-06-24

Automated Residual Plot Assessment With the R Package autovi and the Shiny Application autovi.web

Visual assessment of residual plots is a common approach for diagnosing linear models, but it relies on manual evaluation, which does not scale well and can lead to inconsistent decisions across analysts. The lineup protocol, which embeds the observed plot among null plots, can reduce subjectivity but requires even more human effort. In today's data-driven world, such tasks are well suited for automation. We present a new R package that uses a computer vision model to automate the evaluation of residual plots. An accompanying Shiny application is provided for ease of use. Given a sample of residuals, the model predicts a visual signal strength (VSS) and offers supporting information to help analysts assess model fit.