Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.LG) 2026-06-19

FloatDoor: Platform-Triggered Backdoors in LLMs

arXiv:2606.19535v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly deployed in sensitive settings such as software engineering, where their outputs directly shape downstream artifacts. Recent work has shown that an identical model can produce measurably different outputs depending on the deployment platform, a consequence of non-associative floating-point arithmetic and divergent kernel implementations. We study the security implications of this platform-dependent variability and uncover a novel attack surface on LLM deployments. We introduce FloatDoor, the first input-independent, platform-triggered backdoor attack against generative LLMs. The compromised model exhibits adversary-chosen behavior when served on a target platform and is otherwise benign. FloatDoor is realized through two lightweight LoRA adapters, one that amplifies inter-platform numerical divergence and one that binds the resulting platform signature to a malicious downstream task, while leaving aggregate model utility largely intact. FloatDoor exploits a pronounced time-of-check, time-of-use gap between model auditing and serving. We demonstrate FloatDoor on Qwen3-4B across a broad range of deployment targets, including NVIDIA GPUs, Google TPUs, AWS Graviton, and Alibaba Yitian-710. As a final case study, we show that FloatDoor reliably induces exploitable code vulnerabilities on a chosen target platform. Our results establish a new class of attacks on LLM deployments and underscore the pressing need for trusted model supply chains in sensitive, LLM-powered applications.

02.
arXiv (CS.CV) 2026-06-25

GeoRanker: Distance-Aware Ranking for Worldwide Image Geolocalization

Worldwide image geolocalization-the task of predicting GPS coordinates from images taken anywhere on Earth-poses a fundamental challenge due to the vast diversity in visual content across regions. While recent approaches adopt a two-stage pipeline of retrieving candidates and selecting the best match, they typically rely on simplistic similarity heuristics and point-wise supervision, failing to model spatial relationships among candidates. In this paper, we propose GeoRanker, a distance-aware ranking framework that leverages large vision-language models to jointly encode query-candidate interactions and predict geographic proximity. In addition, we introduce a multi-order distance loss that ranks both absolute and relative distances, enabling the model to reason over structured spatial relationships. To support this, we curate GeoRanking, the first dataset explicitly designed for geographic ranking tasks with multimodal candidate information. GeoRanker achieves state-of-the-art results on two well-established benchmarks (IM2GPS3K and YFCC4K), significantly outperforming current best methods.

03.
arXiv (CS.LG) 2026-06-25

Efficient Adaptive Data Acquisition via Pretrained Belief Representations

arXiv:2606.25197v1 Announce Type: new Abstract: Learning effective policies for adaptive data acquisition remains challenging: posterior-based methods rely on surrogate models and posterior approximations that can be misspecified or biased, while direct policy-learning methods map from historical observations and fail to exploit available model representations, making learning harder. We introduce policy learning with belief representations (POLAR), based on the insight that optimal data acquisition depends on the observation history only through a sufficient belief state. Specifically, POLAR decouples representation learning from policy learning by leveraging pretrained predictive foundation models as belief-state encoders, training a policy head on top of their representations. This yields a simple, unified amortised policy learning framework for Bayesian experimental design, Bayesian optimisation, and active learning, differing only in the task-specific utility used to train the policy. Empirically, we find that POLAR outperforms state-of-the-art amortised methods across diverse tasks while requiring far fewer training samples, demonstrating a significant step in the scalability and efficiency of amortised data acquisition.

04.
arXiv (CS.AI) 2026-06-19

Evaluating and Enhancing Negation Comprehension in Remote Sensing MLLMs

arXiv:2606.20177v1 Announce Type: cross Abstract: Multimodal Large Language Models (MLLMs) have demonstrated remarkable success in various Remote Sensing (RS) tasks. However, their ability to comprehend negation remains underexplored, limiting deployment in real-world applications where models must explicitly identify what is false or absent, e.g., emergency responders need to locate non-flooded routes for evacuation. To comprehensively study this limitation, we introduce RS-Neg, the first benchmark to evaluate negation understanding across region-level to scene-level tasks. Specifically, we design an automated data generation pipeline for RS imagery, using LLMs to synthesize diverse negation queries, and introduce a dynamic visual focus module for verification. Our evaluation reveals that advanced RS MLLMs struggle with negation, exhibiting hallucinations and substantial performance degradation. To close this gap, we propose NeFo, a novel test-time learning method that explicitly incorporates the logical role of negation into the model optimization. Remarkably, using about 5\% unlabeled test samples, NeFo significantly improves the negation understanding of models and shows strong generalization to unseen tasks. Code and data will be released upon acceptance.

05.
arXiv (CS.AI) 2026-06-24

A Unified Framework for Runtime Verification and Model-Based Diagnosis in LOLA

arXiv:2606.23720v1 Announce Type: cross Abstract: We present an integrated framework that unifies runtime verification and model-based diagnosis within the stream specification language LOLA. By encoding system descriptions, component health states, and observations into a single stream-based formalism, the approach enables continuous, online fault localization directly alongside fault detection, without requiring separate toolchains. The framework supports both time-invariant and transient faults, and naturally accommodates nondeterministic observations.

06.
arXiv (CS.LG) 2026-06-16

NanoQuant: Efficient Sub-1-Bit Quantization of Large Language Models

arXiv:2602.06694v3 Announce Type: replace Abstract: Weight-only quantization has become a standard approach for efficiently serving large language models (LLMs). However, existing methods fail to efficiently compress models to binary (1-bit) levels, as they either require large amounts of data and compute or incur additional storage. In this work, we propose NanoQuant, the first post-training quantization (PTQ) method to compress LLMs to both binary and sub-1-bit levels. NanoQuant formulates quantization as a low-rank binary factorization problem, and compresses full-precision weights to low-rank binary matrices and scales. Specifically, it utilizes an efficient alternating direction method of multipliers (ADMM) solver to precisely initialize latent binary matrices and scales, and then tunes the initialized parameters through a block and model reconstruction process. Consequently, NanoQuant establishes a new Pareto frontier in low-memory post-training quantization, and enables sub-1-bit compression. NanoQuant makes large-scale deployment feasible on consumer hardware. For example, it compresses Llama2-70B by 25.8$\times$ in just 13 hours on a single H100, enabling a 70B model to operate on a consumer 8 GB GPU. Code is available at https://github.com/SamsungLabs/NanoQuant.

07.
PLOS Medicine 2026-06-12

Placenta accreta spectrum in the 21st century: Challenging dogma and redefining disorder

by Eric Jauniaux, Helena C. Bartels, Yalda Afshar Placenta accreta spectrum (PAS) is a serious pregnancy complication caused by abnormal placental attachment to the uterus. In this Perspective, Eric Jauniaux and colleagues discuss emerging evidence that challenges our long-held pathophysiological understanding of PAS, and argue that a critical reassessment of definition, diagnosis, and management is overdue. In this Perspective, Jonathan Evans and colleagues discuss why restricting access to joint replacement surgery based on BMI alone is not supported by evidence, and highlight how such rest rictions risk exacerbating stigma, inequity and avoidable harm to those who would benefit from surgery.

08.
arXiv (CS.CV) 2026-06-15

Vanishing Depth: Training Generalized Depth Adapters with Sinusoidal Depth Preprocessing for Pretrained RGB Encoders

Generalized metric depth understanding is critical for precise vision-guided robotics, which current state-of-the-art (SOTA) vision-encoders do not support. To address this, we propose a self-supervised training approach that extends pretrained RGB encoders with a depth adapter to incorporate and align metric depth into a combined latent space without interfering with the pretrained RGB feature extraction. In combination with our sinusoidal depth encoding, the depth adapter enables generalized and robust depth density and distribution invariant feature extraction. Our depth adapters improve a wide set of generalized RGB baselines across a spectrum of relevant RGBD downstream tasks in segmentation, pose estimation, and depth completion – without the necessity of finetuning. Most importantly, we achieve 56.05 mIoU in the SUN-RGBD segmentation, while outperforming SOTA depth-aware and multi-modal encoders in our experiments. When no depth is present, one can activate our depth adapter with an empty map, use single pixel depth clues, or monocular depth estimation to include the depth aware feature extraction into subsequent downstream tasks.

09.
arXiv (CS.AI) 2026-06-18

IPSL-AID: Generative Diffusion Models for Climate Downscaling from Global to Regional Scales

arXiv:2604.03275v2 Announce Type: replace-cross Abstract: Effective adaptation and mitigation strategies for climate change require high-resolution projections to inform strategic decision-making. Conventional global climate models, which typically operate at resolutions of 150 to 200 kilometers, lack the capacity to represent essential regional processes. IPSL-AID is a global to regional downscaling tool based on a denoising diffusion probabilistic model designed to address this limitation. Trained on ERA5 reanalysis data, it generates 0.25 degree resolution fields for temperature, wind, and precipitation using coarse inputs and their spatiotemporal context. It also models probability distributions of fine-scale features to produce plausible scenarios for uncertainty quantification. The model accurately reconstructs statistical distributions, including extreme events, power spectra, and spatial structures. This work highlights the potential of generative diffusion models for efficient climate downscaling with uncertainty

10.
arXiv (quant-ph) 2026-06-25

Radial Schmidt mode detector of entangled photons

arXiv:2606.25735v1 Announce Type: new Abstract: High-dimensional spatially entangled two-photon state generated by spontaneous parametric down-conversion process (SPDC) has become a promising resource for several quantum information science applications. For harnessing high-dimensional entanglement advantages, detection capability in the Schmidt basis is a necessity. Spatial entanglement has been explored in several modal bases, such as pixel, azimuthal, and radial modes. Among them, pixel and azimuthal entanglement have been widely utilized due to efficient access to their Schmidt modes, while radial-mode entanglement remains underexploited. This is because for radial coordinates, there is neither a Schmidt-decomposed form for the SPDC photons nor is there a technique for measuring high-dimensional radial Schmidt modes, which is a major roadblock in harnessing radial mode advantages. In this work, we first theoretically show that the azimuthal averaging of SPDC two-photon state yields a radial Schmidt-decomposed form under typical experimental situations. We then demonstrate an innovative approach for extracting the radial Schmidt modes and their spectrum by characterizing the density matrix in the radial basis of one of the SPDC photons. Finally, we report the first-ever measurement of radial Schmidt spectrum of upto 50 radial Schmidt modes with about 98\% fidelity.

11.
arXiv (CS.LG) 2026-06-17

Discovering Functionally Selective Brain Regions with a Deep Topographic Multimodal Model

arXiv:2606.09770v2 Announce Type: replace-cross Abstract: Nearby neurons in cortex share similar response profiles, producing systematic spatial organization across sensory and cognitive systems. Recent topographic models reproduce aspects of this structure but remain unimodal and spatially constrain each layer separately, yielding fragmented maps that capture neither the contiguity of cortical processing streams nor their integration across modalities. We introduce Topo-Omni, a topographic multimodal model in which visual, auditory, and language/cognitive processing share a single contiguous in-silico sheet. Built by fine-tuning a pretrained foundation model with a spatial smoothness objective, this architecture develops clusters across modalities that are consistent with human neuroimaging, from sensory to cognitive systems. Driving or suppressing a cluster selectively biases or impairs perception, paralleling human intervention studies. Finally, we use our model to screen for novel clusters in-silico and discover new natural landscape and animal networks which we validate in human data. A single spatial principle thus organizes representations across modalities and processing stages, yielding testable hypotheses about cortical organization.

12.
arXiv (CS.LG) 2026-06-12

Efficient Stochastic Optimisation via Sequential Monte Carlo

arXiv:2601.22003v2 Announce Type: replace-cross Abstract: The problem of optimising functions with intractable gradients frequently arises in machine learning and statistics, ranging from maximum marginal likelihood estimation procedures to fine-tuning of generative models. Stochastic approximation methods for this class of problems typically require inner sampling loops to obtain (biased) stochastic gradient estimates, which rapidly becomes computationally expensive. In this work, we develop sequential Monte Carlo (SMC) samplers for optimisation of functions with intractable gradients. Our approach replaces expensive inner sampling methods with efficient SMC approximations, which can result in significant computational gains. We establish convergence results for the basic recursions defined by our methodology which SMC samplers approximate. We demonstrate the effectiveness of our approach on the reward-tuning of energy-based models within various settings.

13.
medRxiv (Medicine) 2026-06-11

Long-term exposure to PM2.5 components and lipid profiles in WTC Health Program general responders

Fine particulate matter (PM2.5) was found to be associated with elevated blood lipids, but fewer studies have examined the associations with specific constituents of PM2.5. We studied the associations between exposure to annual PM2.5 and its 14 constituents, and repeated blood lipid measurements among general responders enrolled in the World Trade Center Health Program between 2003 and 2019 (n = 44,876). We used generalized additive mixed effect models to investigate the single-pollutant associations with repeated measures of blood total cholesterol (TC), high and low-density lipoprotein (HDL-C and LDL-C) levels. We then used linear generalized weighted quantile sum regression with a random intercept for participant ID to account for the clustering of repeated measures and evaluate the combined associations with the component mixture. A decile increase in the mixture of 14 PM2.5 chemical components was associated with 0.375 mg/dL increase in TC levels (95% confidence Interval (CI): 0.174-0.577) and 0.302 mg/dL increase in LDL-C (95% CI: 0.063, 0.540). Lead, organic carbon, and iron were major drivers of both associations. Component-specific models also show higher TC and LDL levels associated with interquartile range increases in organic carbon (0.472, 95% CI [0.027, 0.918] and 0.648 95% CI [0.136, 1.160]) and iron exposure (1.081, 95% CI [0.630, 1.532] and 0.748, 95% CI [0.318, 1.178]). In conclusion, we found PM2.5 exposure to be associated with elevated lipid levels. The associations differed by PM2.5 composition, highlighting organic carbon, lead, and iron and major drivers. These findings are highly significant for a population exposed to extreme air pollution event and susceptible to lipid alterations that might trigger cardiovascular events.

14.
arXiv (CS.CL) 2026-06-11

When Probing Accuracy Saturates, Fragility Resolves: A Complementary Metric for LLM Pre-Training Analysis

Standard linear probing declares a property "encoded" when a classifier on hidden states achieves high accuracy. The protocol works well on a snapshot but breaks across pre-training: probe accuracy saturates within the first few thousand steps, leaving most of training invisible to the instrument. We introduce fragility, a complementary per-layer metric defined as the activation-noise level at which probe accuracy collapses. Fragility is sensitive to both the margin of separability and the redundancy of representation, both of which keep evolving long after accuracy plateaus. Applied to open-checkpoint language models, fragility recovers structure that accuracy alone cannot see. Moralized representations emerge along a lexical $\to$ compositional gradient: lexical moral detection first, compositional moral encoding later. Because probe accuracy on its own tracks how lexically separable a dataset is, we establish the compositional encoding directly, by showing it transfers across construction types that share no contrast tokens. A layer-depth robustness gradient develops monotonically across training while accuracy stays flat. And matched fine-tuning corpora that produce identical probing accuracy leave distinct fragility fingerprints, showing that data curation reshapes probe robustness without changing probe accuracy. In every comparison we test, where probing accuracy returns a flat answer, fragility returns a structured one.

15.
arXiv (CS.CV) 2026-06-18

Beyond the Current Observation: Evaluating Multimodal Large Language Models in Controllable Non-Markov Games

Deploying multimodal foundation models as closed-loop policies increasingly requires conditioning actions on observations that are no longer visible. However, existing benchmarks either expose the full state, conflate hidden-state reconstruction with other agent skills, or test recall only after an episode has ended. We introduce RNG-Bench (Reconstructive Non-Markov Games), a benchmark suite designed to isolate a base model's ability to reconstruct past observations and act on them during multi-step interaction. RNG-Bench includes two complementary games: Matching Pairs, where card identities briefly revealed at specific locations must later be recalled, and 3D Maze, where egocentric views must be integrated into a spatial map. Both games are evaluated under a unified harness with three controlled difficulty axes: grid size, visual pattern, and observation modality. The benchmark further introduces a head-to-head duel protocol to control for instance-level variance and a Memory Gap metric that disentangles forgetting from poor action selection. The hardest configurations require contexts of roughly 128K tokens and 350 image inputs per episode, and remain far from saturated by frontier MLLMs. Memory Gap analysis shows that most residual errors stem from forgetting earlier observations rather than from suboptimal decision making. Finally, fine-tuning Qwen3.5-9B on optimal-policy rollouts and filtered model demonstrations improves performance on RNG-Bench and transfers to existing benchmarks without degrading general multimodal capability.

16.
arXiv (CS.LG) 2026-06-19

A deep learning framework for jointly solving transient Fokker-Planck equations with arbitrary parameters and initial distributions

arXiv:2604.06001v2 Announce Type: replace-cross Abstract: Efficiently solving the Fokker-Planck equation (FPE) is central to analyzing complex parameterized stochastic systems. However, current numerical methods lack parallel computation capabilities across varying conditions, severely limiting comprehensive parameter exploration and transient analysis. This paper introduces a deep learning-based pseudo-analytical probability solution (PAPS) that, via a single training process, simultaneously resolves transient FPE solutions for arbitrary multi-modal initial distributions, system parameters, and time points. The core idea is to unify initial, transient, and stationary distributions via Gaussian mixture distributions (GMDs) and develop a constraint-preserving autoencoder that bijectively maps constrained GMD parameters to unconstrained, low-dimensional latent representations. In this representation space, the panoramic transient dynamics across varying initial conditions and system parameters can be modeled by a single evolution network. Extensive experiments on paradigmatic systems demonstrate that the proposed PAPS maintains high accuracy while achieving inference speeds four orders of magnitude faster than GPU-accelerated Monte Carlo simulations. This efficiency leap enables previously intractable real-time parameter sweeps and systematic investigations of stochastic bifurcations. By decoupling representation learning from physics-informed transient dynamics, our work establishes a scalable paradigm for probabilistic modeling of multi-dimensional, parameterized stochastic systems.

17.
bioRxiv (Bioinfo) 2026-06-15

Biological meaning in protein embedding space is resolution-dependent

Protein language model embeddings are increasingly used to organise biological sequences, yet how biological meaning is encoded within embedding neighbourhoods remains poorly understood. Using two independent hierarchical enzyme systems, carbohydrate-active enzymes and peptidases, we investigated how biological interpretation changes across embedding organisations aligned to different levels of biological hierarchy. Different embedding organisations give rise to distinct neighbourhood semantics. When aligned to membership-boundary resolution, embeddings robustly separated artefacts and unrelated proteins from members of the target category. However, embeddings aligned to functional-grouping resolution maintained compositional neighbourhood structure for multi-domain proteins spanning more than one functional or catalytic group. Finally, embeddings aligned to local-family resolution recovered compact family-like neighbourhoods, including families withheld from training, while weakening broader membership-boundary and functional-grouping relationships. Moreover, embeddings optimised toward the same level of biological organisation retain different biological relationships depending on optimisation trajectory employed. Together, our results show that proximity in protein embedding space has no fixed biological interpretation. Instead, biological meaning emerges across embedding resolutions through selective preservation of different forms of biological organisation.

18.
arXiv (CS.CV) 2026-06-16

DySink: Dynamic Frame Sinks for Autoregressive Long Video Generation

Autoregressive long video generation often adopts bounded-memory streaming for efficiency, typically combining local windows for short-term continuity with static early-frame sinks as long-range anchors. However, this fixed allocation keeps early frames cached even when the current visual state has substantially diverged from them, while discarding potentially more relevant intermediate history. As a result, the retained long-range context may become less adaptive and bias generation toward outdated cues; in severe cases, RoPE-induced phase re-alignment can homogenize inter-head attention and cause sink collapse, where content regresses toward sink frames. We propose DySink, a retrieval-based framework that maintains a compact memory bank and selects visually relevant historical frames as dynamic frame sinks. DySink couples adaptive retrieval with a sink anomaly gate, which detects excessive inter-head consensus over retrieved context and suppresses collapse-prone context. Experiments on minute-long videos show that DySink consistently improves dynamic degree over strong baselines while also achieving higher temporal quality. The code and model weights will be released at https://github.com/yebo0216best/DySink.

19.
arXiv (CS.AI) 2026-06-16

AIRMap: AI-Generated Radio Maps for Wireless Digital Twins

arXiv:2511.05522v4 Announce Type: replace-cross Abstract: Accurate, low-latency channel modeling is essential for real-time wireless network simulation and digital-twin applications. Traditional modeling methods like ray tracing are however computationally demanding and unsuited to model dynamic conditions. In this paper, we propose AIRMap, a deep-learning framework for ultra-fast radio-map estimation, along with an automated pipeline for creating the largest radio-map dataset to date. AIRMap uses a single-input U-Net autoencoder that processes only a 2D elevation map of terrain and building heights. Trained on 1.2M Boston-area samples and validated across four distinct urban and rural environments with varying terrain and building density, AIRMap predicts path gain with under 4 dB RMSE in 4 ms per inference on an NVIDIA L40S-over 100x faster than GPU-accelerated ray tracing based radio maps. A lightweight calibration using just 20% of field measurements reduces the median error to approximately 5%, significantly outperforming traditional simulators, which exceed 50% error. Integration into the Colosseum emulator and the Sionna SYS platform demonstrate near-zero error in spectral efficiency and block-error rate compared to measurement-based channels. These findings validate AIRMap's potential for scalable, accurate, and real-time radio map estimation in wireless digital twins.

20.
arXiv (math.PR) 2026-06-11

Instability of a nonlinear oscillator with small friction and small additive noise

arXiv:2606.11389v1 Announce Type: new Abstract: Let $\lambda = \lambda(\beta,\sigma,a,b)$ denote the top Lyapunov exponent for the linearization along trajectories of the noisy damped non-linear oscillator $\ddot{x}+\beta \dot{x} + ax+bx^3 = \sigma \dot{W}_t$, where $a$, $b$ and $\beta$ are all positive and $\sigma \neq 0$. In 2004 Arnold, Imkeller and Sri Namachchivaya stated without proof that $\lambda(\varepsilon^2 \beta,\varepsilon \sigma,a,b) \sim \overline{\lambda} \varepsilon^{2/3}$ as $\varepsilon \to 0$ with $\overline{\lambda} > 0$. This paper contains a proof of this assertion.

21.
arXiv (CS.AI) 2026-06-19

QueryGaussian: Scalable and Training-Free Open-Vocabulary 3D Instance Retrieval

arXiv:2606.19733v1 Announce Type: cross Abstract: Efficiently retrieving specific 3D instances from large-scale scenes via natural language prompts remains a formidable challenge in multimedia analysis. Existing approaches predominantly follow a "scene-level embedding" paradigm, which requires distilling high-dimensional semantic features into every 3D primitive. This strategy suffers from a fundamental architectural bottleneck: memory and computational costs scale linearly with scene complexity, inevitably triggering out-of-memory (OOM) failures in city-scale environments. To address this barrier, we propose QueryGaussian, a training-free framework for expeditious and scalable open-vocabulary 3D instance retrieval. Unlike holistic semantic distillation, QueryGaussian employs an instance-level query mechanism that decouples semantic understanding from geometric representation. Specifically, we leverage pre-trained 2D vision models to interpret user prompts and lift segmentation masks into 3D via a concurrent maximum-weight association strategy, ensuring semantic-visual consistency. To mitigate projection ambiguity, we introduce a temporal fusion module with multi-stage adaptive density clustering. Experimental results demonstrate that QueryGaussian not only matches the accuracy of state-of-the-art methods but also delivers a decisive efficiency leap, reducing GPU memory usage by over 70% and accelerating inference by 180x. Crucially, QueryGaussian enables expeditious instance retrieval on city-scale scenes containing tens of millions of Gaussians using consumer-grade hardware.

22.
arXiv (CS.AI) 2026-06-17

Large Language Models for Agentic NetOps and AIOps: Architectures, Evaluation, and Safety

arXiv:2605.12729v2 Announce Type: replace-cross Abstract: Large language models are increasingly being used to support network operations (NetOps) and artificial intelligence for IT operations (AIOps), including incident investigation, root-cause analysis, configuration synthesis, and limited self-healing. In both NetOps and AIOps, this shift is changing how tasks are managed. Agent-based operations work as workflows, from gathering evidence to taking action, following permissions, policies, and checks, and providing rollback options when necessary. This is crucial because operational decisions can have instant impacts. To make the argument concrete, we organise the relevant literature around the hierarchy of autonomy, tool scope, evidence traces, and assurance contracts. These contracts define what an agent may observe, propose, and execute. They also define the checks that must pass before any action is allowed. A consistent pattern appears across work on telemetry query recommendation, diagnosis, root-cause analysis, configuration synthesis, change planning, and limited self-healing. Operational reliability does not come chiefly from the model itself. It depends on the machinery around the model. We also argue that evaluation should go beyond static question answering. Agentic NetOps and AIOps systems require workflow-centred evaluation, including trace quality, bounded tool use, safe proposal generation, replay in sandboxed environments, and canary trials with rollback-aware scoring. Without these measures, a system may appear robust yet remain too fragile. Finally, we examine security, privacy, and governance risks that become acute when agents sit close to operational control surfaces. Taken together, the survey concludes that progress in intelligent NetOps and AIOps will depend on treating autonomy as a constrained operational control problem, whose outputs must be reliable, auditable, and securely deployable.

24.
arXiv (CS.CV) 2026-06-25

How Robust is OCR-Reasoning? Evaluating OCR-Reasoning Robustness of Vision-Language Models under Visual Perturbations

Vision-language models (VLMs) have achieved strong performance on OCR-based benchmarks and increasingly focused on text-rich understanding, but their robustness under controlled visual degradation remains insufficiently understood. This gap is critical for OCR reasoning, where visual corruption can induce OCR errors and structural distortions, thereby introducing uncertainty into the reasoning task. To systematically study this problem, we introduce OCR-Robust, a benchmark designed for evaluating OCR reasoning robustness under visual perturbations. It contains 812 samples across two complementary subsets: OCR1.0, covering documents, scene text, receipts, handwriting, and mathematical content, and OCR2.0, focusing on charts, geometry diagrams, and tables. To enable efficient yet informative evaluation, we conduct a pilot study over 18 candidate perturbations and select 5 representative types at 3 severity levels each based on their impact and cross-model discriminability. We evaluate robustness using clean accuracy, Relative Corruption Retention (RCR), Worst-Case Retention (WCR), and a composite Corruption Robustness Index (CRI), and benchmark 18 models spanning proprietary systems, open-source VLMs, and OCR+LLM pipelines. Our results show that higher clean accuracy does not necessarily imply stronger robustness, and that models can suffer pronounced degradation in the worst case on OCR tasks that are sensitive to structure, and charts and tables are substantially more fragile than document-like inputs under perturbation.

25.
arXiv (CS.CV) 2026-06-17

Similarity-based representation factorization for revealing interpretable dimensions in representational data

The study of representations is widespread across fields, including neuroscience, psychology, and artificial intelligence. While representations are often studied and compared through similarities between stimuli, current methods provide only limited access to the dimensions that shape these representations and are often limited in interpretability. To overcome these challenges, here we introduce Similarity-Based Representation Factorization (SRF), a general computational method for recovering low-dimensional, non-negative, interpretable embeddings from similarity matrices derived from measured data. Across simulations and many neural, behavioral, and computational datasets, SRF recovers interpretable dimensions from diverse forms of representational data, even for very sparsely sampled, incomplete data. The dimensions derived from these datasets match those obtained by task-specific models, predict independent behavioral properties, improve exploratory analysis, and offer higher power for confirmatory hypothesis testing than comparing similarity matrices. Together, these results establish SRF as a general-purpose method with broad applications for uncovering, understanding, and using the dimensions underlying representations.