Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CV) 2026-06-18

Multi-Class Brain Tumor Classification Using Advanced Deep Learning Models: A Comparative Study

Despite recent advancements in deep learning, accurately classifying brain tumors from MRI images continues to pose challenges. In this research, we present a comprehensive evaluation of five different convolutional neural networks (CNN) architectures, including a customized baseline model and four pre-trained models - for use in classifying multi-class brain tumors using a clinically-sourced dataset of approximately 10,000 MRI images. We have utilized five different architectures; VGG16, VGG19, DenseNet121, and EfficientNetB0, which were all tested and trained within an identical experimental framework. Performance was measured by both overall accuracy and tumor-wise recall as a means to measure the clinically-relevant performance of each architecture. We found that EfficientNetB0 had the best overall classification accuracy at 95%, when compared to the other architectures tested; specifically VGG16 (94.37%), VGG19 (92.29%), DenseNet121 (90.91%) and the customized CNN (78.00%). An especially important finding of our research was the considerable improvement in detecting meningiomas; specifically, while simple CNNs could detect meningiomas with a recall rate of approximately 20%, EfficientNetB0 was able to detect meningiomas with a recall rate of 89%. Meningiomas are often difficult to detect because they can appear very subtly on MRI images. Additionally, an interesting finding was that the deeper VGG19 performed worse than the shallower VGG16. This indicates that in many cases the architectural efficiency of a CNN model may be more important than its depth when working with medical images. Overall, EfficientNetB0 appears to provide the optimal trade-off between classification accuracy, number of parameters used in the model and clinically meaningful performance.

02.
arXiv (CS.AI) 2026-06-19

StreamKL: Fast and Memory-Efficient KL Divergence for Boosting Attention Distillation

arXiv:2606.20005v1 Announce Type: cross Abstract: Attention distillation, which trains one attention distribution to match another by minimizing their Kullback-Leibler (KL) divergence, is widely used in knowledge distillation, model compression, continual learning, and sparse-attention LLM training. However, existing approaches materialize both attention distributions before computing the KL reduction, incurring $O(N_QN_K)$ memory and IO costs that become prohibitive at long context lengths. We present StreamKL, the first fused GPU primitive for attention KL divergence that eliminates this quadratic materialization. StreamKL derives a novel online formulation for the coupled two-distribution KL reduction, enabling a single one-pass forward kernel that streams query-key tiles through on-chip SRAM. For the backward pass, StreamKL recomputes attention probabilities tile-by-tile, avoiding storage of quadratic intermediates. We further design and implement efficient GPU kernels with dedicated optimizations. Experiments show StreamKL delivers up to $43\times$ and $14\times$ speedups over baseline methods in the forward and backward passes, respectively. Most importantly, StreamKL reduces the extra HBM footprint of attention distillation from $O(N_QN_K)$ to $O(1)$, enabling long-context distillation on a single GPU.

03.
arXiv (math.PR) 2026-06-19

A Cycle Walk for Sampling Measures on Spanning Forests for Redistricting

arXiv:2509.08629v2 Announce Type: replace-cross Abstract: We introduce the Cycle Walk, a new Markov chain Monte Carlo method for sampling distributions on balanced graph partitions, motivated by applications in political redistricting. The method operates on spanning forests and combines two types of updates: local "cycle" moves within districts and global moves that exchange population between adjacent districts while preserving balance constraints. This construction enables efficient Metropolis–Hastings correction while allowing proposals at multiple spatial scales. We show that the Cycle Walk naturally interpolates between existing approaches based on local updates and a class of global update methods derived from recombination (RECOM). Through a range of numerical experiments on synthetic graphs and real-world precinct data, we demonstrate that the Cycle Walk exhibits improved empirical convergence diagnostics for distributions that place weaker weight on spanning-tree counts, a regime that is challenging for existing methods. In particular, the algorithm remains effective when incorporating alternative compactness measures that more closely reflect policy-relevant criteria. These results suggest that the Cycle Walk provides a flexible and computationally efficient framework for sampling from a broader class of redistricting distributions than previously accessible with MCMC techniques.

04.
Nature Biotechnology 2026-06-05

Structural motif search across the protein universe with Folddisco

作者:

Detecting similar protein structural motifs in large structure collections is computationally expensive. We developed Folddisco, a fast structural motif search tool that uses an index of position-independent geometric features, including side-chain orientation, combined with a rarity-based scoring system. Folddisco is 20-fold faster in querying and fourfold more storage-efficient than existing methods while improving accuracy. Folddisco is freely available online ( https://folddisco.foldseek.com ), along with a webserver ( https://search.foldseek.com/folddisco ). Folddisco enables protein structural motif search in million scale databases.

05.
arXiv (CS.CV) 2026-06-12

CACR:Reinforcing Temporal Answer Grounding in Instructional Video via Candidate-Aware Causal Reasoning

The task of temporal answer grounding in instructional video (TAGV), which aims to locate precise video segments that respond to natural language queries, is increasingly important for direct video answer retrieval. This task remains challenging due to the need to comprehend semantically complex questions and to address the significant length mismatch between untrimmed videos and short target moments. Existing methods often suffer from sensitivity to irrelevant content or insufficient visual reasoning capabilities. To tackle these limitations, we propose a Candidate-Aware Causal Reasoning (CACR) framework. Our approach first employs a Visual-Language Pre-training based Candidate Selection (VBCS) algorithm to efficiently generate K candidate segments, then applies a temporal logic reasoning module enhanced by a rejection reward mechanism and optimized via Group Relative Policy Optimization (GRPO) for robust inference. Extensive experiments on six benchmarks demonstrate that our method achieves state-of-the-art performance in terms of mean Intersection-over-Union (mIoU), providing a new perspective for reasoning-based retrieval in long videos.

06.
arXiv (CS.CV) 2026-06-11

From Nominal Intensity to Equivalent Rainfall: A Path-Based Credibility Evaluation Framework for Simulated Rainfall in Autonomous-Driving Perception Tests

Credible simulated-rainfall conditions are essential for identifying perception-system boundaries and supporting SOTIF-oriented risk assessment in automated driving. However, closed-field tests are often described only by nominal rainfall intensity or single-point measurements, making it difficult to align simulated rain fields with real rainfall and map test results to real-world scenarios. This paper proposes a path-based credibility evaluation method for simulated rainfall in autonomous-driving perception tests. Using the drop size and velocity joint distribution of real rainfall as the reference, each candidate path is represented by path-equivalent rainfall intensity, an uncertainty band, and a path-averaged Realism of Raindrop Distribution (RRD) score. Lidar target point-cloud count and mean reflectivity are further used for perception-consistency correction, quantifying the proxy capability of each simulated-rainfall path for real-rainfall perception effects. Experiments are conducted using about 10,000 real-rainfall raindrop-spectrum samples, 728 RainSense perception samples, and 45 spatial sampling points in a 2.4 m x 7.2 m simulated-rainfall area. Results show that spatial non-uniformity remains under the same nominal condition, confirming the need for path-based evaluation. The method identifies Path IV and Path VI as preferable candidates, with results of 11.54 +/- 0.31 mm/h, RRD = 0.43, and 8.28 +/- 0.34 mm/h, RRD = 0.46, respectively. These paths show more balanced performance in rainfall-intensity stability, raindrop-spectrum realism, and perception consistency. The proposed method supports path selection, condition description, and credible interpretation of autonomous-driving perception tests under rainfall.

07.
arXiv (CS.CL) 2026-06-18

As Easy as Rocket Science: Assessing the Ability of Large Language Models to Interpret Negation in Figurative Language

Figurative language and negation are two areas that challenge current language models, however, both are widely used throughout written and spoken language. Large language models (LLMs) are also widely used in everyday contexts where they cannot necessarily be tuned for a specific dataset. It is therefore essential to understand the ability of LLMs to correctly interpret text that includes both negation and figurative language. To investigate this, we develop a set of new annotations to an existing dataset of figurative language, and test a range of language models on the dataset. We find that the combination of negation and figurativeness can present a particular challenge, and that performance overall and across different negation types is particularly dependent on the prompt style used.

08.
medRxiv (Medicine) 2026-06-17

Long-term mortality and cause-specific death after non-cardiac chest pain: a multicentre cohort study of 160,245 patients in China

Abstract Background Non-cardiac chest pain (NCCP) is commonly regarded as a low-risk condition. However, long-term mortality, cause-specific death, and high-risk subgroup characteristics remain poorly defined. Methods In this multicentre registry-linked cohort study, we linked the Chest Pain Center Registry from 101 hospitals in Hunan, China, with the Mortality and Cause of Death Registry. Adults diagnosed with NCCP from Jan 1, 2017, to Dec 31, 2021, were included. We assessed 3-year all-cause, cardiovascular, and non-cardiovascular mortality using Cox, restricted cubic spline, and Fine-Gray models. Findings Among 160,245 patients, 4674 deaths occurred within 3 years (2.9%). Mortality increased sharply after 60.5 years. Age [≥] 60.5 years (adjusted hazard ratio [aHR] 7.49 [95% CI 6.89-8.14]), rural residence (time-varying aHR 1.46 [1.35-1.57] in year 1 and 1.66 [1.46-1.89] in years 1-3), and male sex (aHR 1.47 [1.38-1.57]) independently predicted death. Three-year mortality ranged from 0.3% in younger urban women to 8.4% in older rural men. Cardiovascular diseases accounted for 56.4% of deaths among older patients, whereas other non-cardiovascular causes (22.8%) and malignancy (20.8%) were the largest categories among younger decedents. Interpretation NCCP is not uniformly benign. Age, rural residence, and sex identify patients who could benefit from risk-stratified follow-up, with cardiovascular prevention prioritised for older rural men and broader non-cardiovascular assessment considered for younger patients.

09.
arXiv (CS.AI) 2026-06-19

AAPA: Adversarially Anchored Preference Alignment for Post-Training of Large Language Models

arXiv:2509.25148v2 Announce Type: replace Abstract: Post-training alignment of large language models often combines supervised fine-tuning (SFT) on expert demonstrations with reinforcement learning (RL) from preference or verifiable feedback. SFT provides a useful behavioral anchor but can overfit to static demonstrations, whereas RL encourages exploration but may drift from expert behavior or exploit imperfect rewards. We propose AAPA (Adversarially Anchored Preference Alignment), a plug-in framework that augments existing post-training objectives with a sentence-level adversarial anchoring signal. AAPA compares policy rollouts with offline, pre-collected expert responses using a fixed lightweight discriminator, and therefore requires neither online teacher inference nor discriminator co-training during policy optimization. The same anchoring term can be added to SFT, GRPO, and CHORD while preserving their original training pipelines. Experiments on instruction-following benchmarks show that AAPA consistently improves the corresponding base objectives across model scales. In particular, the staged AAPA configuration improves over a strong GRPO baseline by 5.77\% on \texttt{Qwen3-0.6B} and 3.75\% on \texttt{Qwen3-4B}. Further analyses on response length, log-probability distributions, and discriminator variants suggest that adversarial anchoring provides a stable semantic grounding signal for preference optimization. Code is available at \url{https://github.com/IsFaqq/AAPA}.

10.
arXiv (CS.LG) 2026-06-16

Incentives and Evidence in Learned Service Orchestration

arXiv:2606.16555v1 Announce Type: cross Abstract: Reinforcement learning for service orchestration has been the subject of sustained research for over a decade, yet it is not used in production at scale. The usual explanation is that learned controllers degrade under delayed and noisy telemetry, workload shifts, and uncontrolled tenants. We test whether existing evidence supports that explanation. We evaluate three highly influential RL-based orchestration systems spanning resource allocation, DAG scheduling, and autoscaling, using pre-registered predictions about comparative degradation under production-relevant perturbations and paired inference with family-wise error correction. Across the tests, most predicted performance reversals do not occur. Diagnostic analyses show that these outcomes often reflect comparator collapse, artefact limitations, or evaluation choices rather than evidence that learned controllers tolerate the perturbations. One apparent advantage under observation lag is roughly fortyfold compared to a Kubernetes HPA-equivalent controller. Another widely cited result cannot be reconstructed from its released artefact, and the strongest reproducible margin is far smaller than the published results. Conclusions also reverse under changes in perturbation magnitude and evaluation mode. Based on these results and broader patterns in the literature, we identify an institutional problem. Publication and review incentives favour benchmark gains against convenient comparators, even when those gains provide little evidence of deployment performance. We argue that the problem is not solely technical. Rather, it is institutional, so learned orchestration needs production-grade comparators, registered perturbation models, separate operational metrics, and publication criteria that reward reproducible operational evidence. Without these changes, the literature can grow without establishing whether learning improves orchestration.

11.
arXiv (quant-ph) 2026-06-12

Entropic order parameters and topological holography

arXiv:2512.24225v2 Announce Type: replace-cross Abstract: We show that the symmetry topological field theory (SymTFT) construction, also known as the topological holography, provides a natural and intuitive framework for the entropic order parameter characterising phases with (partially) broken symmetries. Various examples of group and non-invertible symmetries are studied. In particular, the origin of the distinguishability of the vacua resulting from spontaneously broken non-invertible symmetries is made manifest with an information-theoretic perspective, where certain operators in the SymTFT are excluded from observation.

12.
arXiv (CS.AI) 2026-06-11

DecompSR: A dataset for decomposed analyses of compositional multihop spatial reasoning

arXiv:2511.02627v3 Announce Type: replace Abstract: We introduce DecompSR, decomposed spatial reasoning, a large benchmark dataset (over 5m datapoints) and generation framework designed to analyse compositional spatial reasoning ability. The generation of DecompSR allows users to independently vary several aspects of compositionality, namely: productivity (reasoning depth), substitutivity (entity and linguistic variability), overgeneralisation (input order, distractors) and systematicity (novel linguistic elements). DecompSR is built procedurally in a manner which makes it is correct by construction, which is independently verified using a symbolic solver to guarantee the correctness of the dataset. DecompSR is comprehensively benchmarked across a host of Large Language Models (LLMs) where we show that LLMs struggle with productive and systematic generalisation in spatial reasoning tasks whereas they are more robust to linguistic variation. DecompSR provides a provably correct and rigorous benchmarking dataset with a novel ability to independently vary the degrees of several key aspects of compositionality, allowing for robust and fine-grained probing of the compositional reasoning abilities of LLMs.

13.
arXiv (quant-ph) 2026-06-16

Trainable Quantum Channels as Computational Primitives for Quantum Learning

arXiv:2606.15808v1 Announce Type: new Abstract: Variational quantum learning is traditionally constrained to unitary dynamics, often treating quantum channels as detrimental noise. In this work, we reformulate the quantum channels as trainable computational primitives and establish a non-unitary quantum machine learning framework grounded in open-system dynamics. We demonstrate that the outputs of channel-enhanced quantum models form a structured superposition of multiple functional components. Each component is governed by an effective observable whose spectrum can be adaptively modulated during training, a significant departure from the spectral invariance in unitary transformations. Moreover, the proposed framework generalizes conventional unitary quantum models by retaining them as a special case while introducing additional non-unitary degrees of freedom. Furthermore, we reveal that trainable quantum channels enrich the optimization geometry through ensemble-averaged gradient and additional optimization directions induced by the Kraus operators. Empirical evaluations on classification tasks using trainable amplitude-damping and phase-damping channels confirm enhanced optimization dynamics and predictive performance. Our work provides a principled approach for leveraging quantum channels as trainable resources and advances the design of high-performance quantum learning architectures.

14.
arXiv (quant-ph) 2026-06-15

A new class of degenerate solutions to the massless Dirac equation and their potential applications in optical memories

arXiv:2606.14256v1 Announce Type: new Abstract: In this article, we present a novel class of degenerate solutions to the massless Dirac equation, corresponding to a wide variety of electromagnetic 4-potentials and fields, including both zero field and circularly polarized electromagnetic waves. An interesting property of these solutions is that the spin of the particles rotates in synchronization with the electric and magnetic fields of the electromagnetic waves. These results could be utilized for the development of optical memories based on materials supporting massless Dirac fermions, such as graphene.

15.
arXiv (CS.CL) 2026-06-16

RASST: Retrieval-Augmented Simultaneous Speech Translation

Simultaneous speech translation produces target text incrementally from partial speech input. Recent speech large language models have markedly improved SST quality but still struggle with rare and domain-specific terminology. Retrieval augmentation has helped in automatic speech recognition and neural machine translation, but extending it to SST is non-trivial: retrieval must be fast and accurate under partial speech, and the model must decide whether and when to apply retrieved terms during incremental generation. We propose Retrieval-Augmented Simultaneous Speech Translation (RASST), which addresses both challenges. For accurate cross-modal retrieval under partial input, RASST trains a lightweight speech-text retriever that produces chunkwise terminology hints for the Speech LLM via multi-scale retrieval. To use these hints correctly, we synthesize training data that teaches the Speech LLM to decide whether and when to apply each retrieved term. Experiments on ACL 60/60 dev set and the ESO test set show that RASST improves terminology accuracy by nearly 40% and overall translation quality by up to 3 BLEU points, with negligible computational overhead.

16.
arXiv (CS.LG) 2026-06-15

Free Heavy-Tailed Lunch for Muon: A Theoretical Justification of Empirical Success

arXiv:2606.14560v1 Announce Type: cross Abstract: Non-Euclidean optimisation methods with matrix-valued updates, such as Muon and Scion, have recently shown strong empirical performance for training Transformer models, yet their theoretical advantages over Euclidean methods remain poorly understood. We address this gap in the heavy-tailed non-convex regime, where stochastic gradients have bounded $p$-th central moments, $p \in (1,2]$. We show that certain non-Euclidean methods achieve optimal sample complexity under stronger stationarity measures, while Euclidean methods incur additional dimension-dependent costs. As a consequence, for $m \times n$ matrices, Muon finds an $\varepsilon$-stationary point in nuclear norm within $\mathcal{O}\left(\min\{m, n\} \frac{\Delta_1 L}{\varepsilon^2} \left(\frac \sigma \varepsilon \right)^{\frac p {p-1}}\right)$ samples, absorbing heavy-tailed noise without extra dimension dependence, unlike Euclidean methods. We further prove this sample complexity, including its dimension dependence, is optimal for all first-order methods under nuclear-norm stationarity. Experiments on large language models support our theory. Surprisingly, our results suggest that other Schatten geometries beyond the spectral geometry of Muon can perform competitively in certain settings.

17.
arXiv (CS.CV) 2026-06-11

FreqKD: Frequency-Decoupled Cross-Modal Knowledge Distillation for Infrared Object Detection

Transfer learning from large-scale RGB foundation models to infrared (IR) imagery through knowledge distillation (KD) remains challenging due to fundamental differences in image formation physics. We investigate the spectral structure of the RGB–IR modality gap and observe that feature divergence is not uniform across spatial frequencies: low-frequency components (shape, layout) show greater cross-modal alignment than high-frequency components (texture, fine edges), which reflect modality-specific characteristics. Based on this analysis, we propose FreqKD, a frequency-decoupled distillation framework that applies asymmetric supervision adapted to each band's cross-modal consistency. The method employs strict mean squared error (MSE) on the low-frequency band to preserve shared structural information and a relaxed log-MSE loss (weighted at 0.1) on the high-frequency band to provide edge guidance while tolerating texture differences. Spectral divergence analysis on 500 paired samples shows that high-frequency divergence exceeds low-frequency divergence by a factor of 2.4x on average across all analysed transformer layers. On KAIST multispectral pedestrian detection, FreqKD achieves 64.1 mAP50, improving 2.4 points over the DINOv2 baseline. The learned representation transfers across datasets (FLIR ADAS, +2.1 mAP50), tasks (MFNet segmentation, +1.85 mean intersection-over-union), and architectures (ResNet-50, +1.0 mAP50). Code is available at: https://anonymous.4open.science/r/freq_decoupled_kd-5E5A

18.
arXiv (math.PR) 2026-06-16

The optimal sub-Gaussian normalisation for randomised monotone functions

arXiv:2312.01265v5 Announce Type: replace Abstract: Let $\mathcal{M}$ denote the class of randomised monotone functions on $\mathbb{R}$ with values in $[0,1]$, and let $U_{\mathcal{M}}\colon \mathbb{R}_+\to \mathbb{R}_+$ be the minimal function for which $$ \mathbb{P}\left\{ \sqrt{\eta_f}\, \sup_{t\in\mathbb{R}} \left| f_Z(t) - \Exf{f_Z(t)} \right| \ge \varepsilon\sqrt{U_{\mathcal{M}}(\eta_f)} \right\} \le 2\e^{-2\varepsilon^2} $$ holds for every member $f_Z$ of $\mathcal{M}$ with finite effective sample size $\eta_f$ and every positive $\varepsilon$. We prove that for every $x> 1$, $$ \left| \sqrt{U_{\mathcal{M}}(x)} - \sqrt{\log_4 x} \right| \le 2 \min\!\left\{ 1,\, \frac{2 \ln(\e + \ln x)}{\sqrt{\ln x}} \right\}\,. $$ The optimal adjustment $\sqrt{U_{\mathcal{M}}(x)}$ matches $\frac{1}{\sqrt{2\ln 2}}\sqrt{\ln x}$ for all $x>1$, with residuals bounded as above.

19.
arXiv (CS.CL) 2026-06-16

Symbolic Informalization: Fluent, Productive, Multilingual

作者:

Symbolic informalization enables a reliable conversion of formal mathematics to natural language. It has the potential to make machine-checked content human-readable without loss of precision. In a traditional proof system usage, symbolic informalization generalizes the limited mechanisms of syntactic sugar into the ordinary language of mathematics. In a setting where proofs are constructed by artificial intelligence and autoformalization, symbolic informalization can explain what precisely has been constructed. This paper outlines the project Informath, which aims to show how symbolic informalization can produce fluent text with a reasonable development effort and address multiple formal and natural languages. Informath is based on an interlingual architecture, where Dedukti works as a hub between different proof systems (Agda, Lean, Rocq) and Grammatical Framework (GF) takes care of linguistic correctness and variation in different natural languages.

20.
arXiv (CS.LG) 2026-06-16

Early Anomaly-Onset Detection based on Wigner–Ville Distribution Slice Spectra: A Transmission-Grid Test Case

arXiv:2606.15856v1 Announce Type: cross Abstract: Operational disturbance monitoring in power networks requires decisions to be made from waveform windows as they arrive, rather than from completed records after the event. This study evaluates full-vector Wigner–Ville Distribution Slice (WVDS) spectra for sequential anomaly-onset detection in high-voltage grid-voltage waveforms. The approach keeps the bilinear midpoint interaction structure of the Wigner–Ville distribution and represents each 128-sample voltage window by a 128-dimensional slice spectrum, avoiding manually selected fault-frequency markers. WVDS is used with a baseline-normalized deviation (BND) score and is compared against the BND of Fast Fourier Transform (FFT-BND), raw-window autoencoders, FFT autoencoders, and WVDS autoencoders under the same thresholding and three-window persistence rule. A synthetic autoencoder–clustering teacher is used to select RTE fault records that start from an initially normal region and then transition to anomalous behavior. On the filtered test set, FFT-BND achieves the highest sensitivity, whereas WVDS-BND provides the lowest false-alarm operating point, reducing record-level pre-onset false alarms to 0.69%. The autoencoder comparison follows the same selectivity pattern: WVDS reconstruction decreases false alarms relative to FFT reconstruction but misses more examples. The results indicate that preserved WVD cross-term information can form a selective representation for online grid-waveform anomaly monitoring when false alarms are costly.

21.
arXiv (quant-ph) 2026-06-12

Semi-Device-Independent Certification for Nonlocality without Entanglement

arXiv:2606.13667v1 Announce Type: new Abstract: In this work, we investigate maximum-confidence discrimination, which encompasses minimum-error and unambiguous discrimination, for ensembles of separable states by considering global and separable measurements. We demonstrate that global measurements outperform separable ones, thereby establishing nonlocality without entanglement (NLWE) in terms of confidence in a detection event, a fine-grained state-identification strategy that maximizes the probability of a correct guess given a measurement outcome. Conversely, verifying achievable confidence in measurement outcomes can certify global measurements, namely, semi-device-independent certification of NLWE. Our results make it feasible to experimentally demonstrate NLWE using present-day quantum measurement devices, even with non-unit detection efficiencies, since maximum-confidence measurements rely only on detected measurement outcomes.

22.
arXiv (CS.CV) 2026-06-11

SheafStain: Sheaf-Theoretic Schrödinger Bridge for Spatially and Biologically Coherent Virtual Staining

Current virtual staining approaches offer the potential for time- and cost-efficient biomarker quantification in cancer diagnostics and prognostics. However, patch-wise inference for gigapixel whole slide images (WSIs) fails to maintain spatial continuity, yielding artifacts that cause catastrophic mismatches with ground-truth images. Although pathology Vision Foundation Models (VFMs) offer rich representations, their self-attention causes varying global contexts to produce inconsistent embeddings for the same physical region. We formalize and validate this ``context contamination'' as a sheaf-theoretic problem where these embeddings form a presheaf that violates the gluing axiom. To address this, we propose SheafStain, a new approach that reinterprets VFM features as sheaf-like sections for spatially and biologically coherent virtual staining. Specifically, SheafStain integrates class and patch tokens into a Schrödinger Bridge framework as sheaf-like sections. While the class token anchors biological consistency, patch tokens form a per-position spatial map. A backbone co-pretrained on Hematoxylin \& Eosin (H\&E) and Immunohistochemistry (IHC) yields non-degenerate cross-stain stalks, so a single VFM feature space supervises both input conditioning and output stain alignment. Departing from prior work that evaluates on isolated $256 \times 256$ patches and either random-crops or resizes the $1024 \times 1024$ ground truth, we translate at $256 \times 256$ and evaluate on the stitched $1024 \times 1024$ outputs across HER2, ER, PR, and Ki-67. SheafStain demonstrates promising results against six prior methods while mitigating patch-boundary stitching artifacts. Code will soon be released.

23.
arXiv (CS.CL) 2026-06-11

Layer-Isolated Evaluation: Gating the Deterministic Scaffold of a Production LLM Agent with a No-LLM, Regression-Locked Test Harness

End-to-end task-success is the dominant way to evaluate LLM agents, but one aggregate number tells you that an agent regressed, not where. We present layer-isolated evaluation: a deployed ordering agent is decomposed into a fixed taxonomy of layers (ontology, intent, routing, decomposition, escalation, safety, memory, and cross-cutting envelope/defense), each exercised by its own assertion slice in a deterministic, no-LLM "pure" mode. The pure suite (238 cases across 23 slices; 225 run in 2.39 s, ~10 ms/case) runs in CI on every change against a locked per-slice baseline. We validate by controlled regression injection, degrading one layer at a time across seven non-safety layers. The effect we did not design in is masking: the aggregate pass-rate barely moves (-1.7 to -5.9 pp for six local regressions), while the matching slice craters (-25 to -91 pp). A layer's slice reacting to its own fault is partly by construction; the measured results are (i) the aggregate masking and (ii) that damage stays off the other slices: the injected layer's slice is the single worst-hit in 5 of 7 cases and top-3 in 7 of 7 (mean rank 1.29 of 19). Localization replicates on a second, structurally different tenant (Starbucks SG): all seven matching slices crater, so it is not a single-catalog artifact. We position it as a concrete, deterministic instantiation of the component-level evaluation EDDOps prescribes but leaves unimplemented, with CheckList as ancestor and as the deterministic mirror image of whole-workflow stochastic mutation testing. Our contributions: (a) a fully decomposed, sub-second, no-LLM per-layer harness for a production agent, (b) a coverage-honesty test-adequacy criterion that refuses to score an unexercised layer, and (c) the regression-injection demonstration that per-slice baseline-locked gates localize regressions an aggregate metric masks.

24.
arXiv (quant-ph) 2026-06-15

Sensitivity of polaron-molecule observables to MDR/GUP-like ultraviolet deformations at low energies via quantum computing

arXiv:2606.14479v1 Announce Type: new Abstract: We show that impurity many-body observables can display enhanced sensitivity to ultraviolet deformations of generalized-uncertainty-principle and modified-dispersion-relation type at accessible energy scales. Using a deformed polaron-molecule Hamiltonian constructed to preserve the infrared sector, we quantify the impact of such deformations on spectral and Ramsey observables and implement the corresponding dynamics in a controlled quantum computing setting. We identify regimes near the polaron-molecule crossover where small ultraviolet deformations are strongly amplified, leading to experimentally resolvable changes in quasiparticle properties and spectral response. Our results establish a concrete sensitivity-based route to low-energy quantum-gravity phenomenology in a well-defined many-body platform and delimit the validity of the effective description. Furthermore, we report experimental validation on the QRed superconducting quantum processor (BSC-CNS).

25.
arXiv (quant-ph) 2026-06-11

Fisher geometry reshapes the effect of incompatibility in multiparameter quantum estimation

arXiv:2606.11343v1 Announce Type: new Abstract: Multiparameter quantum estimation faces two fundamental obstacles: sloppiness, i.e., anisotropy of the quantum Fisher information matrix (QFIM) that renders some parameter directions insensitive, and incompatibility, the non-commutativity of optimal measurements for different parameters. The trade-off bound $C_T$ captures their joint impact on precision, but it has remained unclear how the distribution of incompatibility across parameter planes affects its overall cost. Here we separate the total amount of incompatibility from its location. We introduce a dimensionless quantity $G_n^{(F)}$ that measures the alignment between the incompatibility distribution and the eigenvalues of the QFIM, and show how the Frobenius scale of the incompatibility contribution factorizes. We obtain a bound and prove the incompatibility cost lies between this bound and a rank-dependent multiple thereof. We also prove that at fixed sloppiness, or equivalently fixed Fisher volume, concentrating incompatibility into a single parameter plane reduces the optimized trade-off cost because the Fisher geometry can then be reshaped to allocate more Fisher area to that plane. A qutrit $SU(2)$ encoding numerically confirms that states with larger incompatibility strength can nevertheless incur a smaller cost if the matching factor $G$ is sufficiently small. Our results establish that the distribution of incompatibility relative to the Fisher eigenbasis is a central diagnostic for multiparameter estimation, beyond the total incompatibility strength.