Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.LG) 2026-06-24

A Private Approximation of the 2nd-Moment Matrix of Any Subsamplable Input

arXiv:2505.14251v2 Announce Type: replace Abstract: We study the problem of differentially private second moment estimation and present a new algorithm that achieve strong privacy-utility trade-offs even for worst-case inputs under subsamplability assumptions on the data. We call an input $(m,\alpha,\beta)$-subsamplable if a random subsample of size $m$ (or larger) preserves w.p $\geq 1-\beta$ the spectral structure of the original second moment matrix up to a multiplicative factor of $1\pm \alpha$. Building upon subsamplability, we give a recursive algorithmic framework similar to Kamath et al 2019, that abides zero-Concentrated Differential Privacy (zCDP) while preserving w.h.p. the accuracy of the second moment estimation upto an arbitrary factor of $(1\pm\gamma)$. We then show how to apply our algorithm to approximate the second moment matrix of a distribution $\mathcal{D}$, even when a noticeable fraction of the input are outliers.

03.
arXiv (CS.LG) 2026-06-15

Beyond the Training Distribution: Evaluating Predictions Under Distribution Shift and Selection Bias

arXiv:2606.14506v1 Announce Type: cross Abstract: Understanding how a prediction model will perform in a new environment before deployment is essential to preventing harm when algorithms inform decision-making. Two common sources of model performance degradation are (i) covariate shift, where the target covariate distribution differs from the source, and (ii) selective labels, where the observability of outcomes depends on historical decisions. We study pre-deployment model evaluation under the joint presence of covariate shift and labeling of outcomes selectively based on observed features. In particular, we present a double machine learning procedure for estimating the target risk of an arbitrary black-box prediction model under a general loss function. We show identification of this estimand under standard assumptions and derive a bias-corrected estimator based on the influence function of the target risk. Finally, we evaluate our estimator through experiments using the eICU electronic health records database, showing that it tracks the true target risk more accurately than methods that address either selective labels or covariate shift alone, as well as baselines that combine standard plug-in approaches.

04.
arXiv (CS.AI) 2026-06-12

Versioned Late Materialization for Ultra-Long Sequence Training in Recommendation Systems at Scale

arXiv:2604.24806v2 Announce Type: replace-cross Abstract: Modern Deep Learning Recommendation Models (DLRMs) follow scaling laws with sequence length, driving the frontier toward ultra-long User Interaction History (UIH). However, the industry-standard "Fat Row" paradigm, which pre-materializes these sequences into every training example, creates a storage and I/O wall where data infrastructure usage exceeds GPU training capacity due to data redundancy that is amplified in multi-tenant environments where models with vastly different sequence length requirements share a union dataset. We present a versioned late materialization paradigm that eliminates this redundancy by storing UIH once in a normalized, immutable tier and reconstructing sequences just-in-time during training via lightweight versioned pointers. The system ensures Online-to-Offline (O2O) consistency through a bifurcated protocol that prevents future leakage across both streaming and batch training, while a read-optimized immutable storage layer provides multi-dimensional projection pushdown for heterogeneous model tenants. Disaggregated data preprocessing with pipelined I/O prefetching and data-affinity optimizations masks the latency of training-time sequence reconstruction, keeping training throughput compute-bound by GPUs. Deployed on production DLRMs, the system reduces training data infrastructure resource usage while enabling aggressive sequence length scaling that delivers significant model quality gains, serving as the foundational data infrastructure for modern recommendation model architectures, including HSTU and ULTRA-HSTU.

05.
arXiv (CS.AI) 2026-06-12

Intelligence as Managed Autonomy: Failure, Escalation, and Governance for Agentic AI Systems

arXiv:2605.27628v2 Announce Type: replace Abstract: As autonomous and agentic AI systems scale in robotic and human-machine environments, managing hallucination and persistent but unjustified action remains an open challenge. Rather than attributing these failures solely to model or alignment limitations, this paper explores the architectural vulnerability of unbounded autonomy - the presumption that an agent should continue operating regardless of rising uncertainty. It introduces a theory of managed autonomy that defines intelligent behavior through the formal capacity to detect epistemic drift, suspend reasoning, attempt recovery, and ultimately surrender control when reliability diminishes. We instantiate this theory via the SMARt (Self-Managing Multi-tier Autonomous Reasoning with Regulated/Revoked transitions) model, a four-layer framework featuring Stable, Meta-cognitive, Assisted, and Regulated states. By developing a timed, guarded Petri net formulation, we establish theoretically bounded properties for the system, demonstrating how architecture can formally mandate escalation, constrain invalid outputs, and ensure governance reachability under specified conditions. We further analyze how incorporating domain-specific trigger sets across varied operational settings (e.g., healthcare, robotics, etc.) can systematically preserve safety, assuming completeness and soundness criteria are met. Because these triggers are designed to be adaptive, the SMARt model accommodates the safe, controlled expansion of an agent's operational scope over time. We conclude that formalizing failure management within the autonomy lifecycle is a crucial step toward realizing reliable and governed artificial intelligence.

06.
arXiv (CS.AI) 2026-06-15

Generative AI for Managerial Decision-Making under Ambiguity and Sycophancy

arXiv:2603.03970v2 Announce Type: replace Abstract: Generative artificial intelligence (GenAI) is increasingly being integrated into complex business workflows, fundamentally shifting the boundaries of managerial decision-making. However, the reliability of its strategic advice in ambiguous business contexts remains a critical knowledge gap. To address this gap, this study compares multiple GenAI models in their ability to detect ambiguity, examines whether a systematic ambiguity-resolution process improves response quality, and investigates their susceptibility to sycophantic behavior when confronted with flawed managerial directives. Using a novel four-dimensional business ambiguity taxonomy, we conducted a human-in-the-loop experiment across strategic, tactical, and operational scenarios. The resulting decisions were assessed through a human-validated automated evaluation framework based on agreement, actionability, justification quality, and constraint adherence. The results show that our approach not only distinguishes different types of ambiguity, but also reveals how ambiguity resolution systematically changes model behavior. In particular, resolving ambiguities improved decision quality across all managerial levels, with the strongest gains observed in constraint adherence. The analysis further showed that sycophantic behavior is not uniform across models: some models challenged flawed assumptions, whereas others tended to comply with them. This study contributes to the bounded rationality literature by positioning GenAI as a cognitive scaffold that can detect and resolve ambiguities managers might overlook, while demonstrating that its artificial limitations require human oversight to ensure its reliability as a strategic partner.

07.
arXiv (math.PR) 2026-06-17

Poisson approximation by coupling

arXiv:2605.01894v2 Announce Type: replace Abstract: It is well known that a binomial $(n,p)$ can be approximated by a Poisson distribution with parameter $np$. The typical approach in undergraduate probability texts is to show a convergence result for the distribution of the binomial as $n$ goes to infinity and $np$ converges to some $\lambda$. In this note we use instead the coupling technique to show a much more general result. Moreover, we only use elementary results from probability.

08.
arXiv (quant-ph) 2026-06-19

Quantum Entanglement Degree, Mean Positronium Lifetime, and the $3\gamma$/$2\gamma$ Annihilation-Rate Ratio as Novel PET Biomarkers for Hypoxia – Concept, Challenges, and Predictions

Authors:

arXiv:2605.00021v3 Announce Type: replace-cross Abstract: This manuscript introduces a novel method to assess tissue oxygen concentration via the quantum entanglement (QE) of photons originating from positronium which is produced within the patient's body during positron emission tomography. We also investigate the possibility of assessing hypoxia by simultaneously detecting positronium lifetime and the positronium decay rate ratio. We introduce two distinct quantum sensing approaches. Method 1 utilizes the correlation between oxygen concentration and ortho-positronium (o-Ps) decay rates, relying on the simultaneous measurement of the mean o-Ps lifetime ($\tau_{\mathrm{oPs}}$) and the $3\gamma$-to-$2\gamma$ annihilation rate ratio of o-Ps ($R_{\mathrm{oPs-3\gamma/2\gamma}}$). Method 2 introduces a novel hypothesis: that the degree of QE is sensitive to the relative contribution of annihilation mechanisms (pick-off vs. conversion), which in turn depends on oxygen concentration. We derive a formula for partial pressure of oxygen ($p\mathrm{O}_2$) as a function of $R_{\mathrm{oPs-3\gamma/2\gamma}}$ and $\tau_{\mathrm{oPs}}$ and estimate the measurement accuracy required for these parameters - and for the degree of QE - to sense in-vivo oxygen pressure in the range between hypoxic and physoxic conditions. Theoretical models and quantitative estimates for $R_{\mathrm{oPs-3\gamma/2\gamma}}$, $\tau_{\mathrm{oPs}}$ and for the degree of QE ($C_{\mathrm{QE}}$ ) as a function of $p\mathrm{O}_2$ are provided for water, isopropanol, cyclohexane, isooctane, and adipose tissue. In particular, applying the formulas derived under the working hypothesis that in pick-off process the photons are not entangled, we estimated that for $p\mathrm{O}_2 = 0$, the degree of quantum entanglement $C_{\mathrm{QE}}$ is equal to 0.890 for adipose, 0.886 for isopropanol, 0.867 for water, 0.818 for cyclohexane, and 0.784 for isooctane.

09.
arXiv (CS.AI) 2026-06-24

Accuracy and Satisfaction in Multi-Turn LLM Dialogues for NFR Assessment

arXiv:2606.24834v1 Announce Type: new Abstract: LLM-based dialogue assistants have become mainstream tools for software developers, yet current evaluation benchmarks focus exclusively on functional correctness. This leaves a critical gap in assessing the quality and accuracy of these conversations when handling Non-Functional Requirements (NFRs), which are inherently vague, context-dependent, and involve many parts of a program. Evaluating how well these systems support collaborative reasoning about NFRs requires methods that go beyond single-turn accuracy to capture both the correctness of the system's outputs and the quality of the multi-turn interaction. In this paper, we investigate the accuracy and quality of multi-turn conversations between developers and an LLM-based agent in the domain of Health Insurance Portability and Accountability Act (HIPAA) regulatory compliance. We hired 49 programmers to interact with GitHub Copilot to assess 148 HIPAA-derived NFRs against the iTrust codebase, a system designed to comply with HIPAA regulations, across three dimensions: requirement satisfaction level, reasoning, and code localization. We find that developers tend to agree with LLM assessments, but accuracy against expert ground truth is low. We model user satisfaction and find that longer system responses and more information-providing turns negatively affect user satisfaction, whereas proactive interactions positively affect it. Our findings provide insights for designing LLM-based dialogue systems that support NFR assessment.

10.
Nature (Science) 2026-06-17

These ‘master’ proteins protect us from deadly mutations — and could inspire new drugs

Authors:

Biology has clever ways to mask the effects of potentially harmful gene mutations. Scientists are investigating how this ‘buffering’ works — and how to exploit it. Biology has clever ways to mask the effects of potentially harmful gene mutations. Scientists are investigating how this ‘buffering’ works — and how to exploit it.

11.
arXiv (CS.AI) 2026-06-15

Quantized Evolution Strategies: High-precision Fine-tuning of Quantized LLMs at Low-precision Cost

arXiv:2602.03120v2 Announce Type: replace-cross Abstract: Post-Training Quantization (PTQ) is essential for deploying Large Language Models (LLMs) on memory-constrained devices, yet it renders models static and difficult to fine-tune. Standard fine-tuning paradigms, including Reinforcement Learning (RL), fundamentally rely on backpropagation and continuous weights to compute gradients. Thus they cannot be used on quantized models, where the parameter space is discrete and non-differentiable. While Evolution Strategies (ES) offer a backpropagation-free alternative, optimization of the quantized parameters can still fail due to vanishing or inaccurate gradient estimation. This paper introduces Quantized Evolution Strategies (QES), an optimization paradigm that performs full-parameter fine-tuning directly in the quantized space. QES is based on two innovations: (1) it integrates accumulated error feedback to preserve high-precision weight updating signals, and (2) it utilizes a stateless seed replay to reduce memory usage to low-precision inference levels. QES significantly outperforms the state-of-the-art zeroth-order fine-tuning methods on a variety of tasks, making direct fine-tuning for quantized models possible. It therefore opens up the possibility for scaling up LLMs entirely in the quantized space. The source code is available at https://github.com/dibbla/Quantized-Evolution-Strategies .

12.
arXiv (quant-ph) 2026-06-16

The Quantum Transition State

Authors:

arXiv:2606.10266v2 Announce Type: replace Abstract: The transition state – the critical configuration separating reactants from products – is the central organizing concept of chemical reaction rate theory, yet for nearly a century it has been thought to have no exact quantum counterpart: the recrossing-free, one-way flux through a transition state appears to demand simultaneous knowledge of position and momentum, in conflict with the uncertainty principle. We show this obstruction is illusory and construct the quantum transition state directly from the exact quantum flow. Its stable and unstable invariant manifolds intersect in a unique bounded trajectory – the quantum transition-state trajectory – anchoring a moving dividing surface that each reactive characteristic crosses exactly once, yielding a one-way flux of the standard quantum probability current. The geometric framework underlying classical transition-state theory thus survives intact in exact quantum mechanics, in a fundamentally quantum form.

13.
arXiv (CS.LG) 2026-06-11

Learning Patterns and Abstractions from Perceptual Sequences

Authors:

arXiv:2503.10973v2 Announce Type: replace Abstract: Cognition swiftly breaks high-dimensional sensory streams into familiar parts and uncovers their relations. Why do structures emerge, and how do they enable learning, generalization, and prediction? What computational principles underlie this core aspect of perception and intelligence? A sensory stream, simplified, is a one-dimensional sequence. In learning such sequences, we naturally segment them into parts – a process known as chunking. In the first project, I investigated factors influencing chunking in a serial reaction time task and showed that humans adapt to underlying chunks while balancing speed and accuracy. Building on this, I developed models that learn chunks and parse sequences chunk by chunk. Normatively, I proposed chunking as a rational strategy for discovering recurring patterns and nested hierarchies, enabling efficient sequence factorization. Learned chunks serve as reusable primitives for transfer, composition, and mental simulation – letting the model compose the new from the known. I demonstrated this model's ability to learn hierarchies in single and multi-dimensional sequences and highlighted its utility for unsupervised pattern discovery. The second part moves from concrete to abstract sequences. I taxonomized abstract motifs and examined their role in sequence memory. Behavioral evidence suggests that humans exploit pattern redundancies for compression and transfer. I proposed a non-parametric hierarchical variable model that learns both chunks and abstract variables, uncovering invariant symbolic patterns. I showed its similarity to human learning and compared it to large language models. Taken together, this thesis suggests that chunking and abstraction as simple computational principles enable structured knowledge acquisition in hierarchically organized sequences, from simple to complex, concrete to abstract.

14.
arXiv (CS.LG) 2026-06-16

Taming Curvature: Architecture Warm-Up for Stable Transformer Training

arXiv:2606.16768v1 Announce Type: new Abstract: Training billion-parameter Transformers is often brittle, with transient loss spikes and divergence that waste compute. Even though the recently developed Edge of Stability (EoS) theory provides a powerful tool to understand and control the stability of optimization methods via the (preconditioned) curvature, these curvature-controlling methods are not popular in large-scale Transformer training due to the complexity of curvature estimation. To this end, we first introduce a fast online estimator of the largest (preconditioned) Hessian eigenvalue (i.e., curvature) based on a warm-started variant for power iteration with Hessian-vector products. We show theoretically, and verify empirically, that the proposed method makes per-iteration curvature tracking feasible at billion parameter scale while being more accurate. Using this tool, we find that training instabilities coincide with surges in preconditioned curvature and that curvature grows with depth. Motivated by these observations, we propose architecture warm-up: progressively growing network depth to carefully control the preconditioned Hessian and stabilize training. Experiments on large Transformers validate that our approach enables efficient curvature tracking and reduces instabilities compared to existing state-of-the-art stabilization techniques without slowing down convergence.

15.
arXiv (CS.CV) 2026-06-24

Heterogeneous Knowledge Distillation via Geometry Decoupling and Momentum-Aware Gradient Regulation

Heterogeneous Knowledge Distillation (HKD) aims to transfer knowledge across varying architectures (e.g., from Transformer to CNN) but inherently suffers from severe training instability. We reveal that this instability stems from two highly coupled challenges: massive feature norm discrepancies that cause optimization drag, and severe gradient conflicts between the primary and distillation objectives arising from distinct inductive biases. To achieve stable distillation, we propose SPOFA, a framework built upon a novel Feature and Gradient Dual Stabilization mechanism. Specifically, at the feature level, we introduce a LayerNorm-based decoupling projector that explicitly decouples feature magnitude from direction, creating a bounded and stable space for semantic alignment. At the gradient level, we propose a momentum-driven Exponential Moving Average (MEMA) dynamic scaler. By establishing a robust historical baseline of the optimization trajectory, MEMA actively evaluates instantaneous gradient conflicts and adaptively penalizes harmful distillation signals, guaranteeing stable convergence. Importantly, SPOFA achieves this dual stabilization with an extremely lightweight parameter footprint. Extensive experiments on two mainstream benchmarks demonstrate that SPOFA achieves state-of-the-art accuracy, significantly outperforming computationally expensive methods while introducing only minimal computational overhead compared to standard baselines.

16.
arXiv (CS.CV) 2026-06-12

Perceive, Interact, Reason: Building Tool-Augmented Visual Agents for Spatial Reasoning

While recent vision-language models (VLMs) demonstrate strong multimodal understanding, they remain limited in spatial reasoning tasks that require active evidence acquisition and multi-step visual interaction. This limitation suggests that relying solely on implicit visual representations from vision encoders is insufficient for recovering fine-grained spatial evidence. We introduce PERception-Interaction-reason Agent (PERIA), a tool-augmented visual agent for spatial reasoning tasks across map reasoning, visual probing, and vision reconstruction. PERIA uses two lightweight tool families: vision perception tools for exposing textual, symbolic, and spatial evidence, and vision interaction tools for manipulating visual context, tracing paths, and verifying spatial relations. To train PERIA, we develop a unified recipe that combines supervised tool-use trajectory synthesis, composite rewards, and Observation-Relaxed Group-in-Group Policy Optimization (OR-GIGPO) for effective multi-tool behavior. Experiments on 13 benchmarks from 8 datasets show that PERIA-8B improves over the Qwen3-8B backbone by 10.0% on in-distribution benchmarks and 4.4% on out-of-distribution benchmarks, while outperforming previous state-of-the-art baselines of similar size by 7.0%-14.8%. It also achieves performance comparable to much larger models such as Qwen3-VL-235B-A22B-Thinking and GPT-5, demonstrating the effectiveness of PERIA in enhancing spatial reasoning capabilities.

17.
arXiv (CS.AI) 2026-06-17

First, do NOHARM: towards clinically safe large language models

arXiv:2512.01241v3 Announce Type: replace-cross Abstract: Large language models (LLMs) are routinely used by physicians and patients for medical advice, yet their clinical safety profiles remain poorly characterized. We present NOHARM (Numerous Options Harm Assessment for Risk in Medicine), a 1,100-task benchmark of primary care-to-specialist consultation cases to measure the frequency and severity of harm from LLM-generated medical recommendations. NOHARM covers 10 specialties, with 12,747 expert annotations for 4,249 clinical management options. Across 28 LLMs, recommendations carried the potential for severe harm in up to 22.6% of cases, with errors of omission accounting for more than 80% of severe errors. In a randomized trial of 101 generalist physicians, human benchmark performance significantly improved with AI assistance, yet physicians remained far from realizing the potential of AI tools, frequently ignoring essential advice surfaced by AI. Safety performance tracked general-intelligence and medical-knowledge benchmarks across the full range of models but decoupled at the frontier. Despite strong performance on existing evaluations, widely used AI models can produce medical advice with the potential for severe harm at non-trivial rates, highlighting the importance of explicit measurement of clinical safety.

18.
arXiv (CS.CL) 2026-06-24

Transformer-Based Language Models Across Domain Verticals: Architectures, Applications and Critical Assessment

Transformer-based language models have become the default substrate for natural language processing and the pace of new releases has made it hard for practitioners to separate durable ideas from the noise of incremental announcements. This review works at two levels. At the level of mechanism, we organise the main transformer families into a working taxonomy, covering encoder-only, decoder-only, encoder-decoder, long-context, permutation-based, and generator-discriminator variants. We then extend the discussion to post-2023 developments that changed the picture in practice: instruction tuning, reinforcement learning from human feedback, direct preference optimisation, mixture-of-experts scaling, retrieval augmentation and the current flagship model families from OpenAI, Anthropic, Google, Meta, Mistral and DeepSeek. At the level of use, we survey deployments across healthcare, finance, legal, education, customer service, creative writing and scientific work. Based on this we link each to the specific capabilities that make a transformer the appropriate tool. The contribution of this paper is a critical assessment that is based on the survey. We compare architectures on four axes that matter to deployment decisions, we quantify the trade-off between parameter count and energy cost. We also discuss how alignment methods, data provenance and benchmark saturation change what it means to call a model "state of the art". The final section lists the research questions that we think deserve more attention.

19.
arXiv (CS.LG) 2026-06-16

HAPI-EP: Towards Hybrid, Adaptive, and Predictive Digital Twins of Cardiac Electrophysiology

arXiv:2606.15637v1 Announce Type: new Abstract: A digital twin (DT) of a patient-specific heart offers significant potential in personalized medicine. However, its rapid and dynamic adaptation to an individual's live data and its predictive capability after adaptation remains central challenges. We examine this challenge from its two building blocks: DT formulation where mechanistic and data-driven models show competing merits and limitations, and DT optimization strategies that are largely driven by a reconstruction objective leading to un-identifiable models. We address both bottlenecks via HAPI – an AI framework for building hybrid, adaptive, and predictive DTs with three key enablers. First, HAPI constructs a physics-integrated gray-box model in which an interpretable mechanistic backbone is augmented by a neural component that models its residual to the observed data. Second, rather than attempting to pre-encode all possible variations in a static hybrid model, HAPI enables rapid on-the-fly adaptation of the hybrid model to few-shot live data, achieved by feedforward meta-learners realizing amortized inference of both mechanistic and neural parameters of the hybrid model trained with predictive objectives. Finally, we show that this adaptivity corresponds to the construction of a conditional generative model (i.e., the hybrid DT) that endows it with theoretical identifiability and thus strong performance in predictive scenarios. We demonstrate the proof-of-concept of HAPI in cardiac electrophysiology using a hybrid monodomain model with mechanistic reaction kinetics and neural graph diffusion. Across synthetic and real-data studies, we show that HAPI's mechanistic-neural hybridization and predictive adaptation are critical for obtaining identifiable DTs with strong predictive and out-of-distribution capabilities.

20.
arXiv (CS.LG) 2026-06-12

The Geometry of Phase Transitions in Generative Dynamics via Projection Caustics

arXiv:2606.13191v1 Announce Type: new Abstract: Continuous-state generative samplers, including diffusion and flow-matching models, evolve through continuous reverse-time dynamics, yet their samples often undergo abrupt qualitative changes: trajectories commit to modes, semantic alternatives collapse, and small perturbations in narrow time windows can produce large downstream effects. This paper develops a geometric account of such phase-transition-like behaviour. We view denoising as gradient descent on a free energy landscape and show that sharp transitions arise near projection caustics, where the nearest-point projection onto the data support ceases to be unique. Motivated by this perspective, we introduce the Critical Boundary Detector (CBD), as practical diagnostics for score-direction instability. Across toy models, standard diffusion models, and latent text-to-image diffusion models, CBD localises mode commitment, predicts intervention-sensitive windows, and supports targeted control in geometrically sensitive regions. Our results connect geometry of data and dynamics of diffusion generation.

21.
arXiv (CS.CV) 2026-06-24

Automated Residual Plot Assessment With the R Package autovi and the Shiny Application autovi.web

Visual assessment of residual plots is a common approach for diagnosing linear models, but it relies on manual evaluation, which does not scale well and can lead to inconsistent decisions across analysts. The lineup protocol, which embeds the observed plot among null plots, can reduce subjectivity but requires even more human effort. In today's data-driven world, such tasks are well suited for automation. We present a new R package that uses a computer vision model to automate the evaluation of residual plots. An accompanying Shiny application is provided for ease of use. Given a sample of residuals, the model predicts a visual signal strength (VSS) and offers supporting information to help analysts assess model fit.

23.
arXiv (CS.LG) 2026-06-19

Semantic-Anchored Evidential Fusion for Domain-Robust Whole-Slide Survival Analysis

arXiv:2606.19966v1 Announce Type: cross Abstract: Whole-slide images (WSIs) are widely used for computational cancer prognosis. However, most existing methods primarily focus on in-domain performance and fail to generalize across clinical centers. This limitation stems from their reliance on pixel-derived representations that are highly susceptible to domain-specific artifacts caused by staining protocols and scanner hardware. We hypothesize that high-level pathology semantics, such as tumor grade and micro-environmental architecture, provide a domain-invariant semantic representation that mirrors the robust diagnostic logic of human pathologists. Therefore, we propose a Semantic-Anchored Evidential Fusion Survival (SAEFS) framework, where SAEFS derives semantic anchors from WSIs via Visual Question Answering (VQA), employs a dual-stream WSI evidence extraction architecture, uses Dirichlet-based Subjective Logic to model uncertainty, and fuses semantic and visual evidence through a cautious conjunction rule to avoid overconfident fusion from correlated sources. Trained exclusively on one source domain and evaluated zero-shot across four unseen domains, SAEFS consistently outperforms state-of-the-art models both in prediction accuracy and reliability, improving the average C-index by 10.2%. Quantitative analyses further show that VQA-derived semantic features exhibit significantly lower cross-center divergence than pixel-derived features, highlighting their robustness for cross-center clinical applications.

24.
arXiv (CS.CL) 2026-06-12

X-MADAM-RAG: Diagnosing and Handling Chinese-English Evidence Conflict in Retrieval-Augmented Generation

Retrieval-augmented generation (RAG) systems may receive evidence that is not merely noisy but mutually contradictory. This issue becomes particularly salient in multilingual settings, where retrieved Chinese and English evidence may support incompatible answer candidates. We study this problem through X-RAMDocs-ZHEN, a controlled Chinese-English benchmark derived from RAMDocs for diagnosing evidence conflict in RAG. The benchmark contains 300 examples across six balanced conditions, including monolingual support, bilingual agreement, reversed conflict directions, and conflict with optional noise. We further examine X-MADAM-RAG, an interpretable pipeline that decomposes evidence handling into per-document candidate extraction, visible-evidence repair, deterministic candidate grouping, and conflict-aware aggregation. On the original controlled benchmark with Qwen2.5-7B-Instruct, X-MADAM-RAG achieves 0.9667 strict accuracy and 0.9767 conflict-aware success, outperforming an evidence-normalized single-call baseline. However, a zero-call rule-only extractor reaches 1.0000 on the same benchmark, revealing strong template regularity. To probe this limitation, we construct a deterministic naturalized stress test that removes explicit answer templates while preserving candidate strings. On its 100-sample subset, rule-only extraction falls to 0.0000, but X-MADAM-RAG also drops to 0.3000 strict accuracy, below both naive and evidence-normalized baselines. A privileged oracle remains perfect, indicating that document-level extraction is the main bottleneck. These findings position X-RAMDocs-ZHEN and X-MADAM-RAG as diagnostic tools for controlled evidence conflict rather than as evidence of general hallucination detection or robustness to natural retrieval.

25.
arXiv (quant-ph) 2026-06-24

Preparing multi-qudit states in a definite-weight subspace

arXiv:2606.24659v1 Announce Type: new Abstract: We formulate a deterministic algorithm for preparing arbitrary multi-qudit states in a definite-weight subspace. By ordering the corresponding computational basis states according to a Gray code for multiset permutations, the state-preparation task is reduced to performing a sequence of controlled 2-qudit Gray rotations. We use this algorithm to prepare exact eigenstates of the SU(3)-invariant Heisenberg Hamiltonian, whose Bethe ansatz is nested. In particular, we describe the preparation of the Bethe states, which are SU(3) highest-weight states, as well as their lower-weight descendants. We also consider the preparation of $SU(d)$ Dicke states and their q-deformations.