Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.AI) 2026-06-24

Evaluating the Interpretability of Sparse Autoencoders with Concept Annotations

arXiv:2606.24716v1 Announce Type: cross Abstract: Sparse autoencoders (SAEs) are increasingly used to extract interpretable concepts from vision and vision language models, yet existing evaluation methods largely rely on proxy metrics or qualitative inspection rather than measuring semantic correspondence. We present a human-grounded evaluation framework that quantifies alignment between SAE latents and human-annotated concepts, without requiring user studies, and validate this matching through targeted attribute perturbations. To enable this intervention-style evaluation in vision, we construct synCUB and synCOCO, synthetic benchmarks of paired images that differ in exactly one attribute. We introduce Fully-Binary Matching Pursuit (FBMP), a coalition-based matching procedure that supports many-to-one mappings between SAE latents and annotated concepts, and consistently outperforms one-to-one baselines. For functional validation, we propose a Targeted Attribute Perturbation Alignment Score (TAPAScore), which tests whether matched concepts respond selectively and in the expected direction under targeted image-level attribute perturbations. Under sanity checks, our matching and TAPAScore are the only evaluated metrics that reliably distinguish trained SAEs from untrained ones. Across SAEs trained on CLIP and DINOv2 embeddings, we find that increased overcompleteness can reduce perturbation alignment, indicating a reduction in interpretability. Our evaluation framework suggests that moderate dictionary sizes provide the best trade-off, yielding the most interpretable SAEs. Code and datasets are available at https://github.com/JonasKlotz/sae-concept-eval.

02.
arXiv (quant-ph) 2026-06-15

Tensor network manifolds and Riemannian fundamental theorem for tensor networks

arXiv:2606.14613v1 Announce Type: cross Abstract: Tensor networks provide a powerful framework for efficiently representing high-dimensional data and many-body quantum states. Endowing tensor networks with a Riemannian manifold structure provides a natural setting for numerical optimization and analysis. A central feature of tensor networks is their gauge freedom, whose characterisation (captured by so-called fundamental theorems) underlies both their intrinsic structure and the design of numerical algorithms. In this work, we study the interaction between the Riemannian manifold structure and the gauge freedom for several families of tensor networks. Using group actions and Riemannian submersions, we establish a Riemannian fundamental theorem for the tensor network families studied.

03.
arXiv (quant-ph) 2026-06-16

Optimising Entanglement Distillation Policies

arXiv:2606.14908v1 Announce Type: new Abstract: Entanglement distillation is a fundamental operation in quantum information processing used to obtain higher-fidelity entangled pairs from a supply of less entangled quantum states using local operations aided by classical communication (LOCC). In a physically relevant setting, where states with an initial fidelity of $f_0$, probabilistically generated over multiple, $m$, memory pairs distributed between two parties, Alice and Bob, are pairwise distilled, the optimal policy identifies the system-configuration dependent sequence of entanglement generation and distillation operations that need to be performed in order to minimize the expected time to reach some target fidelity $f_T>f_0$. Here, we formulate and systematically analyze this task as a Markov decision problem and using a value iteration algorithm, obtain optimal deterministic policies that minimize the expected waiting time required to reach a target fidelity. Our results show that the expected waiting time under the optimal policy decreases with increasing generation probability $p$ and number of quantum memories $m$ - as expected. In contrast, it exhibits non-monotonic behavior with respect to $f_0$ for a fixed fidelity gap, $(\Delta f = f_T-f_0)$. While the optimal policy consistently outperforms baseline policies such as the greedy, nested and entanglement pumping policies, its relative advantage is regime-dependent, being determined by the system parameters ($p,f_0,f_T,m$), and exhibits a nontrivial dependence on the fidelity gap $\Delta f$. Our results highlight the value of formulating entanglement distillation as a Markov decision problem, enabling the systematic design of policies that achieve target fidelity thresholds for quantum information tasks in realistic resource-constrained settings.

04.
arXiv (CS.AI) 2026-06-16

Direction-Conditioned Policies via Compositional Subgoal Scoring for Online Goal-Conditioned Reinforcement Learning

arXiv:2606.16515v1 Announce Type: cross Abstract: Hamilton-Jacobi-Bellman theory implies that the optimal goal-conditioned action depends on the goal only through the gradient of the goal-reaching distance at the current state, yet standard online GCRL still conditions the actor on the raw goal – a signal that is geometrically uninformative when the goal is far from the data distribution. We propose Direction-Conditioned Policies (DCP), a fully online method that decomposes goal-reaching into two components sharing one InfoNCE representation $\psi$: a subgoal-scoring step that selects a visited state $z_t$ aligned with the final goal $g$ in $\psi_g$, and a direction-conditioned actor that consumes the unit direction $d_t$ and magnitude $r_t$ from $\psi(s_t)$ to $\psi(z_t)$. The two components train jointly, factor cleanly at deployment (subgoal scoring is removed, while direction conditioning remains with $g$ in place of $z_t$), and admit independent modification at the same $(d_t,r_t)$ interface. We prove three results. First, direction sufficiency under HJB: the optimal action under control-affine dynamics depends on the goal only through the value gradient. Second, a quantitative bound showing that, under mild conditions on the learned representation and assuming the scoring rule returns an on-path $z_t$, the actor's conditioning input at training and at deployment coincide up to representation error and geodesic slack. Third, a controllable-subspace characterization of when directional conditioning fails. Across nine environments, DCP improves over Contrastive RL on most final metrics, with the largest gains on manipulation and obstacle-interaction tasks; a qualitative analysis of the learned $\psi$-distance landscape shows the contrastive representation behaves as an online quasimetric encoding environment topology, and the single failure case (AntSoccer) localizes to a learned-gradient pathology that the theory anticipates.

05.
arXiv (CS.LG) 2026-06-24

Relatively Smart: A New Approach for Instance-Optimal Learning

arXiv:2603.01346v2 Announce Type: replace Abstract: We revisit the framework of Smart PAC learning, which seeks supervised learners which compete with semi-supervised learners that are provided full knowledge of the marginal distribution on unlabeled data. Prior work has shown that such marginal-by-marginal guarantees are possible for "most" marginals, with respect to an arbitrary fixed and known measure, but not more generally. We discover that this failure can be attributed to an "indistinguishability" phenomenon: There are marginals which cannot be statistically distinguished from other marginals that require different learning approaches. In such settings, semi-supervised learning cannot certify its guarantees from unlabeled data, rendering them arguably non-actionable. We propose relatively smart learning, a new framework which demands that a supervised learner compete only with the best "certifiable" semi-supervised guarantee. We show that such modest relaxation suffices to bypass the impossibility results from prior work. In the distribution-free setting, we show that the One-Inclusion Graph learner is relatively smart up to squaring the sample complexity, and show that no supervised learning algorithm can do better. For distribution-family settings, we show that relatively smart learning can be impossible or can require idiosyncratic learning approaches, and its difficulty can be non-monotone in the inclusion order on distribution families.

06.
arXiv (CS.CV) 2026-06-11

From Prompts to Tokens: Internalizing Causal Supervision in Vision-Language Model for Multi-Image Causal Reasoning

Visual causal reasoning is essential for understanding and intervening in the physical world, requiring identification of causal variables from visual inputs and reasoning over intervention effects. Despite recent progress, large vision–language models (VLMs) remain brittle at such tasks, especially for interventional and counterfactual queries over multi-image inputs. Most existing explorations inject causal knowledge via textual prompts, leaving causal mechanisms external to model execution and limiting reliable control during inference. To address this problem, we propose BridgeVLM, which internalizes visual causal reasoning by inducing a causal graph from multi-image inputs and converting it into structured Causal Tokens executed by RAMP layers injected into the LLM decoder for causal message passing. We further introduce a unified training interface M3S for fine-grained causal supervision from different granularities (local/global level). BridgeVLM achieves 54.4% accuracy on intervention tasks on CausalVLBench (vs. 33.2% with prompt-level supervision), improves results on Causal3D from 43.6% to 49.0%, and substantially improves causal structure learning on CausalVLBench ($F_1$: 33.4% $\rightarrow$ 75.1%).

07.
arXiv (CS.LG) 2026-06-24

WiFi-Based People Counting Using Beam-Steerable Antennas: A Test-bed Study

arXiv:2606.23710v1 Announce Type: cross Abstract: Ubiquitous perception through RF signals is a pivotal opportunity for future technology: it enables personalized services such as smart living, remote healthcare, automated logistics or interaction through free-space gestures. The ubiquity of Wi-Fi and cellular networks presents a promising platform for the development of innovative sensing tools. Future standards will also introduce dedicated sensing features which, for example, will allow routers to work as frequency modulated continuous wave radios targeting radar applications. Most of the current chip designs support ad-hoc firmware for CSI extraction with MIMO arrangements of the transmitter (TX) and receiver (RX) antennas and OFDM subcarriers. The CSI describes the phase shift and amplitude attenuation of multiple propagation paths on each subcarrier. The latest IEEE 802.11be standard (Wi-Fi 7) offers a wider subcarrier bandwidth of 160MHz (up to 320MHz), providing at least 120 usable pilot subcarriers for CSI or CIR estimation. Additionally, Wi-Fi signals have been recently exploited to track daily human movements and behaviors, while Wi-Fi signal variations have been shown to differ between different people and can consequently be used for their re-identification.

08.
arXiv (CS.AI) 2026-06-16

AI Contagion in Social Networks

arXiv:2606.15206v1 Announce Type: cross Abstract: We study how artificial intelligence (AI) interacts with social communication networks to shape the stability of collective knowledge. Agents exchange information through a network while receiving AI-generated content, and AI systems retrain on the aggregate social information they influence. This interaction generates two feedback forces: an AI contagion channel, through which distortions diffuse across the network, and an AI social distortion multiplier, through which retraining amplifies past errors. Despite the high dimensionality of the environment, we show that the long-run behavior of the system admits a two-dimensional representation whose spectral radius determines whether AI-mediated information systems are dynamically stable or unstable. We characterize a sharp regulatory frontier identifying the minimum filtering required for stability and show how network topology shapes systemic informational risk.
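
The stability criterion in this abstract can be illustrated with a small sketch. It assumes the two-dimensional representation is a real 2x2 matrix M driving a linear recursion x_{t+1} = M x_t, which is stable exactly when the spectral radius of M is below 1. The matrix entries and the function names `spectral_radius_2x2` and `is_stable` are hypothetical and are not taken from the paper.

```python
import cmath


def spectral_radius_2x2(a, b, c, d):
    """Largest eigenvalue modulus of the matrix [[a, b], [c, d]]."""
    trace = a + d
    det = a * d - b * c
    # Eigenvalues solve lambda^2 - trace * lambda + det = 0.
    root = cmath.sqrt(trace * trace - 4 * det)
    lam1 = (trace + root) / 2
    lam2 = (trace - root) / 2
    return max(abs(lam1), abs(lam2))


def is_stable(a, b, c, d):
    """x_{t+1} = M x_t decays to zero iff the spectral radius of M is < 1."""
    return spectral_radius_2x2(a, b, c, d) < 1.0


# Hypothetical entries: weak contagion and weak retraining feedback.
print(is_stable(0.5, 0.2, 0.1, 0.6))
# Hypothetical entries: strong retraining feedback.
print(is_stable(0.9, 0.4, 0.5, 0.8))
```

In this toy setting, shrinking the off-diagonal entries (a stand-in for stronger filtering) lowers the spectral radius, which is one way to picture a minimum-filtering frontier.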

09.
arXiv (CS.CV) 2026-06-16

Variational Test-time Optimization for Diffusion Synchronization

Collaborative generation, which coordinates multiple diffusion trajectories to extend the capabilities of pretrained priors, has emerged as a powerful paradigm for extending the applicability of diffusion models. Among existing approaches, diffusion synchronization provides a scenario-agnostic solution by introducing general guidance mechanisms. However, current synchronization approaches rely heavily on heuristics and still require task-specific tailoring, which limits their generalizability and performance. In this work, we mathematically derive a synchronization framework based on optimal control, providing a principled explanation of diffusion synchronization. During sampling, we optimize control variables to guide multiple trajectories toward coherent solutions while remaining close to the underlying diffusion prior. Our method operates entirely at test-time without additional training, thereby enabling broad applicability across diverse generation scenarios when combined with strong pretrained priors. We demonstrate consistent improvements over baselines on three representative collaborative generation tasks, covering a wide range of modalities and applications. Beyond performance gains, our work establishes a novel foundation for collaborative generation, opening a principled path toward extending pretrained generative models to new collaborative generation settings.

10.
arXiv (CS.AI) 2026-06-16

Exploring Starts Are Not Enough: Counterexamples and a Fix for Monte Carlo Exploring Starts

arXiv:2606.15247v1 Announce Type: cross Abstract: The asymptotic behaviour of Monte Carlo Exploring Starts (MCES) is a long-standing open question in reinforcement learning, even in the tabular setting. We investigated the convergence properties of tabular MCES by constructing examples in which the algorithm converges to suboptimal solutions. This paper presents new counterexamples for both initial-visit and first-visit MCES and gives a convergence-restoring modification for the initial-visit case. We show that stable suboptimal solutions may exist for initial-visit MCES with sample-average updates even when greedy actions are updated more often than non-greedy actions on average. However, by scaling learning rates inversely to update frequencies on a state-by-state basis, convergence to optimality is guaranteed. Unlike previous uniformisation methods, this modification is applicable to large-scale problems that require approximating the estimated value function. We then extend the example to show that sample-average first-visit MCES may also converge to suboptimal solutions. This largely settles a fundamental open problem and shows that exploring starts alone do not guarantee convergence to optimality. More broadly, these results highlight that convergence depends critically on the relative size and frequency of updates applied to different actions, making the choice of learning rates and the balance between exploration and exploitation central to the analysis of MCES and the implementation of scalable Monte Carlo control methods.

11.
bioRxiv (Bioinfo) 2026-06-17

DNA-binding specificity recognition from predicted homologous protein-DNA structures

Predicting protein DNA-binding specificity is essential for understanding gene regulation and disease mechanisms. Existing deep learning methods typically infer specificity from a single protein-DNA complex structure, which limits their ability to capture the diverse geometric patterns underlying protein-DNA recognition. Homologous protein-DNA interfaces provide complementary structural evidence and richer geometric features related to interatomic interactions. To address the limited diversity and coverage of experimentally determined complexes, we constructed a large-scale library of predicted homologous protein-DNA complex structures. Building on this resource, we propose HomoDSP, a template-retrieval-based framework for accurate DNA-binding specificity prediction. Benchmark evaluations and validation on newly released JASPAR 2026 samples indicate that HomoDSP outperforms existing methods in both accuracy and generalization, with particularly substantial gains on high-error samples. Moreover, this performance is largely retained when AlphaFold3-predicted complex structures are used as input. Template- and residue-level interpretability analyses suggest that HomoDSP improves prediction by focusing on DNA-affinity residues across multiple homologous templates. Finally, universal Protein Binding Microarrays evaluations on AI-designed DNA-binding proteins show that HomoDSP rescues a baseline failure mode in which the baseline method produces incorrect predictions because of training-set bias. Together, these results support the use of homologous template interfaces as informative structural priors for decoding protein DNA-binding specificity.

12.
arXiv (CS.AI) 2026-06-15

When Should Agent Trust Be Conditional? Characterizing and Attacking Skill-Conditional Reputation in Agent Swarms

arXiv:2606.14200v1 Announce Type: new Abstract: Open platforms increasingly route tasks among heterogeneous LLM agents–differing in base model, scaffold, and tool stack–whose competence varies sharply by skill: an agent excellent at one skill may be useless at another. The standard reputation approach summarizes each agent by a single global trust score, but that scalar is the wrong object here, because routing every task to the globally most-trusted agent leaves the value of specialization unclaimed. We study skill-conditional trust R(i | k)–the trust to place in agent i for a task requiring skill k, rather than one score per agent–and pose three falsifiable questions: when is conditioning worth it, how much cross-skill evidence should be borrowed, and whether that borrowing is safe. A controlled phase-diagram analysis answers the first two: conditional trust wins only in a specific regime–high agent heterogeneity, sparse per-skill evidence, and correlated skills–and the coupling strength beta that buys this data efficiency is dual-use, because the same cross-skill borrowing is also a laundering channel. On a public benchmark of 14 genuinely heterogeneous AppWorld agents, real pools land inside the beneficial regime–a small but genuine gain, with the per-skill best agent genuinely changing across skills. We then show that an attacker with cheap evidence in one skill and none in a target skill hijacks the conditional router, driving routing regret from 0 to 0.94 on a pool our zero-cost Conditional Information Value Test (CIVT) rates GREEN–while the ungated trust verdict it contaminates reads -0.06 instead of the honest +0.19. A zero-evidence gate bounds the attack but does not eliminate it; we characterize the residual cost under an explicit budget. We do not claim Sybil-resistance–we quantify the trade-off.
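
The borrowing and laundering trade-off described above can be sketched with a toy estimator. This is an assumption-laden illustration, not the paper's method: trust R(i | k) is modelled as a Beta-Bernoulli posterior mean in which evidence from other skills is added with weight beta, and the zero-evidence gate simply falls back to the prior when an agent has no direct evidence in the target skill. The function name, the evidence layout, and all numbers are hypothetical.

```python
def conditional_trust(evidence, agent, skill, beta, gate=False, prior=(1.0, 1.0)):
    """Posterior-mean trust in `agent` for `skill` with cross-skill borrowing.

    evidence[agent][skill] = (successes, failures). Assumed toy model.
    """
    own_s, own_f = evidence[agent].get(skill, (0, 0))
    if gate and own_s + own_f == 0:
        # Zero-evidence gate: no borrowing without direct evidence in this skill.
        return prior[0] / (prior[0] + prior[1])
    others = [(s, f) for k, (s, f) in evidence[agent].items() if k != skill]
    borrowed_s = sum(s for s, _ in others)
    borrowed_f = sum(f for _, f in others)
    s = prior[0] + own_s + beta * borrowed_s
    f = prior[1] + own_f + beta * borrowed_f
    return s / (s + f)


# Hypothetical pool: an honest specialist and an attacker with cheap evidence
# in an unrelated skill and none in the target skill "sql".
evidence = {
    "honest": {"sql": (8, 2)},
    "attacker": {"echo": (50, 0)},
}

for gate in (False, True):
    scores = {a: conditional_trust(evidence, a, "sql", beta=0.5, gate=gate)
              for a in evidence}
    winner = max(scores, key=scores.get)
    print(f"gate={gate}: routed to {winner}, scores={scores}")
```

Without the gate the attacker's borrowed evidence outranks the honest agent's direct record; with the gate the attacker is held at the prior. The gate limits this particular attack in the toy model but says nothing about agents that buy a small amount of direct evidence.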

13.
medRxiv (Medicine) 2026-06-11

Ferritin across long-term conditions in England: cross-sectional primary care study

Background Iron deficiency (ID) is a readily treatable condition once identified. Ferritin is the primary diagnostic marker, but cut-offs vary and inflammation complicates interpretation in patients with long-term conditions (LTCs). Aim To describe ferritin distribution and the prevalence of threshold-defined low ferritin in adults with and without LTCs in primary care. Design and setting Cross-sectional observational study using routinely collected electronic health records from a national primary care database in England (1st January 2015 to 31st December 2021). Method Adults with >1 ferritin test in Clinical Practice Research Datalink (CPRD) Aurum were included. LTCs were identified using validated primary-care code lists. Outcomes included ferritin distribution and threshold-defined ID prevalence using World Health Organization (WHO) (

14.
arXiv (CS.AI) 2026-06-15

CARE: Controlling LLM-Generated Policies through Auditable Review of Evidence in Scientific Experimentation

arXiv:2606.14581v1 Announce Type: cross Abstract: Granting LLMs direct control over costly, irreversible scientific experiments leads to unsafe exploration and unstable performance, but discarding LLM creativity entirely sacrifices significant optimization potential. We introduce CARE (Controlling LLM-Generated Policies through Auditable Review of Evidence in Scientific Experimentation), an auditable controller for high-throughput experimentation (HTE) optimization that keeps a non-LLM incumbent optimizer as the default action path while using LLMs to revise challenger ranking policies. Before each outcome is revealed, a public-evidence intervention gate compares the challenger with the incumbent. It authorizes the challenger's selection only when the evidence available before selection supports the change, with the decision recorded in the audit log. CARE outperforms all other evaluated methods on Minerva/Olympus and ChemLex benchmarks, with final-best improving from 80.0 to 88.5 on Minerva/Olympus and from 83.9 to 92.1 on ChemLex, relative to the public incumbent. Our experiments indicate that LLM self-evolution is more reliable when it expands the proposal space under an auditable controller, rather than directly choosing experiments.

15.
arXiv (CS.CL) 2026-06-16

Towards Pareto-Optimal Tool-Integrated Agents with Pareto Ranking Policy Optimization

Recent advances in tool-integrated language agents have significantly improved their ability to solve complex reasoning tasks. However, existing alignment methods predominantly focus on maximizing task accuracy, while overlooking auxiliary objectives such as tool-use efficiency, which are essential for practical deployment. To address this gap, we introduce ParetoPO, a two-stage multi-objective optimization framework for aligning tool-using large language models (LLMs) under competing objectives. In the first stage, ParetoPO leverages hypervolume-guided dynamic scalarization to adapt reward weights based on global Pareto frontier progress. In the second stage, it replaces scalarized learning signals with Pareto-ranking-based advantage computation, promoting nondominated trajectories through dominance-aware credit assignment. This design enables fine-grained, action-level optimization across multiple conflicting objectives. Experimental results on mathematical reasoning and multi-hop QA tasks show that ParetoPO consistently discovers policies with superior accuracy-efficiency trade-offs compared to static and heuristic baselines.
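
As a rough illustration of Pareto ranking, the sketch below applies standard nondominated sorting to a group of trajectories scored on two objectives that are both maximized (accuracy and negated tool-call count). The conversion from ranks to advantages in `rank_advantages` is an assumed placeholder; the abstract does not specify how ParetoPO computes advantages, and the sample scores are hypothetical.

```python
def dominates(u, v):
    """True if u is no worse than v on every objective and better on one."""
    return (all(a >= b for a, b in zip(u, v))
            and any(a > b for a, b in zip(u, v)))


def pareto_ranks(points):
    """Rank 0 is the nondominated front; rank r is the front left after
    removing all points of rank < r."""
    remaining = set(range(len(points)))
    ranks = [None] * len(points)
    rank = 0
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(points[j], points[i])
                            for j in remaining if j != i)}
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


def rank_advantages(ranks):
    """Assumed placeholder: better (lower) rank gives a higher advantage."""
    mean_rank = sum(ranks) / len(ranks)
    return [mean_rank - r for r in ranks]


# Hypothetical trajectories: (accuracy, negated number of tool calls).
trajectories = [(1.0, -3), (1.0, -5), (0.0, -1), (0.0, -4), (1.0, -3)]
ranks = pareto_ranks(trajectories)
print(ranks)  # [0, 1, 0, 1, 0]
print(rank_advantages(ranks))
```

Note that the wrong-but-cheap trajectory `(0.0, -1)` sits on the first front alongside the correct ones, because nothing beats it on both objectives; a ranking-based signal keeps such trade-off points visible where a fixed scalarization would collapse them.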

16.
arXiv (CS.AI) 2026-06-18

Bounded Context Management for Tabular Foundation Models on Stream Learning

arXiv:2606.18677v1 Announce Type: cross Abstract: Tabular stream learning requires predictions on sequentially arriving examples under distribution shift. While standard methods adapt by updating model states, tabular foundation models (TFMs) make predictions conditioned on a labeled context in an in-context manner, making them a natural alternative for stream learning. This shifts the challenge from how to update the model to how to manage the context. We propose a future information view that yields three practical requirements for context management: preserve recent examples, retain uncertain examples, and remove redundant examples. We instantiate these requirements as CURE (Context management via Uncertainty-aware admission and Redundancy aware Eviction), a context-managing policy with entropy-gated admission and redundancy-aware eviction. Across seven streams, CURE shows up to 27.0% relative improvement over classical stream learners, remains robust across multiple TFM backbones, and ranks first among other policy variants. Code and datasets are available at https://github.com/morcellinus/CURE-ICML-FMSD.
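
The three requirements above (keep recent examples, keep uncertain examples, drop redundant ones) can be sketched as a bounded buffer. This is a minimal illustration under stated assumptions, not the CURE implementation: admission is gated on the Shannon entropy of the model's predicted class probabilities, redundancy is approximated by Euclidean distance to the nearest same-label example, and the class name `BoundedContext`, its parameters, and the sample data are all hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


class BoundedContext:
    def __init__(self, capacity, recent_window, entropy_threshold):
        assert 0 <= recent_window <= capacity
        self.capacity = capacity
        self.recent_window = recent_window
        self.entropy_threshold = entropy_threshold
        self.items = []  # (features, label) pairs, oldest first

    def offer(self, features, label, predicted_probs):
        """Admit a labeled example only if the model was uncertain about it."""
        if entropy(predicted_probs) < self.entropy_threshold:
            return False
        self.items.append((features, label))
        if len(self.items) > self.capacity:
            self._evict_most_redundant()
        return True

    def _nearest_same_label(self, i):
        xi, yi = self.items[i]
        dists = [math.dist(xi, xj)
                 for j, (xj, yj) in enumerate(self.items)
                 if j != i and yj == yi]
        return min(dists) if dists else math.inf

    def _evict_most_redundant(self):
        # The newest `recent_window` examples are protected from eviction.
        candidates = range(len(self.items) - self.recent_window)
        victim = min(candidates, key=self._nearest_same_label)
        del self.items[victim]


ctx = BoundedContext(capacity=3, recent_window=1, entropy_threshold=0.3)
ctx.offer((0.0, 0.0), 0, [0.5, 0.5])    # uncertain, admitted
ctx.offer((0.1, 0.0), 0, [0.6, 0.4])    # uncertain, admitted
ctx.offer((5.0, 5.0), 1, [0.99, 0.01])  # confident, skipped
ctx.offer((5.0, 5.0), 1, [0.5, 0.5])    # uncertain, admitted
ctx.offer((9.0, 9.0), 1, [0.4, 0.6])    # over capacity, triggers eviction
print(ctx.items)
```

In this run the two near-duplicate class-0 points tie for most redundant and the older one is evicted, so the context stays at three examples.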

17.
arXiv (math.PR) 2026-06-16

Quantitative Oppenheim Conjecture for Random Quadratic Forms and Optimal Variance Bounds in Function Fields

arXiv:2606.16699v1 Announce Type: cross Abstract: We prove a quantitative version of Oppenheim's conjecture in the function field setting. In order to do so, we compute the higher moments of the Siegel transform. In particular, we find an optimal bound on the variance of the number of lattice points in a set. Moreover, we compute the exact variance of the number of lattice points in a ball, which is of independent interest.

18.
arXiv (CS.CL) 2026-06-12

Polar: A Benchmark for Evaluating Political Bias in LLMs

Political bias in large language models (LLMs) is increasingly significant, but difficult to measure reproducibly across political and linguistic contexts. We introduce Polar, a 4,026-instance multiple-choice benchmark that measures political bias through option-level likelihoods rather than prompt-based generation. Polar covers two ideological axes and eight issue categories derived from the Manifesto Project, and evaluates models in parallel across U.S. and South Korean political contexts. Across 38 LLMs, measured bias varies systematically with political context, issue category, model group, and presentation language. All models lean left-progressive on U.S. political content, but show more centered and mixed patterns on South Korean content. Translation experiments further show that presentation language alone can shift measured bias. These findings highlight the need for multilingual and cross-contextual evaluation of political bias in LLMs.
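
To make the idea of option-level likelihood scoring concrete, here is a minimal sketch. It assumes that a model supplies one log-likelihood per answer option and that each option carries a stance label on a single axis from -1 to +1; the aggregation into an expected stance is an illustrative choice, not Polar's published metric, and all numbers are hypothetical.

```python
import math


def option_probabilities(log_likelihoods):
    """Softmax over per-option log-likelihoods (shifted for stability)."""
    m = max(log_likelihoods)
    exps = [math.exp(x - m) for x in log_likelihoods]
    total = sum(exps)
    return [e / total for e in exps]


def expected_stance(log_likelihoods, stances):
    """Probability-weighted stance of the options; 0 means balanced."""
    probs = option_probabilities(log_likelihoods)
    return sum(p * s for p, s in zip(probs, stances))


# Hypothetical item with two options labeled -1 and +1 on one axis.
stances = [-1.0, 1.0]
print(expected_stance([-2.0, -2.0], stances))         # equal likelihoods
print(expected_stance([math.log(3.0), 0.0], stances)) # first option 3x likelier
```

Because the score reads likelihoods directly, it does not depend on sampling a free-form answer, which is the reproducibility argument the abstract makes for this style of measurement.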

19.
arXiv (CS.CV) 2026-06-16

Dual-branch Prompting for Multimodal Machine Translation

Multimodal Machine Translation (MMT) typically enhances text-only translation by incorporating aligned visual features. Despite the remarkable progress, state-of-the-art MMT approaches often rely on paired image-text inputs at inference and are sensitive to irrelevant visual noise, which limits their robustness and practical applicability. To address these issues, we propose D2P-MMT, a diffusion-based dual-branch prompting framework for robust vision-guided translation. Specifically, D2P-MMT requires only the source text and a reconstructed image generated by a pre-trained diffusion model, which naturally filters out distracting visual details while preserving semantic cues. During training, the model jointly learns from both authentic and reconstructed images using a dual-branch prompting strategy, encouraging rich cross-modal interactions. To bridge the modality gap and mitigate training-inference discrepancies, we introduce a distributional alignment loss that enforces consistency between the output distributions of the two branches. Extensive experiments on the Multi30K dataset demonstrate that D2P-MMT achieves superior translation performance compared to existing state-of-the-art approaches. Our code is publicly available at https://github.com/MentaY/DDP.

20.
arXiv (quant-ph) 2026-06-16

Quantum Illumination with Symmetry-Constrained Random Unitaries

arXiv:2606.15586v1 Announce Type: new Abstract: Quantum illumination provides a quantum advantage in detecting weakly reflecting objects embedded in a noisy environment, even when environmental noise destroys most of the initial entanglement. We investigate this advantage using Haar-random probe states constrained to symmetry-resolved subspaces. Employing tools from quantum channel discrimination and asymptotic hypothesis testing, we derive the discrimination exponents associated with Haar-random probe ensembles and identify the role of symmetry in determining their performance. We show that typical states drawn from fixed-charge sectors achieve the same asymptotic quantum-illumination advantage as maximally entangled probes. In particular, we show that the effective thermal-noise suppression and the corresponding Chernoff exponent are governed by the dimension of the accessible symmetry sector. Our results reveal that the operational resource underlying quantum illumination can be generalized from fine-tuned structure of a specific probe state to the existence of a large symmetry-protected correlation subspace. These findings establish a direct connection between quantum illumination, symmetry-resolved typicality, and quantum channel discrimination, and demonstrate that near-optimal quantum hypothesis testing resources can emerge naturally from generic many-body quantum states constrained by conservation laws.

21.
arXiv (CS.AI) 2026-06-24

Tractable Reasoning and Conjunctive Query Answering for Defeasible DL-Lite under Rational Closure

arXiv:2606.24279v1 Announce Type: new Abstract: In Description Logics (DLs), reasoning under Rational Closure (RC) is a well-known and widely accepted non-monotonic formalism to handle defeasible knowledge. In this paper, we study the application of RC to the core and Horn variants of the DL-Lite family of lightweight description logics. We analyze both entailment (instance checking) and Conjunctive Query (CQ) answering under RC. Our main contribution is providing a plug-in architecture that builds upon existing standard classical reasoners, establishing that reasoning and CQ answering under RC for DL-Lite can be done efficiently with minimal computational overhead.

22.
arXiv (quant-ph) 2026-06-15

Resolving the Edge of a Quantum Pyramid

arXiv:2606.14698v1 Announce Type: new Abstract: Standing on the shoulders of giants, we resolve the quantum pyramids conjecture, confirming the globally information-optimal measurement for an ensemble of equiangular equiprobable pure states, as conjectured by Englert and Řeháček (arXiv:0905.0510). We do so by proving the remaining entropy inequalities of Holevo and Utkin (arXiv:2506.06700), which certify optimality for obtuse and flat pyramids. For obtuse pyramids, our key contribution is a rigorous proof that local minimizers of the corresponding entropy inequality cannot have three distinct coordinate values. We show that eliminating this family can be reduced to a neat algebraic reciprocal inequality relating branches of the Lambert $W$ function, which may be of independent interest. For flat pyramids, we prove a tight $\ell^p$ inequality for zero-sum vectors that was recently conjectured, proved analytically in dimension $d=3$, and computationally verified for $d\leq 200$ by Holevo and Utkin (arXiv:2603.24017). We prove this bound for all $d\geq 2$ via a technique in symmetric inequalities known as the equal variables method.

23.
medRxiv (Medicine) 2026-06-22

A blinded, counterbalanced rater design for evaluating AI-assisted summarisation of tertiary clinical genomics reports: methodology of the QNOMX-VHIR-CPSP-001 Phase 1 study

Background. Tertiary clinical genomics reports condense layered molecular findings into documents that treating oncologists must read, translate, and act upon; manual summarisation of these reports is time-consuming and variable. Tools that assist summarisation and translation into local languages are emerging, yet the field lacks an agreed methodology for evaluating such tools before any downstream clinical use. The appropriate first endpoint is fidelity of the generated summary to its source report, assessed by qualified human raters under blinded scoring, not downstream variant classification. Methods. QNOMX-VHIR-CPSP-001 Phase 1 is a single-site, non-interventional clinical performance study conducted at Vall d'Hebron Institut de Recerca (VHIR) under ISO 20916:2019 as a Clinical Performance Study Protocol. De-identified tertiary cancer genomics reports from pediatric oncology cases are summarised by the AI-assisted summarisation system under evaluation and, in parallel, by the standard manual workflow. Qualified raters score both summary types against the source genomics report using the Quality Summary Index (QSI), a six-dimension, five-point rubric adapted from the Provider Documentation Summarization Quality Instrument, under a blinded, counterbalanced, two-period crossover with a minimum fourteen-day washout. Two co-primary composite endpoints, content and presentation, are analysed for non-inferiority under a Bayesian hierarchical model, with a frequentist linear mixed model as the convergence check. Inter-rater reliability is reported as Krippendorff's α; a Monte-Carlo power analysis of the fixed clustered design is pre-specified. Discussion. The design isolates summarisation quality from clinical decision-making by scoring both summary types against the same source report under blinding, counterbalancing, and a fourteen-day washout. Conclusion. The QSI rubric, the counterbalanced crossover, and the pre-specified Bayesian primary with frequentist convergence check define a replicable protocol for early-stage evaluation of AI-assisted summarisation in tertiary genomics reporting; observed variance components will inform sample-size determination for Phase 2.
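
As an illustration of the inter-rater reliability statistic named in the abstract, the following is a minimal sketch of Krippendorff's alpha. It assumes an interval (squared-difference) distance between scores and a simple list-of-lists input; the abstract does not state which distance metric or software the study uses, and the function name `krippendorff_alpha_interval` is hypothetical.

```python
from itertools import permutations


def krippendorff_alpha_interval(units):
    """Krippendorff's alpha with an interval (squared-difference) metric.

    `units` is a list of lists: each inner list holds the scores that the
    raters gave to one rated item. The interval metric is an assumption
    made for this sketch, not a detail taken from the study protocol.
    """
    # Only items rated at least twice contribute pairable values.
    pairable = [u for u in units if len(u) >= 2]
    values = [v for u in pairable for v in u]
    n = len(values)
    if n < 2:
        raise ValueError("need at least one item with two or more ratings")

    # Observed disagreement: squared differences within each item,
    # weighted by 1 / (number of ratings for that item - 1).
    observed = sum(
        sum((a - b) ** 2 for a, b in permutations(u, 2)) / (len(u) - 1)
        for u in pairable
    ) / n

    # Expected disagreement: squared differences across all pairable values.
    expected = sum((a - b) ** 2 for a, b in permutations(values, 2)) / (n * (n - 1))
    if expected == 0:
        raise ValueError("alpha is undefined when all scores are identical")

    return 1 - observed / expected
```

For example, `krippendorff_alpha_interval([[1, 2], [3, 3], [5, 4]])` evaluates to 5/6 (about 0.833), and perfect agreement such as `[[1, 1], [2, 2], [3, 3]]` gives 1.0. For a five-point rubric an ordinal distance may be the more natural choice; the interval metric is used here only to keep the sketch short.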

24.
arXiv (CS.CL) 2026-06-18

MCompassRAG: Topic Metadata as a Semantic Compass for Paragraph-Level Retrieval

Retrieval-augmented generation (RAG) systems depend critically on how documents are chunked and searched. Fine-grained chunks can improve retrieval precision but expand the search space, increasing latency and cost; larger chunks reduce the number of candidates but make dense similarity less reliable, as the representation for each chunk mixes multiple topics and introduces more semantic noise. This trade-off becomes especially limiting in deep research tasks, where retrieval must be both fast and precise across large, heterogeneous corpora. We introduce MCompassRAG, a metadata-guided retrieval framework that uses topic-level signals as a semantic compass for selecting relevant evidence. Instead of relying only on cosine similarity between queries and noisy chunk embeddings, MCompassRAG enriches chunk representations with topic metadata in the same embedding space and trains a lightweight retriever through LLM-teacher distillation. At inference time, MCompassRAG performs topic-aware retrieval without additional LLM calls, improving both efficiency and evidence quality. Across six complex retrieval benchmarks, MCompassRAG improves information efficiency (IE) by 8.24% on average with over 5 times lower latency than the strongest efficient RAG baselines. Code is available on https://github.com/AmirAbaskohi/MCompassRAG.
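
The idea of enriching chunk representations with topic metadata in the same embedding space can be illustrated with a toy sketch. This is not the MCompassRAG implementation: the abstract describes a retriever trained through LLM-teacher distillation, whereas the code below simply mixes a chunk vector with a topic vector using a hand-set weight. The names `enrich`, `rank`, and `weight`, and the three-dimensional vectors, are all hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def enrich(chunk_vec, topic_vec, weight=0.5):
    """Blend a chunk embedding with its topic embedding.

    A fixed convex combination is an assumption of this sketch; the paper
    learns its retriever rather than using a hand-set weight.
    """
    chunk_unit, topic_unit = normalize(chunk_vec), normalize(topic_vec)
    mixed = [(1 - weight) * c + weight * t for c, t in zip(chunk_unit, topic_unit)]
    return normalize(mixed)


def rank(query_vec, chunks, weight=0.5):
    """Return chunk ids ordered by similarity to the query, best first."""
    scored = [
        (cosine(query_vec, enrich(c["embedding"], c["topic"], weight)), c["id"])
        for c in chunks
    ]
    return [chunk_id for _, chunk_id in sorted(scored, reverse=True)]


# Toy data: chunk A is on-topic but has a noisy embedding;
# chunk B is off-topic but happens to sit closer to the query.
query = [1.0, 0.0, 0.0]
chunks = [
    {"id": "A", "embedding": [0.6, 0.8, 0.0], "topic": [1.0, 0.0, 0.0]},
    {"id": "B", "embedding": [1.0, 0.0, 1.0], "topic": [0.0, 0.0, 1.0]},
]
```

With `weight=0.0` (plain chunk similarity) the toy ranking is `["B", "A"]`; with `weight=0.5` the topic signal moves the on-topic chunk ahead, giving `["A", "B"]`. Scoring remains a single similarity computation per chunk, which is consistent with the abstract's point that no additional LLM calls are needed at inference time.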

25.
PLOS Computational Biology 2026-06-12

Ten simple rules for executing an inherited research plan in computational biology

by Sahar Javaheri Tehrani, Toni Ingolf Gossmann

Trainees in computational biology frequently inherit research plans whose aims, datasets, analytical strategies, and technical constraints were defined before their arrival. These plans often emerge from grants, collaborations, legacy codebases, shared high-performance computing environments, or partially completed analyses. While such plans provide a useful scaffold, they rarely specify all implementation details, prior assumptions, evaluation criteria, or dependencies needed for reliable execution. The transition from inheriting a partially articulated plan to producing reproducible results therefore creates an execution gap: a phase in which trainees must reconstruct what the project is, which elements are fixed, which remain negotiable, and which technical or organizational assumptions need to be tested before full-scale analysis begins. In this Ten Simple Rules article, we provide a practice-oriented framework for stabilizing inherited computational biology projects before workflows, benchmarks, and decision paths become entrenched. We do not claim that the individual practices described here are novel in isolation. Rather, our contribution is to organize familiar practices into a sequenced framework for a recurrent but under-articulated phase of computational research: inherited-plan execution. Computational biology makes this phase especially important because projects often combine heterogeneous datasets, fragile software environments, undocumented preprocessing choices, benchmarking assumptions, distributed collaborators, and asymmetrical access to contextual knowledge. By making this transition visible and operational, the rules aim to help trainees, supervisors, and collaborators reduce ambiguity, test feasibility, document decisions, and support reproducible and equitable project execution under real-world constraints.