Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (quant-ph) 2026-06-17

Tunneling Dynamics and Time Delay in Electron Transport through Time-Dependent Barriers with Finite-Bandwidth Reservoirs

arXiv:2507.20649v2 Announce Type: replace-cross Abstract: We study a model system consisting of a tunneling barrier driven by an external harmonic field and coupled to two leads with finite bandwidth. Avoiding Floquet expansions, we derive simple expressions for the time-dependent tunneling current in the adiabatic regime. Our approach relates the barrier modulation to a measurable time delay in the steady-state periodic current. It provides a physically consistent definition of the tunneling time inside the barrier by subtracting the time delay associated with the leads from the total time delay. We find that the tunneling time always vanishes for wide/high barriers. Remarkably, the time delay persists even when the barrier becomes static, i.e., in the limit where the modulation frequency vanishes. This indicates that the time delay obtained through the introduction of an external periodic perturbation actually reflects an intrinsic property of the tunneling dynamics, rather than an effect of the external drive or of a particular system. We apply our results to the analysis of tunneling times in optical experiments and find good agreement with the experimental data.

02.
arXiv (CS.AI) 2026-06-17

Handling Feature Heterogeneity with Learnable Graph Patches

arXiv:2606.17667v1 Announce Type: cross Abstract: In recent years, the rapid development of foundation models and graph pre-training technologies has spurred increasing interest in constructing a universal pre-trained graph model or Graph Foundation Model (GFM). However, a significant challenge is that existing models are unable to address feature heterogeneity in graph data without textual information, which hinders the transferability of graph models across different datasets. To bridge this gap, we propose the concept of learnable graph patches, which we regard as the smallest semantic units of any graph data. We decompose the graph into learnable graph patches by unfolding the node features and constructing corresponding patch structures separately. We then design a framework that mines transferable information from graph data across domains. Specifically, after extracting graph patches, we propose a patch encoder to extract knowledge from each unit and a patch aggregator to learn how the units are combined into a whole. Due to its domain-agnostic nature, the model can be applied to downstream data across different domains. Furthermore, we analyze the connection between our method and existing graph models, as well as the transferability of the node embeddings it generates. Empirically, our method not only achieves the capability to use multi-domain graphs for pre-training, but also shows enhanced performance across various downstream datasets and tasks. Moreover, we observe consistent improvement in downstream performance as the volume of pre-training data increases.

03.
medRxiv (Medicine) 2026-06-18

Digital self-efficacy as a potential intermediary between vision impairment and daily internet use among older adults: A cross-sectional analysis of HINTS 2024

Background: Older adults with vision impairment often experience barriers to using digital technology. The indirect associations between vision impairment and digital access and skills via digital self-efficacy and frustration among older adults remain largely unknown. Objective: This study aimed to 1) explore factors associated with digital access, skills, self-efficacy, and frustration among older adults with vision impairment; 2) examine associations between vision impairment and digital access, skills, self-efficacy, and frustration among older adults; and 3) examine whether digital self-efficacy and frustration may help explain associations between vision impairment and digital access and skills among older adults. Methods: This was a cross-sectional study using nationally representative data from the Health Information National Trends Survey (HINTS) 2024. Respondents aged 60 and older were included. Vision impairment was assessed using a self-reported item. Outcomes included self-reported digital access, skills, self-efficacy, and frustration. Survey-weighted multivariable logistic regression and generalized structural equation modeling were conducted, adjusting for age, sex, race/ethnicity, education, and the number of comorbidities. Results: Among 3,149 older adults (mean [SD] age, 70.7 [10.0] years; 45.6% female), 7.1% (n=223) reported vision impairment. Among older adults with vision impairment, 65.6% (95% CI, 53.5% to 75.9%) used the internet daily, and 79.5% (95% CI, 66.8% to 88.2%) used a smartphone in the past 12 months. In multivariable logistic regression analyses among older adults with vision impairment, older age was associated with lower odds of daily internet use (OR, 0.84; 95% CI, 0.79 to 0.90), smartphone use (OR, 0.85; 95% CI, 0.75 to 0.97), wearable device use (OR, 0.88; 95% CI, 0.79 to 0.97), and using the internet to send a message to a healthcare provider (OR, 0.87; 95% CI, 0.80 to 0.93). Older adults who self-identified as racial and ethnic minority groups (e.g., Black/African American, Hispanic) had lower odds of daily internet use (OR, 0.15; 95% CI, 0.05 to 0.50) and using the internet to send a message to a healthcare provider (OR, 0.17; 95% CI, 0.04 to 0.73) compared with Non-Hispanic White older adults. Vision impairment was associated with lower odds of daily internet use (OR, 0.60; 95% CI, 0.37 to 0.99) and digital self-efficacy (OR, 0.53; 95% CI, 0.32 to 0.86). Digital self-efficacy was associated with higher odds of daily internet use (OR, 2.95; 95% CI, 2.04 to 4.26). Generalized structural equation modeling identified an indirect association between vision impairment and daily internet use via digital self-efficacy (coefficient, -0.68; 95% CI, -1.24 to -0.12). Conclusions: Findings suggest that reduced digital self-efficacy may help explain the observed association between vision impairment and daily internet use among older adults. Interventions targeting digital self-efficacy, including accessible interface designs, personalized coaching, and peer support, may help bridge the digital divide among older adults with vision impairment.

04.
arXiv (CS.LG) 2026-06-16

Factorized Neural Operators Decompose Dynamic and Persistent Responses

arXiv:2606.16900v1 Announce Type: new Abstract: Physical systems often exhibit heterogeneous mechanisms, where rapidly evolving dynamics coexist with persistent structures. Capturing such multiscale physical behavior remains challenging for existing neural operators, which typically rely on single dominant inductive bias and therefore couple distinct physical responses into a shared representation. We introduce the Unified Green's Function Framework across domains and propose the Factorized Neural Operators (FaNO), which decompose spectral representations into equivariant dynamic responses and invariant persistent responses, leading to better interpretability and generalization. Mechanistically, we show that the two operator branches spontaneously specialize into distinct physical roles that remain consistent across scales and domains: the equivariant branch captures rapidly varying transient dynamics, whereas the invariant branch extracts coherent persistent structures. This factorized mechanism of FaNO improves prediction accuracy, parameter efficiency and cross-scale generalization across physical systems and domains. In particular, it maintains consistent predictions under long-horizon autoregressive rollout, cross-resolution extrapolation and physical-regime shifts. These findings suggest that scalable physical modeling may benefit from moving beyond single-inductive-bias formulations toward factorized operator representations that better reflect the heterogeneous organization of physical systems, accelerating the reliable deployment of machine learning for scientific computing and discovery.

05.
arXiv (CS.LG) 2026-06-17

HeteRo-Select: Informativeness as the Participation Driver in Heterogeneous Federated Learning

arXiv:2508.06692v2 Announce Type: replace Abstract: Federated learning systems typically allocate gradient compression by link speed. This is sensible when bandwidth and data informativeness align. However, under non-IID data, these signals often decorrelate or invert. A bandwidth-driven allocator then risks compressing the most informative gradients hardest. We propose HeteRo-Select, a framework that replaces bandwidth with a per-client informativeness score as the primary driver of compression. The score jointly governs three decisions per round: client selection, compression ratio, and server aggregation weight, with bandwidth retained only as a hard ceiling. Score-proportional selection provably reduces the effective heterogeneity of the chosen subset; score-proportional compression provably lowers aggregate top-$k$ error at fixed traffic. Under the exact FedCG simulation protocol, HeteRo-Select delivers a $1.78\times$ speedup and an $18.2\%$ reduction in traffic on CIFAR-10. The same configuration, unchanged, scales from a $7{,}850$-parameter logistic regression to an $11.27$M-parameter ResNet-18, hitting the accuracy target on three of four benchmarks. When bandwidth and informativeness are deliberately anti-correlated, the method still achieves the target accuracy with less traffic than the normal-bandwidth run.

06.
arXiv (CS.AI) 2026-06-16

Resilient Consensus in Agentic AI

arXiv:2606.15024v1 Announce Type: cross Abstract: Large language model (LLM) agents are increasingly deployed in multi-agent systems where they must coordinate and agree on shared decisions. We ask whether classical resilient consensus theory, developed for deterministic agents, transfers to LLM agents that may behave adversarially. Framing LLM agreement as a Byzantine consensus game, we run controlled experiments on complete and general communication graphs. We find that prompted LLM agents fail to reach agreement that is achievable in principle: consensus can fail even in settings where classical theory guarantees that a convergent algorithm exists, and this failure persists across temperatures and horizons. At the same time, wrapping the agents with classical resilient consensus filters improves agreement. The benefit of filtering depends on how much robustness the underlying topology already provides. Our results suggest that classical resilient consensus theory is a useful lens for the safety of agentic AI.

07.
arXiv (CS.AI) 2026-06-16

S-SPPO: Semantic-Calibrated Self-Play Preference Optimization

arXiv:2606.01561v2 Announce Type: replace Abstract: Aligning Large Language Models (LLMs) with human preferences is often formulated via Direct Preference Optimization (DPO). However, the standard Bradley-Terry instantiation of DPO is limited in modeling common departures from transitivity in human preferences. To address this, recent work has introduced Self-Play Preference Optimization (SPPO), which iteratively refines the policy by training on self-generated win-lose pairs. Our investigation, however, reveals a critical instability in SPPO: the optimization is prone to policy degeneration when the preference oracle assigns overly confident wins to semantically indistinguishable responses. To mitigate this, we propose S-SPPO, a dual-space semantic calibration framework comprising: i) Supervision Calibration via semantic gating, which anneals win rate targets toward the maximum-entropy baseline as semantic overlap increases; and ii) Representation Calibration via latent repulsion to enforce geometric diversity to prevent manifold collapse and maintain latent diversity between chosen and rejected samples. Theoretically, we show that the calibration preserves the constant-sum game structure, facilitating convergence to a Nash Equilibrium. Empirically, S-SPPO avoids the performance degradation seen in prior methods, achieving 52.19% win rate and 47.46% length-controlled win rate on AlpacaEval 2.0 with Llama-3-8B, without using additional human-annotated preferences during training. The code will be available at https://github.com/xiwenc1/s-sppo.

08.
arXiv (math.PR) 2026-06-11

Percolation on hierarchical lattices

arXiv:2606.11503v1 Announce Type: new Abstract: We consider independent Bernoulli percolation on top of sequences of hierarchical graphs. Given a graph $G_{1}$ with two distinguished vertices $a_{1}$ and $b_{1}$, the hierarchical graph with seed $G_{1}$ is the sequence $\big( G_{k} \big)_{k \geq 1}$ resulting from the inductive procedure, where the graph $G_{k+1}$ is obtained from $G_{k}$ by replacing each of its edges with a copy of $G_{1}$, attached by the vertices $a_{1}$ and $b_{1}$. We prove that, under sharp hypotheses, percolation on these graphs presents a unique phase transition. Second, we establish the existence of several critical exponents in this context, such as the critical exponents for the correlation length $\nu$, the surface tension $\mu$, the one-arm exponent $\alpha_{1}$. Several results are also obtained for their infinite counterpart $G_\infty$, which is the Benjamini-Schramm limit of $G_k$: uniqueness of the infinite cluster, continuity of $\theta(p)$, existence of the percolation-probability exponent $\beta$ and scaling relations for the critical exponents $\alpha_1$, $\nu$ and $\beta$. Furthermore, we analyze noise sensitivity for crossing functions in $G_{k}$ and establish sharp noise sensitivity in this setting. Finally, we propose a setup where it is possible to verify the locality hypothesis, stating that the critical threshold for percolation is a local property, while critical exponents are determined by the global geometry of the graph. As a consequence of the techniques developed here, we also provide a necessary and sufficient condition for the existence of a unique fixed point for the map $p \mapsto \mathbb{E}_p[g]$ in $(0,1)$, where $g:\{0,1\}^n \to \{0,1\}$ is a nontrivial monotone Boolean function.

09.
arXiv (quant-ph) 2026-06-12

Testing the problem of time with cold atoms

arXiv:2509.07745v3 Announce Type: replace-cross Abstract: We realize a cold-atom system to quantitatively test relational constructions of time. A well-isolated atomic Bose-Einstein condensate evolves in a conservative trap that is partitioned by a thin optical barrier into an observed and unobserved sector, with negligible dissipation on the experimental timescale. Motivated by relational-time approaches discussed in the Wheeler-DeWitt framework, we ask whether the dynamics of the observed sector can be ordered using only internal degrees of freedom. To this end, we construct an entropic time from an experimentally defined coarse-grained entropy, and demonstrate that it can robustly order the events in the observed sector across repeated cycles of expansion and recollapse. We finally derive an effective Schroedinger equation parameterized by this internal time and show that it is able to reproduce the measured evolution. These results establish a controlled experimental setting in which relational-time constructions can be quantitatively tested.

10.
arXiv (quant-ph) 2026-06-16

Hardy and Cabello Arguments in Spatial and Temporal Frauchiger-Renner Scenarios

arXiv:2606.15467v1 Announce Type: new Abstract: We investigate Hardy- and Cabello-type logical structures within spatial and temporal extensions of the Frauchiger–Renner (FR) framework, embedding these constructions directly into the FR multi-observer architecture. In the spatial multi-observer scenario, both Hardy and Cabello contradictions arise, with the Cabello construction yielding the stronger violation,$\(\Delta_Cabello^{\max}=0.1078\)$, which exceeds the maximal Hardy probability $\(P_{H}^{\max}=\frac{5\sqrt{5}-11}{2}\approx 0.09017\)$. We then develop a sequential temporal FR protocol based on coherent multi-observer measurements performed on a single spin-$\tfrac12$ system. In this temporal setting, the Hardy contradiction disappears identically due to dynamical constraints imposed by sequential state updates, whereas a finite Cabello-type violation survives, \(\Delta_Cabello^{\max}\approx 0.0674\). Our results establish a fundamental structural distinction between spatial entanglement and temporal multi-observer correlations in FR-type logical scenarios, and demonstrate that certain observer-independent description failures persist even without spacelike separation.

11.
arXiv (CS.LG) 2026-06-18

Some Complexity Results for Robustness Verification for Binarized Neural Networks

arXiv:2606.18918v1 Announce Type: new Abstract: This paper studies the computational complexity of verification problems for Binarized Neural Networks (BNNs), where activations (and sometimes weights) are binary. We analyze two problems: satisfiability and robustness under uniform image occlusion. We show that BNN satisfiability is NP-complete via a reduction from Boolean satisfiability problem (SAT), and that uniform occlusion induces a piecewise-constant structure in the network output, enabling a polynomial-time robustness-checking algorithm.

12.
arXiv (CS.AI) 2026-06-12

Topical Phase Transitions in Artificial Intelligence Research: Large-Scale Evidence and an Early-Warning Signature for Emerging Topics

arXiv:2606.12828v1 Announce Type: new Abstract: Do research topics in artificial intelligence grow gradually, or do they advance through abrupt, detectable jumps? Analyzing 80,814 accepted main-track papers from five premier AI conferences (ACL, CVPR, ICLR, ICML, NeurIPS) spanning 2017 to 2025, we show major AI topics advance through topical phase transitions: remaining marginal for years, then surging across venues within one to three years. Large language models became the dominant cross-venue topic by 2025, diffusion models rose with comparable abruptness, and language-model methods crossed into computer vision via vision-language models, whereas reinforcement learning compounded smoothly, distinguishing genuine phase transitions from ordinary growth. This structure is our primary contribution: a large-scale, cross-venue characterization of how AI research reorganizes. We then ask whether a transition leaves a detectable footprint before it peaks. We define an early-warning signature, four publication-dynamics criteria frozen on 2017-2021 data, and evaluate it out of sample on 2023-2025 transitions, obtaining a precision of 27% and recall of 63% against a 13.5% base rate. Applied to 2025 data, the signature flags reasoning and test-time compute, agentic AI, multimodal LLMs, retrieval-augmented generation, and world models as topics to monitor over 2026-2028. The source code is also publicly available on GitHub at https://github.com/KurbanIntelligenceLab/ai-phase-transitions.

13.
arXiv (CS.CV) 2026-06-11

ActionMap: Robot Policy Learning via Voxel Action Heatmap

Vision-language-action (VLA) models have advanced rapidly across backbones, training recipes, and data scale, yet the action decoder, which converts the backbone's hidden state into a continuous control signal, has barely changed and remains a single-point predictor across the majority of current VLAs. Whether implemented via autoregressive token bins, L1 regression, or flow-matching denoising, the resulting decoder treats the action space as unstructured, leaving the geometric proximity of neighboring actions unexploited during training. To advance this, we introduce ActionMap, a voxel heatmap action head that drops into an existing VLA in place of its native action decoder. For each new action, the head predicts a voxel heatmap over the action space, where each voxel directly stores the probability of the corresponding action. Across LIBERO simulation and real-world Franka manipulation, our heatmap head surpasses two architecturally distinct backbones at matched training steps (e.g., +8.2% over OpenVLA-OFT's L1 regression head on the LIBERO four-suite average), converges at comparable or faster rates on both backbones, and remains markedly more data-efficient at low training data. The cross-backbone consistency indicates that action representation is a real lever for VLA performance, distinct from further backbone or recipe scaling. Project Page: https://showlab.github.io/ActionMap/.

14.
arXiv (quant-ph) 2026-06-17

A Lindbladian for holographic Brownian motion

Authors:

arXiv:2606.17909v1 Announce Type: cross Abstract: We derive a Lindbladian description of holographic Brownian motion in the high-temperature regime. Starting from the influence functional for a trailing string endpoint, we identify the corresponding quantum master equation and prove that it is completely positive and trace-preserving. We determine the coefficients of the Lindbladian explicitly for two holographic backgrounds: the BTZ black hole and the AdS$_5$ black brane, restricting in the latter case to the endpoint fluctuation along the $x^1$-direction. We then analyze the time evolution of phase-space moments, energy relaxation, and steady states.

15.
arXiv (math.PR) 2026-06-11

Numerical simulations of the spread from the mean of the SLE and Multiple SLE dynamics

arXiv:2606.11254v1 Announce Type: cross Abstract: The Schramm-Loewner Evolution (SLE) describes a family of fractal curves that arise in the study of the scaling limits of many planar Statistical Physics models. These curves are modeled using the Loewner Differential Equation for the conformal maps $g_t(z)$ with a Brownian motion driver. Using Euler's Method, in the current work we performed numerical experiments to study at a fixed time the quantities $|g_t(z) - \overline{g_t(z)}|$ and $Re(g_t(z)) - Re(\overline{g_t(z)})$, where $Re$ denotes the real part and $\overline{g_t(z)}$ refers to the sample average. These random variables measure the 'spread' of the dynamics from the average behavior at fixed time. One of the scopes of this work is to give numerical predictions for future theoretical investigations on these quantities. When investigating these quantities in the SLE case our experiments predict that the distribution is bimodal when the dynamics started close to the origin, and it can become bell-shaped if the dynamics is started further from the origin. In the second part, we performed experiments for a Multiple SLE model whose driver is Dyson Brownian Motion. Due to singularity in the dynamics of the drivers and the many data points needed, this part is challenging from a computational perspective. In the multiple SLE case, our experiments predict that the distribution is bell-shaped in all cases. In addition, we check the changes in the distributions as we vary the parameter $\kappa$ in the SLE case and $\beta$ in the Multiple SLE case.

16.
arXiv (CS.LG) 2026-06-12

Interpretable Factor Decomposition for Decision Intelligence in Large-Scale Financial Markets: Evidence from China's A-Share Market

arXiv:2606.12843v1 Announce Type: new Abstract: We present an interpretable machine learning pipeline to decompose Cross-Sectional Equity Return Predictability into auditable factor contribution. We apply an XGBoost model with TreeSHAP attribution and conduct stress testing on 3632 Chinese A-share stocks from 2009 until 2019. Using 60-month, rolling windows over 55 months of out-of-sample data, XGBoost obtains a mean AUC of 0.547 and +2.38%/month (Newey-West t = 5.94; Annualized Sharpe 2.23) long-short spread for the top vs bottom quintiles. This alpha is persistent after adjusting for the Carhart four-factor model (+2.31%/month; t = 7.48). SHAP Decomposition indicates that behavioral signals (turnover and momentum) account for 58.2% of predictive attribution compared to 10.7% for valuation ratios, on average, across 55 industry groups. Ablation analysis serves to cross-validate this ranking and provides evidence that SHAP and ablation diverge in a manner that highlights feature substitutability structure that is largely invisible to either method used in isolation.

17.
arXiv (CS.CL) 2026-06-15

Every Eval Ever: A Unifying Schema and Community Repository for AI Evaluation Results

AI evaluations are widely used for testing and understanding progress. However, the diverse evaluators bring with them inconsistencies that challenge analysis and comparison. First, results are saved in incompatible formats, scattered across leaderboards, papers, blog posts, evaluation harness logs, and custom repositories. Second, results are created by different evaluation frameworks, which produce divergent scores for nominally identical evaluations and record metadata inconsistently, hindering comparison, cross-community evaluation science, cost reduction, and reuse. We introduce Every Eval Ever, the first shared schema and community-crowdsourced repository for AI evaluation results. The schema standardizes how evaluations are represented in a unified, single JSON document. It is source-agnostic by design, ingesting results from evaluation harnesses and papers alike, and optionally stores per-instance outputs for fine-grained analysis. We contribute: (i) a community-governed metadata schema with a companion instance-level schema, the first standardization effort of its kind; (ii) automatic converters from popular formats, evaluation harnesses, and leaderboards to the unified schema; and (iii) a crowdsourced community database hosted on Hugging Face, currently spanning to date 22,235 models, 2,273 unique benchmarks, and 31 evaluation formats.

18.
arXiv (CS.LG) 2026-06-16

Bayesian Optimization for Learning Nonlinear MPC in Autonomous Agent Navigation

arXiv:2606.14763v1 Announce Type: cross Abstract: Real-time autonomous navigation in dynamic, unknown environments remains a fundamental challenge for mobile robotics. We propose a map-free framework that tightly integrates reactive rolling-horizon planning with nonlinear Model Predictive Control (MPC). At each control cycle, a LiDAR-based Gaussian occupancy representation is constructed and used to generate collision-free trajectories via A* search, which are then tracked by a CasADi/IPOPT MPC formulation incorporating a smooth sigmoid obstacle barrier. To improve robustness to parameter sensitivity, we adopt an offline Bayesian optimization scheme based on Tree-structured Parzen Estimators (TPE), which identifies near-optimal controller parameters with respect to a composite navigation objective. In addition, a Gaussian Process surrogate is used to analyze parameter sensitivity and provide insight into the optimization landscape. The proposed framework is robot-agnostic and is evaluated on the Unitree Go2 quadruped in simulation using Gazebo, followed by deployment on the physical robot. Experimental results show that parameters tuned in simulation transfer effectively to hardware, maintaining comparable performance without additional tuning. The full system achieves up to a 90.0\% navigation success rate when deployed, along with a 38.9\% average improvement in the evaluation metrics across simulated environments.

19.
medRxiv (Medicine) 2026-06-11

Two modes of aversive control in suicidality: joint computational modelling exposes regime-specific clinical signatures invisible to symptom-based stratification

Suicidal thoughts and behaviours (STBs) are heterogeneous in their proximal dynamics, planning, and stress-sensitivity, yet most subtyping efforts remain symptom-driven and rarely validated across independent datasets. Computational mixture modelling offers a principled alternative: by fitting explicit models of learning and action selection and partitioning individuals by their latent parameter profiles, it can identify mechanistically distinct control strategies invisible to cross-sectional symptom measurement. We applied this approach to aversive Go/NoGo performance, jointly clustering two independently collected STB-enriched samples (N = 50 and N = 184) using tasks with the same structure but different duration, reversal timing, and clinical instrumentation. Two recurrent behavioural regimes emerged: a fast/adaptive regime characterised by rapid policy updating and elevated feedback reactivity, and a slow/perseverative regime characterised by slow updating, high choice determinism, and a pronounced cost following contingency reversal. These regimes were stable across initialisations, recovered more parsimoniously in joint than independent solutions, and were largely orthogonal to symptom-based stratification. Critically, stratification by regime exposed clinical-computational coupling structures substantially attenuated in pooled analyses. Pooled, population-level associations were modest and anchored by a broad affective burden axis. Within the slow/perseverative regime, coupling reorganised around learning dynamics and internalizing burden (depression, hopelessness, and active suicidal ideation) with markedly larger effect sizes. Within the fast/adaptive regime, a dissociation between anxious-compulsive and antisocial-disinhibitory profiles emerged along the same computational axis, invisible at the population level. These findings support a view of suicidality heterogeneity in which clinically similar individuals differ in the control strategies they recruit under aversive uncertainty - variation that symptom measurement alone cannot capture.

20.
arXiv (CS.AI) 2026-06-11

Lung-R1: A Knowledge Graph-Guided LLM for Pulmonary Diagnostic Reasoning

arXiv:2606.11675v1 Announce Type: new Abstract: Diagnosing pulmonary diseases requires integrating heterogeneous evidence amid phenotypic variability and cross-disease overlap. Although large language models (LLMs) have shown progress on pulmonary knowledge question answering (QA) and information-processing tasks, reliable pulmonary diagnosis requires patient-specific, relation-aware reasoning over electronic medical record (EMR) evidence rather than isolated knowledge recall. We define this gap between pulmonary knowledge and case-level diagnostic reasoning as the Pulmonary Knowledge-to-Diagnosis Gap. To address it, we introduce LungKG, the first structured pulmonary knowledge graph for diagnostic knowledge organization and record-grounded reasoning. LungKG contains 59,038 nodes and 164,308 edges across 15 entity types and 112 relation types, serving as both a reusable pulmonary knowledge resource and the foundation for LungKG-guided model adaptation. Built on LungKG, we propose Lung-R1, a LungKG-guided pulmonary LLM trained through KG-constrained reasoning-chain construction and KG-guided reinforcement learning. In a 20-system evaluation, Lung-R1-14B achieves state-of-the-art performance across Choice, Pulmonary-QA, and EMR Diagnosis, reaching an EMR Diagnosis score of 4.3583 and surpassing the strongest non-Lung-R1 baseline by 0.1476 points. These results demonstrate the value of LungKG-guided training for EMR-based pulmonary diagnosis.

21.
arXiv (CS.LG) 2026-06-15

PERRY: Policy Evaluation with Confidence Intervals using Auxiliary Data

arXiv:2507.20068v2 Announce Type: replace Abstract: Off-policy evaluation (OPE) methods estimate the value of a new reinforcement learning (RL) policy prior to deployment. Recent advances have shown that leveraging auxiliary datasets, such as those synthesized by generative models, can improve the accuracy of OPE methods. Unfortunately, such auxiliary datasets may also be biased, and existing methods for using data augmentation within OPE lack principled uncertainty quantification. In high stakes domains like healthcare, reliable uncertainty estimates are important for ensuring safe and informed deployment of RL policies. In this work, we propose two methods to construct valid confidence intervals for OPE with data augmentation. The first provides a confidence interval over $V^{\pi}(s)$, the policy value conditioned on an initial state $s$. To do so we introduce a new conformal prediction method suitable for Markov Decision Processes (MDPs) with continuous state spaces, extending prior work to higher-dimensional settings. Second, we consider the more common task of estimating the average policy performance over all initial states, $V^{\pi}$; we introduce a method that draws on ideas from doubly robust estimation and prediction powered inference. Across simulators spanning inventory management, robotics, healthcare, and a real healthcare dataset from MIMIC-IV, we find that our methods can effectively leverage auxiliary data and consistently produce confidence intervals that cover the ground truth policy values, unlike previously proposed methods. Our work enables a future in which OPE can provide rigorous uncertainty estimates for high-stakes domains.

22.
arXiv (CS.AI) 2026-06-16

AlignCoder: Aligning Retrieval with Target Intent for Repository-Level Code Completion

arXiv:2601.19697v2 Announce Type: replace-cross Abstract: Repository-level code completion remains a challenging task for existing code large language models (code LLMs) due to their limited understanding of repository-specific context and domain knowledge. While retrieval-augmented generation (RAG) approaches have shown promise by retrieving relevant code snippets as cross-file context, they suffer from two fundamental problems: misalignment between the query and the target code in the retrieval process, and the inability of existing retrieval methods to effectively utilize the inference information. To address these challenges, we propose AlignCoder, a repository-level code completion framework that introduces a query enhancement mechanism and a reinforcement learning based retriever training method. Our approach generates multiple candidate completions to construct an enhanced query that bridges the semantic gap between the initial query and the target code. Additionally, we employ reinforcement learning to train an AlignRetriever that learns to leverage inference information in the enhanced query for more accurate retrieval. We evaluate AlignCoder on two widely-used benchmarks (CrossCodeEval and RepoEval) across five backbone code LLMs, demonstrating an 18.1% improvement in EM score compared to baselines on the CrossCodeEval benchmark. The results show that our framework achieves superior performance and exhibits high generalizability across various code LLMs and programming languages.

23.
arXiv (CS.CV) 2026-06-15

Relational Retrieval: Leveraging Known-Novel Interactions for Generalized Category Discovery

In this study, we tackle Generalized Category Discovery (GCD) via a Relational Retrieval perspective, explicitly coupling labeled and unlabeled data through bidirectional knowledge transfer. While existing methods treat these sources separately, missing valuable interaction opportunities, we propose Relational Pattern Consistency (RPC) that enables mutual enhancement. RPC employs One-vs-All classifiers for soft ID/OOD decomposition, then introduces two mechanisms: (i) for known-class preservation, we transfer semantic behavioral alignment; (ii) for category discovery, we leverage the insight that samples from the same category maintain invariant relationships with known-class prototypes, transforming unreliable pseudo-labeling into well-defined relational pattern matching. This bidirectional design allows labeled data to guide unlabeled learning while discovering novel categories through their collective relational signatures. Extensive experiments demonstrate RPC achieves state-of-the-art performance on both generic and fine-grained benchmarks.

24.
arXiv (CS.CL) 2026-06-16

Weaving Multi-Source Evidence for Biomedical Reasoning: The BioMedHop Benchmark and BioWeave Framework

Biomedical question answering (QA) increasingly requires reasoning over interacting entities, where supporting evidence is scattered across biomedical knowledge graphs, literature documents, and web-accessible resources. However, existing biomedical QA benchmarks mainly focus on exam-style knowledge, literature comprehension, or short-range multi-hop inference, leaving source-conditioned graph reasoning and evidence topology construction underexplored. To fill this gap, we introduce BioMedHop, a multi-source graph-grounded benchmark for evaluating biomedical reasoning over structured evidence topologies. BioMedHop contains 10,045 instances across KG, document, web, and hybrid evidence settings, covering shared-neighbor matching, intersection reasoning, path-based reasoning, and counting, with option-based, open-ended, and numeric count renderings. To support this benchmark, we further propose BioWeave, a source-aware reasoning framework that retrieves biomedical KG paths, gathers supporting clues from documents and web sources, assembles them into a unified evidence graph, and verifies answers through entity-level evidence support. Comprehensive experiments show that BioWeave achieves the best overall performance among compared methods on BioMedHop, outperforming the strong hybrid baseline ToG-2 by 10.5% in the overall average. Moreover, BioWeave consistently improves different LLM backbones and enables smaller models, such as Qwen3-4B, to achieve reasoning performance comparable to GPT-4-Turbo.

25.
medRxiv (Medicine) 2026-06-11

Hantavirus Disease in Uruguay: Trends and Mortality Before and During the COVID-19 Pandemic.

Introduction: Hantavirus disease is an emerging and potentially severe zoonosis of global distribution. In Uruguay, it is transmitted by rodents inhabiting peridomestic, suburban, and rural areas. Global incidence is estimated at 150,000 to 200,000 cases per year, with up to 300 annual cases in the Americas. Since 1997, Uruguay's Ministry of Public Health (MPH) has monitored Hantavirus cardiopulmonary syndrome (HCPS), the most common clinical presentation in the region. By 2019, a total of 271 cases had been identified in the country, with an estimated mortality rate of nearly 50%. Objectives: To describe the clinical, epidemiological, and occupational characteristics of patients with Hantavirus disease in Uruguay during the pre-pandemic (2018-2019) and pandemic (2020-2021) periods. Methods: A descriptive, cross-sectional, observational study was conducted, including all serologically confirmed cases of Hantavirus infection reported to the MPH between 2018 and 2021. Clinical and demographic data were extracted from the mandatory reporting form for zoonotic diseases. Incidence and case fatality rates were calculated, and factors associated with fatal outcomes were analyzed. Results: A total of 58 confirmed cases were identified between 2018 and 2021. Most patients were male (62%), with a mean age of 36.5 years (SD 16). A decline in incidence was observed during 2020-2021, with no significant change in case fatality. Direct rodent exposure was the most frequently associated risk factor. Montevideo and Canelones were the most affected departments. Renal and pulmonary involvement were significantly associated with mortality. Conclusion: Hantavirus remains a relevant public health concern in Uruguay. Although a decrease in incidence was observed during the COVID-19 pandemic years, case fatality rates remained high. The findings underscore the need for sustained surveillance and early recognition, particularly in urbanizing regions.