Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.AI) 2026-06-25

FBOS-RL: Feedback-Driven Bi-Objective Synergistic Reinforcement Learning

arXiv:2605.20256v2 Announce Type: replace-cross Abstract: Reinforcement learning has become a cornerstone for aligning and unlocking the reasoning capabilities of large-scale models. At its core, the training loop of GRPO and its variants alternates between rollout sampling and policy update: the policy first samples rollouts from its action space, and then updates its parameters according to the advantages computed over them. Unlike supervised learning, where each gradient step is anchored to an explicit ground-truth target, the optimal gradient direction for updating model parameters in this setting is not known a priori; the high-quality rollouts drawn during the sampling stage therefore act as the implicit "teacher" that guides every parameter update. However, mainstream RL algorithms such as GRPO adopt a simple sampling scheme that conditions all rollouts on the same original prompt. When a task lies beyond the policy model's current capability, this sampling scheme rarely yields a high-quality rollout, leaving the policy model without a meaningful gradient direction when updating its parameters, which causes training to stall. To address this issue, we propose FBOS-RL. Specifically, we let the model perform Feedback-Guided Exploration Enhancement based on the feedback provided by the environment, and on top of this we design two mutually reinforcing training objectives: EPA and ECC. Extensive experiments demonstrate that EPA and ECC can mutually reinforce each other, forming a positive flywheel effect that significantly improves both the training efficiency and the final performance ceiling of reinforcement learning. Specifically, under both an identical number of rollouts and the same number of training steps, FBOS-RL learns substantially faster than GRPO and feedback-based baselines and ultimately attains a higher performance ceiling, while exhibiting higher policy entropy and lower gradient norms throughout training.

02.
arXiv (CS.CL) 2026-06-16

Creative Collision: Directorial Persona Steering and Competition in Large Language Models

Activation steering has emerged as a powerful tool for shaping the behaviour of large language models at inference time, yet most prior work injects a single semantic direction into the residual stream. We study the richer setting in which two semantically opposing steering vectors are superimposed – a regime we call Creative Collision. Concretely, we construct directorial persona vectors for Steven Spielberg (optimistic, redemptive moral valence) and Martin Scorsese (dark, morally ambiguous) via mean-difference activation contrast on curated screenplay-derived corpora, then interpolate between them with a scalar mixing parameter $\alpha \in [0,1]$ and a steering coefficient $\lambda$. Across five evaluation axes – moral valence, generation coherence, surface style, directional dominance, and vector geometry – three principal findings emerge: (i)~Spielberg's representational signature exhibits robust directional dominance, suppressing Scorsese's moral influence across almost the entire interpolation range; (ii)~intermediate collision points paradoxically improve generation coherence relative to pure single-director steering at high $\lambda$; and (iii)~both personas localise maximally to layer~28 of a 40-layer decoder-only transformer, revealing a shared moral-tone substrate. These results illuminate the geometry of competing semantic directions in transformer residual streams and have direct implications for controllable creative generation and value-aligned narrative synthesis.

03.
arXiv (quant-ph) 2026-06-15

An integrated ultrahigh vacuum cluster tool for diamond surface science and single nitrogen-vacancy center measurements

arXiv:2606.13961v1 Announce Type: new Abstract: We present a custom-designed ultrahigh vacuum (UHV) cluster tool developed for studying shallow nitrogen-vacancy (NV) centers in diamond, enabling in situ diamond surface preparation, characterization, and single NV center dynamics measurements within a single connected platform. The system combines a surface science chamber for controlled surface modification and analysis with a cryogenic confocal microscope chamber dedicated to NV spin and optical measurements. This integrated approach enables a direct correlation between diamond surface chemistry and the resulting NV spin and charge properties. The instrument provides a versatile platform for systematic studies of surface-induced decoherence mechanisms and charge dynamics for shallow NV centers, and establishes a pathway toward reproducible surface engineering for quantum sensing applications.

04.
arXiv (CS.LG) 2026-06-19

Neural network surrogates with uncertainty quantification for inverse problems in partial differential equations

arXiv:2606.20417v1 Announce Type: new Abstract: Inverse problems for differential equations arise throughout science and engineering, where one seeks to infer unknown model parameters from noisy or incomplete observations. Traditional numerical methods for these problems are often computationally expensive, particularly in Bayesian settings where evaluating the likelihood becomes costly for complex forward models and high-dimensional parameter spaces. To address this challenge, we introduce DeepGaLA, a neural-network surrogate for differential equation solvers that provides uncertainty-aware predictions, reducing overconfident inference when training data are limited. To evaluate the fidelity of the surrogate-induced posterior approximations in practice, we show that a short run of delayed-acceptance Markov chain Monte Carlo can serve as an effective diagnostic. Across a range of numerical experiments, DeepGaLA delivers forward-model approximations with accuracy comparable to established Gaussian-process surrogates, while better maintaining efficiency as parameter dimension grows. Moreover, it can incorporate differential-equation constraints, including in nonlinear settings. Overall, these results indicate that uncertainty-quantified neural surrogates can enable scalable and reliable Bayesian inference for inverse problems in complex systems.

05.
arXiv (CS.LG) 2026-06-18

Not Just How Much, But Where: Decomposing Epistemic Uncertainty into Per-Class Contributions

arXiv:2602.21160v3 Announce Type: replace-cross Abstract: In safety-critical classification, the cost of failure is often asymmetric, yet Bayesian deep learning summarises epistemic uncertainty with a single scalar, mutual information (MI), that cannot distinguish whether a model's ignorance involves a benign or safety-critical class. We decompose MI into a per-class vector $C_k(x)=\sigma_k^{2}/(2\mu_k)$, with $\mu_k{=}\mathbb{E}[p_k]$ and $\sigma_k^2{=}\mathrm{Var}[p_k]$ across posterior samples. The decomposition follows from a second-order Taylor expansion of the entropy; the $1/\mu_k$ weighting corrects boundary suppression and makes $C_k$ comparable across rare and common classes. By construction $\sum_k C_k \approx \mathrm{MI}$, and a companion skewness diagnostic flags inputs where the approximation degrades. After characterising the axiomatic properties of $C_k$, we validate it on three tasks: (i) selective prediction for diabetic retinopathy, where critical-class $C_k$ reduces selective risk by 34.7\% over MI and 56.2\% over variance baselines; (ii) out-of-distribution detection on clinical and image benchmarks, where $\sum_k C_k$ achieves the highest AUROC and the per-class view exposes asymmetric shifts invisible to MI; and (iii) a controlled label-noise study in which $\sum_k C_k$ shows less sensitivity to injected aleatoric noise than MI under end-to-end Bayesian training, while both metrics degrade under transfer learning. Across all tasks, the quality of the posterior approximation shapes uncertainty at least as strongly as the choice of metric, suggesting that how uncertainty is propagated through the network matters as much as how it is measured.

06.
medRxiv (Medicine) 2026-06-22

Generative Artificial Intelligence in Psychotherapy Practice: A Global Online Survey of Mental Health Professionals' Adoption

Background: Generative artificial intelligence (GenAI) tools, including large language model (LLM)-based platforms such as ChatGPT, Google Gemini, and Microsoft Copilot, are being adopted across healthcare settings with increasing speed. Despite the increasing popularity of GenAI, empirical data on the extent and nature of adoption by mental health clinicians in routine psychotherapy practice globally remain scarce. Objective: This study aimed to characterize current use patterns of GenAI tools among a global sample of practicing mental health professionals, including prevalence of use, specific tools employed, clinical and administrative purposes served, perceived effect on workload, and the institutional context shaping adoption (e.g., encouragement, prohibition, and training). Methods: We administered a cross-sectional online survey to a global convenience sample of licensed mental health professionals who provide psychotherapy as part of the scope of their practice (i.e., psychotherapists, psychologists, counsellors, nurses, and psychiatrists). Participants were recruited via professional networks, purposely avoiding the use of social media platforms. Within the survey, we captured GenAI use behaviors in psychotherapy contexts, and demographic and professional background data. Descriptive statistics were analyzed for all variables. Multivariate logistic regression was used to examine demographic and professional predictors of GenAI use. Results: A total of 766 mental health professionals who provide psychotherapy from 30 countries completed the survey. Of these, 54.6% (n=418) reported having purposely used at least one GenAI tool in psychotherapy clinical practice. ChatGPT was the most frequently used tool (354/418, 84.7%). The most commonly reported clinical purpose was assisting with treatment planning (175/418, 41.9%), followed by managing administrative tasks (173/418, 41.4%) and generating psychoeducational materials for clients (166/418, 39.7%). 82.8% of AI users reported that these tools reduced their overall work burden. Only 18.1% (139/766) of respondents reported institutional encouragement to use AI tools, while 81.1% (621/766) reported not having received any professional training on AI use. Predictors of AI adoption included younger age and rural practice setting. Conclusions: In this global convenience sample survey, GenAI use among mental health professionals in psychotherapy settings is widespread, concentrated in a wide variety of clinical and administrative tasks. Formal training and institutional guidance substantially lag behind current adoption patterns. These findings highlight an urgent need for evidence-based competency frameworks, regulatory clarity, and professional education to support safe and ethically informed integration of AI into clinical mental health practice.

07.
arXiv (quant-ph) 2026-06-19

Effective discrete-modulated continuous variable QKD under general attacks

arXiv:2606.20346v1 Announce Type: new Abstract: Continuous variable quantum key distribution via discrete modulations ensures information-theoretic security using standard telecom technologies, providing affordable and scalable quantum communications with simplified classical postprocessing. However, existing security proofs against general attacks often rely on restrictive assumptions, such as a bounded dimension for coherent states, or require impractically large block sizes. In this work, we develop a finite-size security analysis that removes these limitations while incorporating realistic experimental features. Our approach combines the dimension reduction technique, a security proof based on the marginal-constrained entropy accumulation, and a trusted detector model accounting for the receiver imperfections. We report positive key rates in the finite-size regime for relevant block sizes of the order of $10^8$. These results contribute to narrowing the gap between theoretical security proofs and practical implementations of discrete-modulated continuous variable quantum key distribution protocols.

08.
arXiv (CS.AI) 2026-06-16

Imperfect Visual Verification for Code Edition : A Case Study on TikZ

arXiv:2606.15693v1 Announce Type: cross Abstract: LLMs have significantly advanced code generation, enabling the synthesis of functional programs. While recent systems achieve strong performance on many coding benchmarks, tasks involving programs such as TikZ that generate visual artifacts remain challenging, in particular on visual code customization. Unlike generation from scratch, customization requires localized, semantics-preserving edits: the model must locate relevant code, modify it according to the instruction, and preserve the remaining structure and rendering. Approaches based on post-hoc iterative refinement/correction where a verifier provides feedback to guide corrections, have shown promise. However, in the case of programs with a visual outcome such as in TikZ, where correctness is harder or likely impossible to formalize and evaluate automatically, deterministic verifiers do not exist. Hence, developers can only rely on imperfect verifiers. In this paper, we conduct an empirical study to answer:to what extent can iterative refinement remain effective when the verifier itself is unreliable?} We use TikZ as a focused case study that isolates the core difficulties of the problem (weak code structure, fine-grained visual semantics, and difficult feature localization) in a controlled and challenging setting. We define visual code customization as an iterative editing problem with an imperfect oracle, and introduce a framework for analyzing such iterative refinements. We conduct a large-scale study and evaluate multiple LLM-based and tool-augmented visual verifiers within iterative refinement pipelines, and perform extensive manual annotation of refinement trajectories to assess verifier behavior and feedback quality. Our findings show that even imperfect verifiers can determine with moderate accuracy whether visual instructions are applied to code, achieving F1-scores up to 0.815. Feedback improves iterative refinement, especially for weaker models, adding 11–20 perfect customizations for Qwen3-vl-30b-a3b-Instruct, while stronger models like Gemini-3 gain fewer improvements (+5) but benefit more from accurate verification that prevents premature acceptance. Feedback is effective only when it precisely identifies image issues, provides actionable guidance, addresses all relevant problems, and remains grounded in the original instruction.

09.
arXiv (CS.CL) 2026-06-24

Societal Alignment Frameworks Can Improve LLM Alignment

Recent progress in large language models (LLMs) has focused on producing responses that meet human expectations and align with shared values - a process coined alignment. However, aligning LLMs remains challenging due to the inherent disconnect between the complexity of human values and the narrow nature of the technological approaches designed to address them. Current alignment methods often lead to misspecified objectives, reflecting the broader issue of incomplete contracts, the impracticality of specifying a contract between a model developer, and the model that accounts for every scenario in LLM alignment. In this paper, we argue that improving LLM alignment requires incorporating insights from societal alignment frameworks, including social, economic, and contractual alignment, and discuss potential solutions drawn from these domains. Given the role of uncertainty within societal alignment frameworks, we then investigate how it manifests in LLM alignment. We end our discussion by offering an alternative view on LLM alignment, framing the underspecified nature of its objectives as an opportunity rather than perfect their specification. Beyond technical improvements in LLM alignment, we discuss the need for participatory alignment interface designs.

10.
arXiv (quant-ph) 2026-06-25

Layer codes as partially self-correcting quantum memories

arXiv:2510.06659v2 Announce Type: replace Abstract: We investigate layer codes, a family of three-dimensional stabilizer codes that can achieve optimal scaling of code parameters and a polynomial energy barrier, as candidates for self-correcting quantum memories. First, we introduce two decoding algorithms for layer codes with provable guarantees for local stochastic and adversarial noise, respectively. We then prove that layer codes constitute partially self-correcting quantum memories which outperform previously analyzed models such as the cubic code and the welded solid code. Notably, we argue that partial self-correction without the requirement of efficient decoding is more common than expected, as it arises solely from a diverging energy barrier. This draws a sharp distinction between partially self-correcting systems and partially self-correcting memories. Another novel aspect of our work is an analysis of layer codes constructed from random Calderbank-Shor-Steane codes. We show that these random layer codes have optimal scaling (up to logarithmic corrections) of code parameters and a polynomial energy barrier. Finally, we present numerical studies of their memory times and report behavior consistent with partial self-correction.

11.
arXiv (CS.LG) 2026-06-16

Sobolev Approximation by Fixed-Size Neural Networks with Arbitrary Accuracy

arXiv:2606.16975v1 Announce Type: cross Abstract: In this work, we investigate new activation functions for achieving arbitrary-accuracy Sobolev approximation by fixed-size neural networks. We first show that any function in $W^{2,\infty}((a,b)^d)$ can be approximated with arbitrary accuracy, measured in the $W^{1,\infty}$-norm, by a fixed-size neural network using the Elementary Universal Activation Function ($\mathrm{EUAF}$). To extend this result to $W^{s,\infty}((a,b)^d)$ for $s\in\mathbb{N}$, we introduce a smooth activation $\mathrm{DUAF}_{\infty}$ from the family of Differentiable Universal Activation Functions ($\mathrm{DUAF}_n$). We prove that any function in $W^{s,\infty}((a,b)^d)$ can be approximated with arbitrary accuracy in the $W^{s-1,\infty}$-norm by a fixed-size $\mathrm{DUAF}_{\infty}$-activated network. We further construct sigmoidal variants $\widetilde{\mathrm{DUAF}}_n$ and show that, for every $1\leq s\leq n$, fixed-size $\widetilde{\mathrm{DUAF}}_n$-activated networks still approximate any $f\in W^{s,\infty}((a,b)^d)$ with arbitrary accuracy in the $W^{s-1,\infty}$-norm. In all these results, the width and depth bounds are computed explicitly, and the proposed activations are elementary.

12.
arXiv (CS.CV) 2026-06-16

Learning Fine-Grained Correspondence with Cross-Perspective Perception for Open-Vocabulary 6D Object Pose Estimation

Open-vocabulary 6D object pose estimation empowers robots to manipulate arbitrary unseen objects guided solely by natural language. However, a critical limitation of existing approaches is their reliance on unconstrained global matching strategies. In open-world scenarios, trying to match anchor features against the entire query image space introduces excessive ambiguity, as target features are easily confused with background distractors. To resolve this, we propose Fine-grained Correspondence Pose Estimation (FiCoP), a framework that transitions from noise-prone global matching to spatially-constrained patch-level correspondence. To systematically eliminate background interference, FiCoP first employs an object-centric disentanglement step to isolate the target from macro-level environmental noise. Building upon this localized region, our core methodological innovations are twofold. Firstly, a Cross-Perspective Global Perception (CPGP) module is proposed to fuse dual-view features, establishing structural consensus through explicit context reasoning and text-guided semantic injection. Secondly, we design a Patch Correlation Predictor (PCP) that leverages a patch-to-patch correlation matrix as a structural prior. This generates a precise block-wise association map, acting as a spatial filter to enforce fine-grained, noise-resilient matching. Experiments on the REAL275 and Toyota-Light datasets demonstrate that FiCoP improves Average Recall by 8.0% and 6.1%, respectively, compared to the state-of-the-art method, highlighting its capability to deliver robust and generalized perception for robotic agents operating in complex, unconstrained open-world environments. The source code will be made publicly available at https://github.com/zjjqinyu/FiCoP.

13.
medRxiv (Medicine) 2026-06-22

Building accessible resources to empower communities: the case of the Lupus Mexican Registry

Motivation: Although SLE data in Latin America is increasing, clinical datasets remain difficult to access and interpret, highlighting the need for accessible tools that support data-driven precision medicine, citizen science, and public health initiatives. Results: We developed a user-friendly platform that enables us to explore LupusRGMX data through interactive queries, report generation, statistical modeling, and comprehensive insights. This resource supports community-oriented research, improves the visibility of underrepresented populations in lupus research, and provides a useful tool to enhance data accessibility. Availability and implementation: Developed in R using Shiny and bslib for interactive visualization and interface design. Available at https://github.com/NeuroGenomicsMX/Lupus_App_2.0 and https://lupusrgmx.liigh.unam.mx/shiny/lupus/

14.
medRxiv (Medicine) 2026-06-15

GLLaucoMed: A Secure LLM-Powered Agentic Workflow for Automated Medication Extraction from Free-Text Glaucoma Clinical Notes

Purpose: To evaluate the efficacy of large language models (LLMs) in extracting medication-related information from glaucoma clinical notes in the electronic health record (EHR). Design: Cross-sectional. Subjects: 1,250 subjects in the Bascom Palmer Ophthalmic Repository. Methods: Extracted clinical notes from glaucoma-related encounters between 2014 and 2024 were labeled by two glaucoma specialists with a third serving as an adjudicator. Graders were asked to label current topical medications (CTM), proposed changes to topical medications ({Delta}TM), current oral medications (COM), and proposed changes to oral medications ({Delta}OM) in a structured fashion. The dataset was split into development (10%), validation (10%), and test (80%) sets stratified by clinician. Development and validation sets were used to engineer and refine prompts, and the held-out test set was used for model assessment. Five LLMs (Claude Opus 4.6, DeepSeek-V3.2, GPT 5.2, Grok 4.1, and Qwen3.6-35B-A3B) were accessed via Microsoft Azure AI Foundry within a HIPAA-compliant environment. Inter-grader agreement was assessed with Gwet AC1. LLM performance was initially assessed in a binary fashion with F1 scores, and the degree of text match among positive cases was evaluated using exact match accuracy and Jaccard Index (JI). Main Outcome Measures: F1 score, exact match accuracy, JI. Results: Gwet AC1 for intergrader agreement was 0.799, 0.888, 0.985, and 0.988 for CTM, {Delta}TM, COM, and {Delta}OM, respectively. F1 scores for CTM were 0.985, 0.971, 0.978, 0.968, and 0.970 for Claude, Deepseek, GPT, Grok, and Qwen, respectively; for {Delta}TM: 0.905, 0.826, 0.897, 0.842, 0.855, respectively; for COM: 0.923, 0.887, 0.899, 0.906, 0.894, respectively; for {Delta}OM: 0.958, 0.815, 0.937, 0.835, 0.940, respectively. Among positive cases, range of exact match accuracies for CTM (N=1354) was 0.730- 0.882 and range of JIs was 0.809-0.918. For {Delta}TM (N=404), exact match accuracy range was 0.619-0.780 and JI range was 0.668-0.827. For COM (N=47), exact match accuracy range was 0.766-0.872 and JI range was 0.765-0.870. For {Delta}OM (N=25), exact match accuracy range was 0.583-0.920 and JI range was 0.583-0.922. Conclusions: The GLLaucoMed pipeline demonstrated high performance in extracting and standardizing medication data from unstructured clinical notes, including both current medications and proposed changes. Claude and GPT exhibited the strongest performance.

15.
arXiv (quant-ph) 2026-06-16

Bright Emission from Dark Sources in Hyperbolic Media

arXiv:2606.16071v1 Announce Type: cross Abstract: Hyperbolic media enable ultra-strong light-matter interactions through their extreme field localization and small mode volumes, but low-loss realizations are fundamentally limited to the mid-infrared, owing to the long lifetimes of optical phonons in high-quality crystals. Here we show that bright emitters operating at visible or near-infrared frequencies can be used to generate radiation in this regime by inducing mid-infrared population dynamics, thereby creating a source in the hyperbolic frequency band without a corresponding dipole transition. We demonstrate that even a source with vanishing dipole and higher multipole moments - strictly non-radiating in any isotropic medium - becomes radiatively active in a hyperbolic environment. This enables visible and near-infrared control of light-matter interactions in polaritonic hyperbolic materials, establishing a new low-loss solid-state quantum optics platform.

16.
arXiv (CS.AI) 2026-06-12

A Zero-shot Generalized Graph Anomaly Detection Framework via Node Reconstruction

arXiv:2606.12673v1 Announce Type: cross Abstract: Cross-domain graph anomaly detection (GAD) aims to identify abnormal nodes in unseen target graphs, showing strong potential in real-world applications with heterogeneous graph data. However, existing methods often depend on dataset-specific feature semantics and structural patterns, which limits their ability to generalize across different domains. To address this challenge, we propose AlignGAD, a zero-shot generalized graph anomaly detection framework. Our framework is built upon three key components: a Global Unification Module that aligns heterogeneous node features and normalizes graph signals in the spectral domain; a Clustering Module that constructs cluster-aware graph views to capture group-level abnormal patterns; and a Node Discrepancy Scoring Module that measures reconstruction discrepancy and aggregates anomaly evidence from different graph views. Experiments on multiple real-world datasets demonstrate the effectiveness of AlignGAD under the zero-shot GAD setting.

17.
arXiv (CS.AI) 2026-06-18

FoMoE: Breaking the Full-Replica Barrier with a Federation of MoEs

arXiv:2606.19025v1 Announce Type: cross Abstract: Pre-training Large Language Models (LLMs) typically demands large-scale infrastructure with tightly coupled hardware accelerators. While increasing model and dataset scale remains the dominant driver of performance, Mixture-of-Experts (MoEs) architectures have recently achieved state-of-the-art results by decoupling parameter count from computational cost. This efficiency enables training massive models on constrained compute budgets, yet it typically requires the high-speed interconnects of a single datacenter. To overcome these physical limits, recent approaches such as DiLoCo and Photon use low-communication data-parallel methods to enable scaling across geographically distributed, weakly connected data centers. However, these methods suffer from a fundamental inefficiency: they require full model replicas at every site, which imposes prohibitive memory constraints and communication overheads. In this work, we introduce FoMoE, a system that breaks the full-replica paradigm by partitioning expert layers across workers. We demonstrate that FoMoE: (I) reduces communication costs by up to 1.42x over efficient baselines and 45.44x over DDP via partial expert replication in the studied regimes; (II) achieves empirical throughput speedups of up to 1.4x through a novel skip-token mechanism; and (III) shows stable routing in the trained proxy regimes and projects the communication/memory benefits to 100B-scale configurations through system modelling.

18.
arXiv (CS.AI) 2026-06-17

Vibrato Expression Control for Singing Voice Conversion with Improving Independent Control

arXiv:2606.17126v1 Announce Type: cross Abstract: Singing style is a crucial aspect of a natural and expressive singing voice. Singers utilize singing styles to convey the feeling or emotion of the songs. Several works have been proposed to control singing style for making the more expressive singing voice. Recently, VibE-SVC successfully controls vibrato by predicting high-frequency F0 contour. In this paper, we introduce a singing voice conversion framework, called VibE-SVC2, to improve singing style conversion performance and controllability. The model offers control over two types of singing styles: a pitch style and a timbre style. For the pitch style, to resolve the pitch-energy entanglement issue that is unresolved in our previous work, we introduce a novel Energy Style Converter to address remaining style information in the energy contour. In addition, we propose a Zero-shot Pitch Style Converter, which mimics the pitch style of reference audio. To expand the controllability of the model, we propose vibrato rate scaling that is an independent control of vibrato extent, which is unavailable in VibE-SVC. For the timbre style, we extend the model to handle a variety of phonation styles. However, addressing specific styles such as vocal fry poses a challenge, as conventional F0 extraction often fails due to their inherent subharmonic characteristics, which degrades the conversion quality. To address this, we propose a novel Subharmonic Correction algorithm to refine the F0 contour for more natural timbre conversion. Through comprehensive objective and subjective evaluations, we demonstrate that VibE-SVC2 provides fine-grained, independent control over two types of singing styles, outperforming existing methods.

19.
medRxiv (Medicine) 2026-06-15

Repurposing cardiovascular disease risk models to predict incident and co-occurring cardiovascular, cardiometabolic and neurocognitive outcomes.

Background: Cardiovascular disease (CVD), cardiometabolic and neurocognitive conditions share risk factors and frequently co-occur. We evaluated whether four established CVD risk prediction models (QRISK3, PCE, SCORE2, SCORE2-OP) can be repurposed to predict 10-year risk of these conditions and their co-occurrence with CVD. Methods: The models were recalibrated using 20% of the UK Biobank (UKB) and evaluated in the remaining 80%. We performed external validation using data from Clinical Practice Research Datalink (CPRD) Aurum, assessing model discrimination (c-statistics) and calibration (intercept and slope). We used permuted feature importance to determine the influence of each individual predictor in the models. Results: Depending on the model, the c-statistics for incident CVD ranged from 0.71 to 0.74 in the UKB test set (16,137 events). Discrimination was equal to or higher than CVD when evaluated against non-traditional CVD outcomes: 0.74 to 0.77 for heart failure (3,471 events), 0.72 to 0.73 for atrial fibrillation (9,213 events), 0.73 to 0.75 for peripheral arterial disease (1,927 events) and 0.80 to 0.82 for abdominal aortic aneurysm (595 events). For the multimorbidity endpoints, model discrimination ranged from 0.74 for the composite of CVD and T2DM (SCORE2-OP) to 0.83 for the composite of CVD and dementia or Parkinson's disease (QRISK3). When considering the onset of any cardiovascular, cardiometabolic, or neurocognitive outcome discrimination ranged from 0.71 to 0.72. The repurposed models slightly underestimated the predicted risk in the CPRD compared to the UKB: average difference in calibration intercept was at most -0.64. After age and sex, smoking status and systolic blood pressure contributed most to model predictions. Conclusions: Repurposed CVD models can be used to identify 10-year risk of many CVD-related conditions and their multimorbidity. These may be used to support risk-based approaches to prevention and screening. The repurposed models have been made available at: https://repurposed-cvd-risk-models.shinyapps.io/cvd_cmd_dementia_app/ Keywords: Risk prediction; cardiovascular disease; cardiometabolic disease; dementia; disease prevention.

20.
arXiv (CS.CL) 2026-06-11

Improving Cross-Format Robustness in Language Models with Multi-Format Training

Large language models often remain sensitive to answer format: a question solved correctly in one form may fail in another semantically equivalent form. To study this gap, we define cross-format robustness as the extent to which a model answers the same underlying question consistently across formats. We then compare full-format training with FormatMix, which expands only a subset of training items into multiple equivalent formats using either random or targeted selection. Across GLM4 and Llama-3.1, multi-format supervision consistently improves both task performance and cross-format robustness, whereas Multiple-choice question (MCQ)-only supervision alone brings little benefit and can even reduce robustness. We further find that expanding only about 30% of the training set into multiple formats often recovers most of the gain from full-format training, and this effect appears across the model families and sizes we study. These results suggest that format diversity, rather than additional supervision alone, is the key driver of robustness. That lightweight multi-format augmentation is a practical way to make LLMs less sensitive to answer format without changing the base model.

21.
medRxiv (Medicine) 2026-06-24

Clinical care site data integration reveals heterogeneity in EHR phenotyping and healthcare utilization patterns

Objective: Genomic research using electronic health record (EHR)-linked biobanks is influenced by heterogeneity in the clinical settings (care sites) where encounters occur. We developed two methods leveraging care site data: ClinicScan identifies where phenotype documentation occurs, and ClinicWAS identifies specialty utilization patterns associated with a risk factor. Materials and Methods: We extracted care sites for each clinical encounter at an academic medical center and mapped each to a clinical specialty. ClinicScan summarizes the specialty distribution of a user-specified diagnosis; ClinicWAS fits a logistic regression for each care site to identify specialty encounters associated with a user-specified risk factor. We applied ClinicScan to depression to test whether requiring a psychiatry encounter strengthened the association between a polygenic risk score (PRS) and a depression phenotype, and ClinicWAS to a coronary heart disease (CHD) PRS to identify sites enriched for high-risk patients. Results: Across 64,983,257 encounters, 2,544 care sites mapped to 57 specialties. Most depression diagnoses occurred in primary care (30.3%) and psychiatry (19.8%). Requiring a psychiatry encounter strengthened the PRS-phenotype association (OR=1.30, 95% CI 1.26-1.35) versus two or more diagnosis codes alone (OR=1.21, 95% CI 1.19-1.24). CHD ClinicWAS identified 19 associated care sites, including 5 catheterization labs. Men and women with high genetic risk (PRS[≥]95th percentile) underwent catheterization for CHD 3.1 (1.5-4.6) and 4.6 (2.5-6.7) years earlier than normal-risk participants, respectively. Discussion: Care site data capture phenotype heterogeneity that otherwise distorts EHR-based phenotypes and obscures high-risk subpopulations. Conclusion: Clinical care site data are an under-utilized resource in EHR-linked biobanks.

22.
arXiv (CS.CV) 2026-06-19

Single-Stage Hierarchical Rectification for Weakly Supervised Histopathology Segmentation

Existing weakly supervised semantic segmentation (WSSS) methods in computational pathology rely on a multi-stage paradigm: class activation map (CAM) generation, offline pseudo-mask refinement, and fully supervised retraining. While established, this decoupled approach presents fundamental limitations. The multi-stage process not only incurs high computational training costs but also suffers from error propagation: local texture biases in shallow CNN layers generate false-positive artifacts that subsequent refinement steps often fail to correct. To address these persistent challenges through a simple yet highly effective approach, we propose the Single-Stage Hierarchical Rectification (SSHR) framework. Rather than passively refining CAMs post-hoc, our method proactively purifies intermediate feature representations during the forward pass. We introduce a Hierarchical Feature Rectification Module (HFRM) that utilizes deep global semantic context to filter out local anomalies in shallow layers. This mechanism generates high-fidelity activation maps directly within a single training loop. Experiments on the LUAD-HistoSeg and BCSS datasets demonstrate that SSHR outperforms state-of-the-art multi-stage methods. Furthermore, SSHR reduces training duration by 2 to 5 times. This efficiency minimizes computational overhead and accelerates clinical translation for large-scale histopathology workflows. The code is available at: https://github.com/trongduc-nguyen/SSHR

23.
arXiv (CS.CV) 2026-06-17

Training LLMs with Reinforcement Learning over Digital Twin Representations for Reasoning-Intensive Surgical VideoQA

Surgical video question answering requires multi-step reasoning across semantic, spatial, and temporal dimensions. Existing methods architecturally compress videos into discrete token representations and couple visual perception with reasoning. This approach fragments continuous spatial-temporal relationships and has been shown to restrict multi-step reasoning capabilities. We introduce a reinforcement learning (RL) framework that trains large language models (LLMs) to decouple perception from reasoning by operating over digital twin representations constructed from surgical foundation models. Additionally, we introduce hierarchical representations across frame, temporal window, and procedure levels with probabilistic uncertainty estimates. Finally, we propose a novel reward that combines format validation with accuracy assessment through clinical plausibility evaluation and uncertainty-aware calibration for training. To demonstrate the capabilities of this approach, we introduce REAL-Colon-Reason, a colonoscopic benchmark with 2000 question-answer pairs across three complexity levels. We achieve state-of-the-art performance on REAL-Colon-Reason and two existing surgical VideoQA benchmarks REAL-Colon-VQA and EndoVis18-VQA.

24.
arXiv (quant-ph) 2026-06-11

On-Chip Quantum Randomness Amplification

arXiv:2606.12173v1 Announce Type: new Abstract: Randomness amplification, the task of extracting uniform private bits from biased seeds that may be partly known by a malicious third party, is of central importance in cryptography. The highest security in this task is provided by a class of quantum protocols known as device-independent, which however are challenging to integrate into scalable devices. Semi-device-independent (SDI) protocols are a promising alternative that guarantees security under few natural assumptions, such as bounds on the amount of energy used by the devices. Here, we provide the first demonstration of SDI randomness amplification on an integrated silicon photonic chip, achieving a throughput rate of 20 Mbps suitable for practical applications. This rate is achieved through a novel technique for SDI entropy certification, which delivers strictly tighter von Neumann entropy bounds compared to existing methods and remains valid even if the preparation and measurement devices share quantum correlations. Overall, the methods developed in this work enable the integration of SDI technology into portable telecom devices, opening up a new generation of quantum cryptographic hardware.

25.
arXiv (CS.CL) 2026-06-25

Memory Makes the Difference: Evaluating How Different Memory Roles Shape Conversational Agents

Prior research on memory mechanism in RAG-based conversational system has emphasized how memory is stored and retrieved. However, far less is known about how memories with different functional roles influence response quality. Specifically, how they shape an agent's responses under varying conversational contexts and whether they lead to substantively different response behaviors. Existing evaluations in conversational system are also largely reference-based, insufficiently capturing the nuances in responses that may address users' preferences differently. In this work, we probe the impact of different memory types in shaping agents' responses. We present a fine-grained taxonomy of conversational memory, classify retrieved memories into different role types, and design a user-centric evaluation framework that simulates user perspectives. Through comparative experiments on long-term datasets and frontier LLMs, our analysis reveal many differentiated effects of memories: e.g., clarifying memory improves responses' factual accuracy and constraint awareness, making them more correct and personalized; irrelevant memory reduces topic relevance and degrades constraint awareness. Despite the power of frontier LLMs, these findings shed light on how different memory types can be leveraged to produce more personalized responses and inspire further research in this direction.