Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.AI) 2026-06-16

Automated ultrasound doppler angle estimation using deep learning

arXiv:2508.04243v2 Announce Type: replace-cross Abstract: Angle estimation is an important step in the Doppler ultrasound clinical workflow to measure blood velocity. It is widely recognized that incorrect angle estimation is a leading cause of error in Doppler-based blood velocity measurements. In this paper, we propose a deep learning-based approach for automated Doppler angle estimation. The approach was developed using 2100 human carotid ultrasound images including image augmentation. Five pre-trained models were used to extract images features, and these features were passed to a custom shallow network for Doppler angle estimation. Independently, measurements were obtained by a human observer reviewing the images for comparison. The mean absolute error (MAE) between the automated and manual angle estimates ranged from 3.9{\deg} to 9.4{\deg} for the models evaluated. Furthermore, the MAE for the best performing model was less than the acceptable clinical Doppler angle error threshold thus avoiding misclassification of normal velocity values as a stenosis. The results demonstrate potential for applying a deep-learning based technique for automated ultrasound Doppler angle estimation. Such a technique could potentially be implemented within the imaging software on commercial ultrasound scanners.

02.
arXiv (quant-ph) 2026-06-16

Electronic Band Structure of Silicon Determined via a Variational Adiabatic Eigensolver: Theory and Experiment

arXiv:2606.16604v1 Announce Type: new Abstract: This work addresses the critical challenge of excited-state preparation for semiconductor band structure calculations. We introduce a variational adiabatic eigensolver (VAE) protocol that combines adiabatic evolution with variational optimization to prepare high-fidelity eigenstates on noisy intermediate-scale quantum (NISQ) devices. Applying a momentum-space truncation, we accurately compute the electronic band structure of silicon – an idealized infinite periodic system – using only a modest number of qubits. Our approach employs multi-qubit parameterized circuits and a phase-based loss function, overcoming limitations of conventional methods. These limitations include the circuit-construction difficulty in traditional adiabatic approaches and the reduced accuracy of variational quantum eigensolvers for excited states. Through rigorous numerical simulation and experimental implementation on a superconducting quantum processor, we successfully prepare silicon's valence-band and conduction-band eigenstates. Single-shot readout yields state fidelities exceeding 96%, and the measured energy expectations agree with theoretical band energies within 0.5 eV. Further refinement via single-frequency oscillation fitting reduces the energy deviation to below 0.01 eV. This framework provides a robust and practical pathway for precisely determining electronic structures in quantum materials.

03.
arXiv (CS.LG) 2026-06-17

Provably Efficient Regularized Online RLHF with Generalized Bilinear Preferences

arXiv:2602.23116v3 Announce Type: replace Abstract: We consider the problem of regularized best-response max-regret minimization in online RLHF under general preferences and bandit feedback. While various regularizers are utilized to robustify alignment, known polylogarithmic regret guarantees remain heavily specific to KL. To investigate whether such fast rates extend beyond KL, we adopt the Generalized Bilinear Preference Model (GBPM) – capturing intransitive preferences over $d$-dimensional item-wise features via a rank-$2r$ skew-symmetric matrix – to isolate the impact of generic regularization. Crucially, under GBPM, we prove that the dual gap of any greedy policy is bounded by the squared estimation error, derived using only strong convexity and skew-symmetry. Under a feature coverage assumption, we establish a generic polylogarithmic regret of $\tilde{\mathcal{O}}(\eta d^4 C_{\min}^{-1} (\log T)^2 \wedge d^2 C_{\min}^{-1/2} \sqrt{T})$ with Greedy Sampling, and a dimension-wise improved regret (for well-conditioned arm-sets) of $\tilde{\mathcal{O}}(C_{\min}^{-2} \sqrt{\eta r T} \wedge r^{1/3} C_{\min}^{-4/3} T^{2/3})$ with Explore-Then-Commit, where $\eta^{-1}$ is the regularization coefficient, $T$ is the time horizon, and $C_{\min}$ is an arm-set dependent quantity. This demonstrates that ``fast'' regrets are not KL-specific, but rather a fundamental consequence of generic strongly convex geometry.

04.
medRxiv (Medicine) 2026-06-18

Cardiac rhythm development: A wearable device index of risk for physical and mental illness in adolescence

Objective. The autonomic nervous system, which regulates cardiac rhythm, undergoes pronounced maturation across adolescence. How cardiac rhythm develops over this period, however, and whether individual differences in its development forecast mental and physical illness, remain open questions. We used three waves of Fitbit data from the Adolescent Brain Cognitive Development (ABCD) Study to characterize the developmental trajectory of the cardiac rhythm and to test whether variation in that trajectory predicts onset of psychopathology and cardiometabolic disease. Methods. 8,301 adolescents contributed 242,811 valid Fitbit wear days across Waves 2 (Mage=12), 4 (Mage=14), and 6 (Mage=16). Cosinor mixed-effects models yielded three rhythm parameters per session: mesor (24-hour mean), amplitude (diurnal swing), and acrophase (peak timing). We first characterized age- and sex-specific trajectories, cross-wave stability, and factors shaping the rhythm. We then used parallel-process latent growth models to test whether within-person changes in rhythm tracked symptom trajectories, and hierarchical logistic models to test whether rhythm parameters predicted the first clinical onset of psychopathology and of obesity and hypertension. Results. The cardiac rhythm changed substantially across adolescence: mesor decreased, amplitude flattened, and acrophase shifted later. Within-person change in the rhythm tracked change in blood pressure, BMI, and trajectories of depression and ADHD symptoms. Higher mesor predicted incident onset of all five outcomes controlling for demographics, baseline symptoms, and behavior (ORs 1.36-1.54); amplitude, acrophase, and rhythm instability conferred additional risk. Conclusions. The 24-hour cardiac rhythm is a passively measurable substrate of adolescent autonomic development that indexes transdiagnostic risk for psychiatric and cardiometabolic illness.

05.
arXiv (CS.CV) 2026-06-16

Instance-Aware Knowledge Distillation for Semi-Supervised Learning of an On-Board Multi-Task Dense Prediction Model for Collision Avoidance System

Collision avoidance systems have evolved toward camera-based deep learning approaches for driving scene understanding. However, deployment in edge environments such as country clubs is constrained by limited computational resources and unreliable communication infrastructure. Moreover, constructing large-scale datasets for the target domain involves substantial annotation cost. To address these limitations, we propose an instance-aware knowledge distillation framework for semi-supervised learning. Specifically, we generate pseudo labels that mitigate teacher bias by leveraging domain priors from the teacher and instance-centric knowledge from foundation models. The trained lightweight student is deployed in the proposed collision avoidance system and performs multiple dense prediction tasks in real-time. The system detects frontal obstacles and encodes their spatial information into controller area network messages for automated guided vehicle operation. To achieve this, we construct a large-scale country club dataset and perform field validation of the proposed system. Experimental results demonstrate that the student outperforms the large teacher in instance segmentation while mitigating performance degradation in monocular depth estimation. Compared with the teacher, the student reduces FLOPs by 22.68$\times$ and parameters by 14.33$\times$, achieving 6.46 FPS on a low-cost edge device.

06.
arXiv (CS.AI) 2026-06-11

Runtime Skill Audit: Targeted Runtime Probing for Agent Skill Security

arXiv:2606.11671v1 Announce Type: cross Abstract: Agent skills let LLM agents reuse instructions, resources, tools, and workflows, but they also create a new place for malicious behavior to hide. A skill may look benign in its documentation or code while becoming harmful only when it is invoked with particular user requests, local assets, persistent state, or multi-step tool interactions. This makes purely static vetting brittle. We present Runtime Skill Audit (RSA), a dynamic analysis method that audits skills by asking what the skill-mediated agent actually does under targeted runtime conditions. Instead of testing every skill with the same generic tasks, RSA profiles risk-relevant interfaces, prepares the execution context needed to exercise them, and assigns security labels from the resulting trace evidence. We instantiate RSA on OpenClaw and evaluate it on 100 skills against representative static baselines. RSA achieves 90.0\% accuracy with an 88.0\% true positive rate and an 8.0\% false positive rate, improving accuracy by 13.0 percentage points over the best static baseline. Under self-evolving attacks, static detectors collapse after one or two rounds, while RSA continues to detect 19–20 out of 20 malicious skills across rounds.

07.
arXiv (math.PR) 2026-06-15

Lehner's operator norm formulas, semidefinite programming, and spiked matrix models

arXiv:2606.14687v1 Announce Type: new Abstract: Lehner (1999) derived elegant formulas for the operator norm $\|\mathfrak{X}\|$ of operators of the form $\mathfrak{X} = \mathbf{A}_0 \otimes \mathfrak{1} + \sum_{i = 1}^n \mathbf{A}_i \otimes \mathfrak{m}_i$, also easily generalized to the spectral edge $\lambda_{\max}(\mathfrak{X})$, in terms of nonlinear optimization problems over positive definite matrices. Here the $\mathbf{A}_i$ are finite-dimensional Hermitian matrices, the $\mathfrak{m}_i$ are either free semicircular or free Rademacher families of operators, and $\mathfrak{1}$ is the identity operator. We first show that both of Lehner's nonlinear optimizations can be rewritten as linear semidefinite programs (SDPs), even in the Rademacher case where Lehner's optimization is not itself convex. We give the primal and dual forms of these SDPs, derive the complementary slackness relations and consequences thereof, and propose that the SDPs are more stable and accurate than the iterative numerical scheme proposed in Lehner's original work. We then apply the SDPs from the semicircular case to spiked matrix models, studied recently via Lehner's formula by Bandeira, Cipolloni, Schröder, and van Handel (2024). We give a new proof of the Baik–Ben Arous–Péché (BBP) transition they establish in models with isotropic (but possibly correlated) Gaussian noise by constructing feasible variables for the associated primal and dual SDPs. Combining our construction with a sensitivity interpretation of optimal dual variables, we study the fluctuations of leading eigenvectors of such models. We conjecture and give numerical evidence that these fluctuations are Gaussian but anisotropic and non-universal, and that their covariance may be computed in terms of the optimizer of the dual of Lehner's formula, which in turn is approximately the leading eigenmatrix of a completely positive operator associated to the covariance of the noise model.

08.
arXiv (CS.CV) 2026-06-12

MinhwaNet: Faithful but Insufficient Object Grounding in Korean Folk Painting

Authors:

Korean folk painting (minhwa) is built from a small vocabulary of auspicious symbols, a tiger for protection, a pair of birds for marital harmony, a peony for wealth, that recur across many of its painted genres. This suggests an obvious computational approach, identify which symbols appear in a painting and read the genre from the inventory. Working with a public corpus that pairs whole paintings, eight-field bilingual curatorial captions, and a separate set of expert object crops, we find that this approach does not work. A model given only a list of which symbols a painting contains predicts the genre far worse than a model that fuses the image with the curatorial text, and forcing the genre representation to be object-grounded actively hurts accuracy. The visual evidence on which the genre prediction rests is nonetheless localized and inspectable. A leakage-safe object evidence map projected from a part-level detector is spatially faithful to where curators isolated symbolic objects and to a patch-based surrogate's own gradient saliency. We name this configuration a faithful-but-insufficient dissociation. The part-level explanation is honest about what the part-level model sees, yet the genre target turns on how symbols are arranged rather than on which ones appear. The same lens separates a content label that survives transfer to held-out source institutions, genre, from a style label that does not, era, a prediction we confirm on two further labels in the corpus. We release the multimodal system, a worked-example reading of one painting's evidence map against its catalogue, and a set of evaluation cautions that recur in long-tailed heritage collections.

09.
arXiv (CS.CV) 2026-06-16

HorusEye: Language as Dynamic Attention for Emergency Visual Analysis

Authors:

We introduce HorusEye, Language as Dynamic Attention for Emergency Visual Analysis. Our investigation followed five stages. The first one is benchmarking RefCOCO-Degraded, a dataset of 15,244 images (3,811 base images x 4 conditions: Clean, Fog, Smoke and Thermal) with systematic visual degradation. Through four research questions, we evaluate multiple VLMs (Gemini, Qwen2-VL, BLIP-2, LLaVA, Kosmos-2) across visual grounding the second stage, language feedback recovery the third one, health VQA tasks the fourth, and hallucination analysis the final stage. Our key finding is that language feedback effectiveness is model-dependent: Gemini achieves +47.3% improvement in thermal conditions through iterative language feedback, while Qwen2-VL shows -5.1% degradation under the same protocol. We also identify the 'Thermal Paradox' where cropping strategies that improve RGB performance catastrophically fail in thermal imagery. Furthermore, BLIP-2 uniquely hallucinates more under degradation, making it unsuitable for emergency deployment

10.
arXiv (quant-ph) 2026-06-12

Characterizing the functional role of quantum coherence in energy transfer

arXiv:2606.13404v1 Announce Type: new Abstract: Quantum coherence is understood to play a role in excitation energy transfer in open quantum systems, yet a quantitative approach to assessing its influence on the transfer process is still missing. Using Nakajima-Zwanzig projection operators, we derive a general memory kernel identity that enables us to characterize and quantify the impact of coherence in the eigenenergy basis on a generalized rate of energy transfer. Applying our approach to the electronic dynamics of a dimer coupled to a structured phonon bath, we demonstrate how quantum coherence acts to modulate energy transfer.

11.
arXiv (CS.CL) 2026-06-12

Zero-source LLM Hallucination Detection with Human-like Criteria Probing

Large language models (LLMs) often hallucinate by generating factually incorrect or unfaithful content, posing significant risks to their safe use. Detecting such hallucinations is particularly challenging under the zero-source constraint, where no model internals or external references are available, and detection must rely solely on the textual query-answer pair. In this paper, we propose Human-like Criteria Probing for Hallucination Detection (HCPD), a paradigm that emulates the multi-faceted reasoning of human evaluators. Its core is a Human-like Criteria Probing (HCP) mechanism, in which a LLM agent adaptively decomposes its judgment into a weighted set of interpretable criteria and aggregates criterion-specific scores into a final truthfulness measure. To achieve this adaptive capability, we introduce a reward-based alignment scheme using only weak supervision from semantic consistency. At inference, we employ a multi-sampling aggregation strategy to ensure robust decisions while preserving full interpretability. We further provide theoretical analysis supporting the reliability of our approach. Extensive experiments show that HCPD consistently outperforms state-of-the-art baselines, offering an effective and explainable solution for zero-source hallucination detection. Code is available at https://github.com/TRISKEL10N/HCPD.

12.
arXiv (CS.LG) 2026-06-18

Do as the Romans Do: Learning Universal Behaviors from Heterogeneous Agents

arXiv:2606.18537v1 Announce Type: new Abstract: Humans often acquire new skills by observing others, since observed behaviors implicitly reveal how to act in an environment. However, observations drawn from a heterogeneous population introduce conflicting behavioral signals, making it difficult to determine which behaviors are worth imitating. We address this challenge with General Reward Inference and Disentanglement (GRID), a social learning method that extracts universally useful behaviors from a heterogeneous population of demonstrators pursuing different goals. GRID decomposes per-agent reward functions into a general reward, capturing behaviors shared across all agents, and specific rewards, capturing individual preferences and objectives. Training exclusively on the general reward provides a new paradigm of generalist pretraining. It yields a generalist agent that internalizes universal environmental competencies, such as safety and basic task proficiency, without the mode-averaging bias that afflicts standard learning from demonstration techniques. This generalist serves as a superior prior for fine-tuning to downstream tasks, including preferences unseen during training. Experiments across a synthetic basis function decomposition, multi-agent Craftax, and a continuous autonomous driving simulator (Highway-Env) confirm that GRID successfully disentangles reward structure in a semantically meaningful way, outperforms standard learning from demonstration baselines, and enables more efficient and stable specialization.

13.
bioRxiv (Bioinfo) 2026-06-18

Structure-Based Immunoinformatics Design of a CTB-Adjuvanted Multi-Epitope Mucosal Vaccine Against Helicobacter pylori

Background: Helicobacter pylori coloniz the gastric mucosa of nearly half of the global population and is classified as a Group I carcinogen by the World Health Organization due to its strong association with gastric cancer. The growing prevalence of antibiotic-resistant H. pylori strains significantly compromises current therapeutic strategies, emphasizing the urgent need for effective prophylactic approaches. Research design and methods; In this study, a novel multi-epitope vaccine was designed targeting H. pylori, incorporating epitopes from four key virulence proteins: BabB, SabB, SabA, and VacA. Using an immunoinformatics-guided structural vaccinology approach, B- and T-cell epitopes were predicted, prioritized based on immunogenicity, conservation, population coverage, and non-homology to human proteins, and assembled into the final vaccine construct. To enhance immunogenicity and specifically stimulate mucosal immune responses, the cholera toxin B subunit (CTB) was fused at the N-terminal via an EAAAK linker, a novel application in H. pylori multi-epitope vaccines. The PADRE universal epitope and additional linkers were incorporated to optimize epitope presentation and helper T-cell activation. Results: Comprehensive evaluations of physicochemical, antigenic, allergenic, and toxic properties were conducted, followed by secondary and tertiary structure modeling, refinement, and validation. Conformational B-cell epitopes were mapped, and molecular docking, binding affinity analysis, energy minimization, and molecular dynamics simulations confirmed structural stability and receptor interactions. Codon optimization and in silico cloning predicted efficient expression in Escherichia coli, while immune simulations suggested robust humoral and cellular responses. Conclusions: This study presents a promising multi-epitope vaccine candidate against H. pylori, offering a rational framework for future experimental validation and potential clinical application.

15.
arXiv (CS.AI) 2026-06-19

REVEAL++: Differentiable Phenotypic Grouping for Vision-Language Retinal Modeling of Alzheimer's Disease Risk

arXiv:2606.19522v1 Announce Type: new Abstract: The retina offers a noninvasive window into neurodegenerative disease, capturing subtle structural patterns associated with a risk of future cognitive decline. Vision-language alignment frameworks such as REVEAL have shown that pairing retinal fundus images with structured clinical risk narratives improves early prediction of Alzheimer's disease (AD). A key design choice in these approaches is the use of phenotypic grouping, where individuals with similar risk profiles are treated as multi-positive pairs during contrastive learning. However, existing methods operationalize phenotypic similarity as a discrete construct, relying on hard group assignments that impose rigid supervision and decouple group formation from representation learning. We propose a continuous formulation of phenotypic structure within contrastive learning. Rather than assigning samples to fixed clusters, we model inter-subject similarity as a differentiable weighting function derived from intra-modality embedding similarities in both retinal images and risk profiles. These weights define soft multi-positive relationships through a continuous aggregation operator, enabling graded supervision that reflects the spectrum nature of disease risk. We further introduce a soft-target contrastive objective that jointly learns cross-modal alignment and phenotypic structure in an end-to-end manner. Evaluated on UK Biobank retinal imaging data for incident AD prediction, the proposed framework consistently outperforms discrete group-based contrastive learning and standard vision-language baselines. By treating phenotypic similarity as a learnable, continuous signal rather than a fixed grouping rule, our approach provides a principled and robust foundation for population-scale neurodegenerative risk modeling from multi-modal retinal and clinical data.

16.
arXiv (CS.LG) 2026-06-12

Viral Proteins Reveal Geometry of Protein Language Models

arXiv:2606.12609v1 Announce Type: new Abstract: Protein language models are trained on highly imbalanced datasets, raising the question of how they represent underrepresented biological sequences. Using viral proteins as a case study across ESM model families, we identify a dominant nativeness axis in embedding space, aligned with masked reconstruction perplexity, that orders sequences from well-modeled cellular proteins through viral proteins to shuffled and random sequences. Scaling contracts this axis unevenly across viral families. Despite this, protein language model embeddings retain viral-specific signal: viral proteins remain linearly separable beyond zero-shot perplexity and shallow sequence features. Together, these results suggest that pLM representations are structured by a general notion of nativeness while preserving information specific to distinct biological groups.

17.
arXiv (CS.CV) 2026-06-16

Pixel-TTS: Image based Text Rendering for Robust Text-to-Speech

Recent advances in pixel-based text modeling show that representing text as images enables models to exploit visual cues for language understanding. Grounding text in its visual form allows structurally similar characters with different Unicode encodings to produce similar embeddings, benefiting cross-lingual and zero-shot scenarios. Conventional text-based approaches treat each character independently, limiting generalization to unseen characters and requiring embedding expansion during cross-lingual adaptation. We propose Pixel-TTS, the first framework for visually grounded speech synthesis. It renders text as images and projects them through a 2D convolutional layer to generate embeddings. This design eliminates embedding matrix expansion during fine-tuning while improving robustness to unseen characters and orthographic variations. Extensive experiments show Pixel-TTS achieves competitive performance with strong baselines, faster convergence and robust zero-shot generalization.

18.
medRxiv (Medicine) 2026-06-10

Healthy Heart Actions Right Time (HHART): Co-design priorities to connect Aboriginal and Torres Strait Islander community and clinic activities for healthy hearts

Aim: Healthy Heart Actions Right Time (HHART) is a multi-phased research project that seeks to identify, implement and evaluate strategies to connect community and clinical activities to reduce the burden of heart disease for Aboriginal and Torres Strait Islander people. The aim in Phase One was to identify priority activities for two participating services. Background: The ongoing effects of colonisation drive a disproportionate burden of heart disease for Aboriginal and Torres Strait Islander people. Clinical and community groups both have established strengths in reducing the risk of heart disease, but these are not always well connected. Methods: Using a case study methodology in two locations we partnered in a 12-month co-design process to identify priority activities to connect clinical and community activities. Findings: Three priorities emerged from the Phase One co-design process: (i) community-led gardening as a strategy to promote heart health through connection and healthy lifestyles; (ii) community days to increase engagement in heart checks and strengthen community-clinic relationship; and (iii) clinic-led development of culturally relevant education resources to promote clinician confidence and community heart health knowledge.

19.
arXiv (CS.LG) 2026-06-17

Toward Controllable Catalyst Inverse Design via Large-Scale Autoregressive Pretraining

arXiv:2606.17445v1 Announce Type: new Abstract: Inverse design of heterogeneous catalysts remains challenging because catalyst surfaces exhibit substantial structural complexity with coupled surface-adsorbate interactions across a vast chemical space that is difficult to explore efficiently through conventional screening alone. Although machine learning-based high-throughput screening has accelerated catalyst discovery, its efficiency inevitably declines as the search space grows, motivating the development of generative models that can directly construct catalysts with target properties. Here, we present a conditional catalyst generative model based on the Generative Pretrained Transformer architecture with a numerical embedding layer that enables the generation of catalyst structures conditioned on both categorical and continuous properties within a single autoregressive framework. The model was pretrained on 133 million catalyst structures and subsequently fine-tuned on approximately 460,000 optimized structures with associated categorical properties and binding energies for conditional generation. The resulting model achieved 98% structural validity, 95% optimization validity, and high categorical condition fidelity, with a 93 % joint match rate for adsorbate type and composition. For binding energy conditioning, the match rate of approximately 20% represents a four-fold improvement over the baseline training distribution, and the generated distributions shift systematically toward the target values, enabling a 1.5 to 4-fold improvement in screening efficiency for reaction-targeted catalyst discovery without additional fine-tuning. These results show that large-scale autoregressive pre-training, combined with explicit property conditioning, provides a practical route toward controllable catalyst generation and accelerated catalysts discovery.

20.
arXiv (CS.LG) 2026-06-18

Quantifying and Auditing LLM Evaluation via Positive–Unlabeled Learning

arXiv:2606.19057v1 Announce Type: cross Abstract: Large Language Models (LLMs) are increasingly used as judges for scalable evaluation, yet such LLM–as–a–Judge systems exhibit systematic biases that are decoupled from semantic quality, most notably verbosity bias. Meanwhile, human supervision is costly and typically selective, yielding reliable positive judgments but leaving most outputs unlabelled and potentially mixed in quality. We formulate LLM evaluation under selective human supervision as a positive–unlabelled learning problem and propose a geometric auditing framework based on Partial Optimal Transport. By aligning a small set of human–verified positives with a reliable subset of unlabelled outputs in a fixed embedding space, our method identifies human–consistent preferences and corrects biased judges without retraining. Experiments demonstrate improved alignment with human preferences, increased robustness to presentation biases, and interpretable confidence estimates, offering a scalable and statistically grounded alternative to existing LLM–as–a–judge pipelines.

21.
arXiv (math.PR) 2026-06-18

Evolution of Conditional Entropy for Diffusion Dynamics on Graphs

arXiv:2510.19441v2 Announce Type: replace-cross Abstract: The modeling of diffusion processes on graphs is the basis for many network science and machine learning approaches. Entropic measures of network-based diffusion have recently been employed to investigate the reversibility of these processes and the diversity of the modeled systems. While results about their steady state are well-known, very few exact results about their finite-time evolution exist. Here, we introduce the conditional entropy of heat diffusion in graphs, and outline a mathematical framework that contextualizes diffusion and conditional entropy within the theories of continuous-time Markov chains and information theory. In particular, we highlight that this entropic measure satisfies an information-theoretical version of the second law of thermodynamics, thereby providing a parallelism between diffusion dynamics on networks and their physical counterparts. Furthermore, we obtain explicit results for its evolution on complete, path, and circulant graphs, as well as a mean-field approximation for Erdös-Rényi graphs. We also obtain asymptotic results for general networks and provide bounds for the evolution of conditional entropy. Finally, we experimentally demonstrate several properties of conditional entropy for diffusion over random graphs, such as the Watts-Strogatz model.

22.
arXiv (CS.LG) 2026-06-16

GD$^2$PO: Mitigating Multi-Reward Conflicts via Group-Dynamic reward-Decoupled Policy Optimization

arXiv:2606.16771v1 Announce Type: new Abstract: As LLMs advance, post-training reinforcement learning (RL) increasingly relies on multi-dimensional rewards to cultivate comprehensive capabilities. This shift demands new algorithms capable of optimizing diverse and potentially competing objectives simultaneously. To address this, existing methods such as Group reward-Decoupled Policy Optimization (GDPO) decompose the overall score into independent reward groups, then compute the RL loss separately within each group. However, this strategy still encounters multi-reward conflicts: a single rollout can yield positive advantages on certain reward dimensions but negative ones on others, causing opposing signals to cancel each other out during aggregation, further hindering RL training efficiency. Inspired by Dynamic sAmpling Policy Optimization (DAPO), which improves RL training efficiency by filtering out ineffective rollouts with near-zero advantages, we propose Group-Dynamic reward-Decoupled Policy Optimization (GD$^2$PO). Specifically, GD$^2$PO employs a conflict-aware filtering mechanism to mask out rollouts suffering from severe reward-wise disagreement. By preventing conflicting signals from canceling each other out, this masking strategy preserves and enhances the magnitude of effective RL advantages, thereby significantly accelerating learning efficiency. Furthermore, we introduce query-level reweighting to dynamically adjust the update intensity of each query based on its overall reward consensus. Experiments on various multi-reward scenarios, including tool calling and human preference alignment, demonstrate that GD$^2$PO consistently and significantly outperforms existing baselines. The code is available at https://github.com/Qwen-Applications/GD2PO.

23.
arXiv (quant-ph) 2026-06-16

Worst-case depth hierarchy for shallow quantum circuits

arXiv:2606.16425v1 Announce Type: new Abstract: Circuit depth is a central resource in complexity theory. While bounded-depth classical circuits admit well-understood hierarchy theorems, the internal structure of constant-depth quantum computation remains comparatively unexplored. We prove an explicit depth hierarchy theorem for $\mathsf{QNC}^0$. For each $d\ge 12$, we construct a family of two-round interactive problems on which no depth-$(d-1)$ quantum circuit can achieve near-perfect success, regardless of gate set, circuit size, or ancillary qubits. In contrast, we prove that our construction admits realizations by simple bounded fan-in quantum circuits of depth larger than $d$ by a small constant factor. Moreover, all bounded fan-in classical circuits of sublogarithmic depth (in the input size) fail to achieve perfect success on these tasks for every $d$, yielding a hierarchy of problems that show unconditional quantum advantage of $\mathsf{QNC}^0$ over $\mathsf{NC}^0$. A key obstacle is the scarcity of lower bound techniques for quantum circuits. To address this, we develop methods to analyze how depth affects a circuit's ability to realize nonlocal correlations amongst its output qubits in a fine-grained manner. Our approach exploits the correspondence between constraint systems and nonlocal games, translating group-theoretic constructions into rigid operator-valued constraint systems and then into non-local games. In particular, we construct constraint systems whose unique faithful operator-valued solutions require every perfect strategy, and every near-perfect strategy to a fixed precision, to implement multi-controlled phase operations. This reduces to a nonlocal unitary-synthesis problem, yielding depth lower bounds for both shallow quantum and classical circuits. These results show that increasing depth strictly increases computational power within $\mathsf{QNC}^0$, establishing a genuinely quantum hierarchy.

24.
arXiv (CS.CV) 2026-06-11

FreqKD: Frequency-Decoupled Cross-Modal Knowledge Distillation for Infrared Object Detection

Transfer learning from large-scale RGB foundation models to infrared (IR) imagery through knowledge distillation (KD) remains challenging due to fundamental differences in image formation physics. We investigate the spectral structure of the RGB–IR modality gap and observe that feature divergence is not uniform across spatial frequencies: low-frequency components (shape, layout) show greater cross-modal alignment than high-frequency components (texture, fine edges), which reflect modality-specific characteristics. Based on this analysis, we propose FreqKD, a frequency-decoupled distillation framework that applies asymmetric supervision adapted to each band's cross-modal consistency. The method employs strict mean squared error (MSE) on the low-frequency band to preserve shared structural information and a relaxed log-MSE loss (weighted at 0.1) on the high-frequency band to provide edge guidance while tolerating texture differences. Spectral divergence analysis on 500 paired samples shows that high-frequency divergence exceeds low-frequency divergence by a factor of 2.4x on average across all analysed transformer layers. On KAIST multispectral pedestrian detection, FreqKD achieves 64.1 mAP50, improving 2.4 points over the DINOv2 baseline. The learned representation transfers across datasets (FLIR ADAS, +2.1 mAP50), tasks (MFNet segmentation, +1.85 mean intersection-over-union), and architectures (ResNet-50, +1.0 mAP50). Code is available at: https://anonymous.4open.science/r/freq_decoupled_kd-5E5A

25.
arXiv (CS.CL) 2026-06-19

Vero: An Open RL Recipe for General Visual Reasoning

What does it take to build a visual reasoner that works across charts, science, spatial understanding, and open-ended tasks? The strongest vision-language models (VLMs) suggest that broad visual reasoning is within reach, yet their closed data and reinforcement learning (RL) pipelines make their gains difficult to study, reproduce, or extend. We introduce Vero, a family of fully open VLMs that match or exceed existing open-weight models across diverse visual reasoning tasks. We scale RL data and rewards across six broad task categories, constructing Vero-600K, a 600K-sample dataset from 59 datasets, and designing task-routed rewards that handle heterogeneous answers. Across VeroEval, our 30-benchmark suite, Vero-600K outperforms existing RL datasets under controlled comparisons. Applied to five starting models, Vero variants gain 2.9-5.4 points on average over their initial models. Notably, Vero-Qwen3I-8B, trained on the Instruct model, surpasses Qwen3-VL-8B-Thinking by 3.8 points on average without additional distillation. Systematic ablations reveal that different task categories elicit distinct reasoning patterns and that broad gains depend on learning them jointly rather than in isolation. All data, code, and models are publicly available.