Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CL) 2026-06-11

Adaptive Multi-Resolution Procedural Knowledge Compression for Large Language Models

Large language models (LLMs) are widely used to tackle complex tasks with autonomous workflows. Recently, reusable natural language skills have emerged as a popular paradigm to inject procedural knowledge into LLM applications. Since popular skills are often invoked repeatedly, placing their full text in every context significantly increases prefill cost and latency. While text compression techniques have the potential to solve this problem, most existing methods are designed to compress factual knowledge in documents instead of procedural knowledge, making them insufficient for skill compression. In this paper, we argue that an effective skill compression method should: 1) preserve logical dependencies among workflows and tool protocols, 2) enable lightweight, offline compression for frequently updated community skills, and 3) be adaptable to varying complexities across skills. To address this, we present SKIM (SKIll coMpression), an adaptive multi-resolution soft token compression framework for procedural skills. Depending on the complexity of each skill, SKIM creates different numbers of soft tokens that not only improve the efficiency of LLM inference, but also preserve the effectiveness of skill usage. Experiments indicate that SKIM compresses skills to 30 to 60 percent of their original token length while preserving task performance better than existing compression methods.We have released our code at https://github.com/bebr2/SKIM .

02.
arXiv (CS.AI) 2026-06-16

STRIDE: Strategic Trajectory Reasoning via Discriminative Estimation for Verifiable Reinforcement Learning

arXiv:2606.15866v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has become an effective post-training paradigm for improving the reasoning abilities of large language models. However, existing RLVR methods typically rely on final-answer correctness to assign trajectory-level rewards, providing sparse supervision and treating all tokens uniformly regardless of their actual contribution to reasoning. Although recent studies introduce intermediate signals such as process rewards, high-entropy tokens, and semantic uncertainty, these signals are often not inherently verifiable and may fail to distinguish beneficial strategic patterns from harmful ones. To address this limitation, we propose STRIDE (Strategic Trajectory Reasoning with Discriminative Estimation), a fine-grained RLVR framework that derives strategic reasoning supervision from verifiable outcomes. STRIDE contrasts successful and failed trajectories within each response group to estimate the outcome-discriminative preference of each $n$-gram strategic pattern, and further combines this signal with reasoning saliency entropy to identify decision-relevant strategic patterns. These patterns are assigned differentiated advantage values during RL optimization, enabling more precise credit assignment while preserving the verifiability of RLVR. Extensive experiments demonstrate that STRIDE consistently improves reasoning performance across diverse models, tasks, and extended settings, including VLMs and agent-based systems.

03.
arXiv (quant-ph) 2026-06-16

Complete Relational Description of Spin in a Quantum Background

arXiv:2606.15873v1 Announce Type: new Abstract: The standard description of the state of a spin in quantum mechanics presupposes externally fixed directions – a classical background. Can a spin be fully described instead in relation to other quantum mechanical systems? Poulin suggested twenty years ago group averaging over rotations the joint state of a fundamental spin and a reference spin with large angular momentum which, however, yields a classical bit in a probabilistic mixture. We revisit this idea and show that when the quantum reference system is augmented to two large spins, the standard quantum mechanical description of a spin is recovered in the limit of large quantum numbers for the reference system.

04.
arXiv (CS.CV) 2026-06-16

LIBERO-Occ: Evaluating and Improving Vision-Language-Action Models under Scene-Induced Occlusion via Viewpoint Imagination

Vision-Language-Action (VLA) models achieve strong performance on standard manipulation benchmarks, but most evaluations assume that task-relevant objects are fully visible. This assumption often fails in realistic settings, where occlusion makes manipulation partially observable. In this paper, we study scene-induced occlusion as a fundamental challenge for VLA models and introduce LIBERO-Occ, an occlusion-oriented extension of LIBERO. Experiments show that state-of-the-art VLAs suffer substantial performance degradation under occlusion. To address this issue, we propose Viewpoint Imagination (VIM), which generates a complementary view from an occluded primary observation and conditions action prediction on both observed and imagined evidence. VIM improves robustness across task suites, occlusion types, and severity levels without requiring additional cameras at deployment time, suggesting that viewpoint imagination is an promising mechanism for perception completion in partially observable manipulation. Our benchmark and corresponding code are available at: \href{https://github.com/litsh/Libero-Occ}{https://github.com/litsh/Libero-Occ}.

05.
arXiv (quant-ph) 2026-06-16

Light-induced nonadiabatic dissipative quantum dynamics of the Na2 molecule

arXiv:2606.15292v1 Announce Type: new Abstract: Strong light-matter coupling between molecules and optical or plasmonic cavity modes has emerged as a promising platform for advancing photonics, materials science, and chemistry. However, optical cavities and plasmonic resonators in particular are inherently lossy systems characterized by finite photon lifetimes. Accurate theoretical descriptions of molecular dynamics under strong coupling therefore require a proper treatment of cavity losses. In this work, we compare three theoretical approaches for modeling dissipative molecule-cavity dynamics within a realistic parameter regime: the Lindblad master equation, the stochastic Schrödinger equation, and the non-Hermitian Schrödinger equation. As an example, we consider the two lowest energy state of Na2 molecule coupled to a cavity mode and analyze the time evolution of the excited-state population and the mean photon number. Our results demonstrate that the stochastic Schrödinger equation provides an accurate and computationally efficient alternative to the Lindblad master equation, while the non-Hermitian Schrödinger approach is found to be applicable only within a limited range of conditions. Furthermore, we show that inclusion of molecular rotation leads to rotational-vibrational-photonic coupling and gives rise to pronounced nonadiabatic dynamics through light-induced conical intersections. These findings highlight the importance of both dissipation and rotational degrees of freedom for a realistic description of molecular dynamics in strongly coupled molecule-cavity systems.

06.
arXiv (CS.CV) 2026-06-19

ARTEMIS: Agent-guided Reliability-aware Temporal Mask Evolution for Imperfectly Supervised Video Polyp Segmentation

Imperfectly supervised video polyp segmentation (VPS) aims to learn dense, temporally consistent masks from inexpensive supervision, including weak annotations (points, scribbles) and semi-supervision with few densely labeled frames. This setting is clinically valuable but challenging due to weak contrast, ambiguous boundaries, motion blur, and specular highlights, compounded by sparse pixel-level guidance. While SAM2 can generate dense masks from sparse inputs, direct pseudo-labeling often yields geometry-degraded masks with boundary leakage, underutilizes temporal consistency, and ignores reliability. To address these issues, we propose ARTEMIS, a unified framework for imperfectly supervised VPS driven by agent-guided reliability-aware temporal mask evolution. ARTEMIS initializes coarse masks from available supervision: SAM2 converts points/scribbles, while dense labels serve as reliable anchors. A debate-and-judge vision-language agent selects reliable temporal anchors under weak supervision, which are propagated bidirectionally with SAM2 to refine unreliable or unlabeled frames. Finally, ARTEMIS trains the segmenter using temporal reliability-aware robust learning, incorporating reliability-guided reference selection, a Reference Prototype Transport Module, and reliability-aware robust loss. These components assess mask reliability, evolve anchors over time, transport target identity across frames, and down-weight noisy supervision instead of discarding difficult samples. Experiments on SUN-SEG and CVC-ClinicDB-612 under scribble, point, and limited-label settings demonstrate that ARTEMIS achieves state-of-the-art performance. Code will be released at https://github.com/wangtong627/ARTEMIS.

07.
arXiv (CS.CL) 2026-06-19

When Lower Privileges Suffice: Investigating Over-Privileged Tool Selection in LLM Agents

As LLM agents increasingly select tools autonomously, their choices among tools with different privileges become safety-relevant. However, prior tool-selection studies focus on safety-agnostic metadata preferences, leaving privilege-sensitive choices underexplored. To address this gap, we study over-privileged tool selection, in which an agent selects or escalates to a higher-privilege tool despite a sufficient lower-privilege alternative. We introduce ToolPrivBench to evaluate whether agents choose higher-privilege tools despite sufficient lower-privilege alternatives, measuring both initial selection and escalation after transient tool failures. Across eight domains and five recurring risk patterns, we find that over-privileged tool selection is common among mainstream LLM agents and is further amplified by transient failures. We further find that general safety alignment does not reliably transfer to least-privilege tool choice, while prompt-level controls provide only limited mitigation under transient failures. We therefore introduce a privilege-aware post-training defense that teaches agents to prefer sufficient lower-privilege tools and escalate only when necessary. Our mitigation experiments show that this defense substantially reduces unnecessary high-privilege tool use while preserving general capabilities.

08.
arXiv (CS.LG) 2026-06-16

FEnc$^2$: Unifying Data Packing for Efficient Private Inference via Convolution and Architecture-Aware Fragment Encoding

arXiv:2606.16359v1 Announce Type: cross Abstract: Fully Homomorphic Encryption (FHE) enables privacy-preserving machine learning but incurs extreme computational and memory overhead. These costs come not only from expensive low-level primitives, including Number Theoretic Transform (NTT), rotation, and key-switching, but also from inefficient ciphertext packing at the application level. Existing packing strategies typically preserve either neighboring data elements or feature grouping, but not both, leading to wasted ciphertext slots, excessive rotations, and inflated ciphertext counts. We propose FEnc2, a unified and principled fragment-based encoding framework for CKKS-based private convolutional neural network inference. FEnc2 optimizes slot utilization, rotation complexity, and ciphertext density through two components: 1)Conv-aware Encoding, which analytically selects an optimal fragment size to decouple spatial dependencies and jointly minimize inner-outer rotations across layers, and 2)Arch-aware Ct Compression, which restores ciphertext density after feature- or channel-reduction layers. Together, these transformations reshape encrypted workload structure and reduce homomorphic operations by one to two orders of magnitude. With full memory capacity utilized, i.e., at maximum batch size, FEnc2 achieves end-to-end latency speedups over the state-of-the-art Orion of up to 228.83x on GPU and 226.06x on CPU for LeNet on MNIST, and up to 4.55x on GPU and 9.43x on CPU for MobileNet on ImageNet. FEnc2 is hardware-agnostic yet architecturally transformative: by optimizing encrypted tensor layout before execution, it reduces ciphertext count and workload pressure on hardware, complementing primitive-level optimizations such as NTT and keyswitch accelerators. These results show that application-level data layout is a first-order architectural design dimension for encrypted inference and an important enabler for next-generation FHE systems.

09.
arXiv (CS.LG) 2026-06-11

DeepRHP: A Hybrid Variational Autoencoder for Designing Random Heteropolymers as Protein Mimics

arXiv:2606.11651v1 Announce Type: new Abstract: Synthetic random heteropolymers (RHPs), consisting of a predefined set of monomers, offer an approach toward the design of protein-like materials. These RHPs, if designed appropriately, can mimic protein behavior and function. As such, there is a need for computational tools to efficiently guide RHP design. We bridge this gap by developing DeepRHP, a modified variational autoencoder (VAE) model under a semi-supervised framework. By equipping a classical VAE with an additional feature-based VAE, DeepRHP forces the latent space to capture structures of critical chemical features as well as individual RHP sequence patterns. In this sense, our method is versatile by allowing any relevant features to be incorporated in a hybrid manner. We demonstrate the effectiveness of DeepRHP by suggesting potential monomer compositions that stabilize membrane proteins (e.g. Aquaporin Z) in non-native environments and cross-validating our prediction with published results. The concordance between our model and true RHP function suggests strong potential in utilizing hybrid autoencoder architectures to guide RHP design for proteins and other biological compounds.

10.
bioRxiv (Bioinfo) 2026-06-12

Computational Design of Optimal Sequences for Targeted Hypermutagenesis Using Recombination-Coupled Diversity-Generating Retroelements

Diversity-generating retroelements (DGRs) are natural systems that accelerate evolution via targeted hypermutation at adenines. We previously developed DGRec, a system combining DGRs and recombineering for programmable mutagenesis in Escherichia coli. We here address two important issues with DGRec: the dependence of mutagenesis efficiency on the dgrRNA secondary structure and the variability of the reverse-transcription biases with sequence context and position. First, we introduce and validate a method to recode non-functional templates, i.e. with low mutagenesis efficiency, into highly functional ones through synonymous mutations. Second, we develop a Long Short-Term Memory (LSTM) model to predict DGRec mutational profiles for any given template sequence. By integrating this LSTM model with our recoding method, we establish a comprehensive workflow for customized directed evolution, enabling researchers to precisely fine-tune DGRec in vivo mutagenesis to their engineering needs.

11.
arXiv (CS.AI) 2026-06-18

SafeClawBench: Separating Semantic, Audit-Evidence, and Sandbox Harm in Tool-Using LLM Agents

arXiv:2606.18356v1 Announce Type: cross Abstract: Tool-using language-model agents introduce security failures that go beyond unsafe text: they can disclose protected objects, write persistent memory, send messages, modify databases, or trigger harmful code and tool effects. Existing evaluations often collapse these stages into a single attack success rate, making it difficult to tell whether a model merely agreed with an attacker or actually produced observable harm. We introduce SafeClawBench, a staged benchmark for tool-using agent security with 600 controlled adversarial tasks across six attack families: direct and indirect prompt injection, tool-return injection, memory poisoning, memory extraction, and ambiguity-driven unsafe inference. SafeClawBench reports three separate endpoints: semantic attack acceptance, audit-visible harm evidence, and sandbox-observed tool/state harm. Evaluating five agent endpoints under four prompt-level policies, we find that these endpoints capture different failure modes. Without additional prompt protection, semantic failure rates vary widely across models, from 9.0% to 44.2%. Audited harm evidence is narrower than semantic failure, and under a separate executable protocol some matched task identities produce sandbox harm despite passing the Semantic Core call: in a 12,000-row matched analysis, 291 of 347 observed sandbox harms occur in rows that pass the semantic check. Prompt policies change endpoint outcomes, but their effects depend on both model and protocol. SafeClawBench provides a reproducible framework for comparing agent models and prompt-policy conditions without conflating textual compliance, evidence-supported harm, and executable state changes. The open-source dataset is available at https://huggingface.co/datasets/sairights/safeclawbench.

12.
arXiv (CS.LG) 2026-06-15

Free Heavy-Tailed Lunch for Muon: A Theoretical Justification of Empirical Success

arXiv:2606.14560v1 Announce Type: cross Abstract: Non-Euclidean optimisation methods with matrix-valued updates, such as Muon and Scion, have recently shown strong empirical performance for training Transformer models, yet their theoretical advantages over Euclidean methods remain poorly understood. We address this gap in the heavy-tailed non-convex regime, where stochastic gradients have bounded $p$-th central moments, $p \in (1,2]$. We show that certain non-Euclidean methods achieve optimal sample complexity under stronger stationarity measures, while Euclidean methods incur additional dimension-dependent costs. As a consequence, for $m \times n$ matrices, Muon finds an $\varepsilon$-stationary point in nuclear norm within $\mathcal{O}\left(\min\{m, n\} \frac{\Delta_1 L}{\varepsilon^2} \left(\frac \sigma \varepsilon \right)^{\frac p {p-1}}\right)$ samples, absorbing heavy-tailed noise without extra dimension dependence, unlike Euclidean methods. We further prove this sample complexity, including its dimension dependence, is optimal for all first-order methods under nuclear-norm stationarity. Experiments on large language models support our theory. Surprisingly, our results suggest that other Schatten geometries beyond the spectral geometry of Muon can perform competitively in certain settings.

13.
medRxiv (Medicine) 2026-06-22

A blinded, counterbalanced rater design for evaluating AI-assisted summarisation of tertiary clinical genomics reports: methodology of the QNOMX-VHIR-CPSP-001 Phase 1 study

Background. Tertiary clinical genomics reports condense layered molecular findings into documents that treating oncologists must read, translate, and act upon; manual summarisation of these reports is time-consuming and variable. Tools that assist summarisation and translation into local languages are emerging, yet the field lacks an agreed methodology for evaluating such tools before any downstream clinical use. The appropriate first endpoint is fidelity of the generated summary to its source report, assessed by qualified human raters under blinded scoring, not downstream variant classification. Methods. QNOMX-VHIR-CPSP-001 Phase 1 is a single-site, non-interventional clinical performance study conducted at Vall d'Hebron Institut de Recerca (VHIR) under ISO 20916:2019 as a Clinical Performance Study Protocol. De-identified tertiary cancer genomics reports from pediatric oncology cases are summarised by the AI-assisted summarisation system under evaluation and, in parallel, by the standard manual workflow. Qualified raters score both summary types against the source genomics report using the Quality Summary Index (QSI), a six-dimension, five-point rubric adapted from the Provider Documentation Summarization Quality Instrument, under a blinded, counterbalanced, two-period crossover with a minimum fourteen-day washout. Two co-primary composite endpoints, content and presentation, are analysed for non-inferiority under a Bayesian hierarchical model, with a frequentist linear mixed model as the convergence check. Inter-rater reliability is reported as Krippendorff's ; a Monte-Carlo power analysis of the fixed clustered design is pre-specified. Discussion. The design isolates summarisation quality from clinical decision-making by scoring both summary types against the same source report under blinding, counterbalancing, and a fourteen-day washout. Conclusion. The QSI rubric, the counterbalanced crossover, and the pre-specified Bayesian primary with frequentist convergence check define a replicable protocol for early-stage evaluation of AI-assisted summarisation in tertiary genomics reporting; observed variance components will inform sample-size determination for Phase 2.

14.
arXiv (CS.CV) 2026-06-17

DriveJudge: Rethinking Autonomous Driving Evaluation with Vision-Language Models

Autonomous driving has shifted towards end-to-end policy learning, where reliable, interpretable policy evaluation is a fundamental challenge as driving quality is highly context-dependent. Commonly used rule-based driving metrics like EPDMS are interpretable but lack context-awareness, while recent VLMbased evaluations are context-aware but limited by ambiguous VLM outputs and weak physical grounding. To evaluate driving in a manner that is both interpretable and context-aware, we introduce DriveJudge. DriveJudge is a driving evaluation agent that combines rule-grounded evaluation with Vision-Language Model (VLM) reasoning and selectively invokes physically-grounded deterministic rule functions after interpreting the environmental context. To train and evaluate DriveJudge, we curate a large-scale dataset of 33,577 challenging driving samples with human annotations on whether the driving behavior is reasonable in the given scenario. With this dataset, we address the underexplored problem of driving metric evaluation, and introduce two human-aligned benchmark tasks: Driving Quality Classification and Trajectory Preference Selection. DriveJudge outperforms EPDMS for driving quality classification by 21.23 AUC, and the recent VLM-based DriveCritic for trajectory preference selection by 6.5%, setting a new standard for interpretable and precise driving evaluation.

15.
medRxiv (Medicine) 2026-06-17

Menopausal symptoms in peri- and postmenopausal women: systematic review and meta-analysis of prevalence, incidence, comorbidities, and clinical outcomes

Introduction: The global epidemiology of menopausal symptoms among middle-aged and elderly women remains unclear. Methods: Data on prevalence, comorbidities, incidence and outcomes of menopausal symptoms published up until March 1st 2019 were searched in PubMed, Embase and Cochrane databases. We used a random-effects model to compute point estimates of prevalence for 24 types of menopausal symptoms. We narratively summarized the patterns of the comorbidities, incidence and outcomes of menopausal symptoms due to limited data. Results: A total of 239 studies (n{approx}2.5 million middle-aged and elderly women) from 56 countries and regions were included in the analysis. The global pooled prevalence analysis revealed that hot flashes (48%) and night sweats (30%) were highly prevalent, alongside psychological symptoms like insomnia (47%), irritability (46%), anxiety (39%), and depression (30%). Physical symptoms including joint aches/pain (50%), backache (47%), and tiredness (61%) were also commonly reported. Heat intolerance showed the highest prevalence (76%), while symptoms like urinary incontinence (24%) and poor appetite (8%) were less frequent. These findings highlight the diverse and widespread impact of menopause on women globally, with significant variations across symptom types. Africa showed the highest pooled prevalence across a series of symptoms, compared with other continents. We observed high prevalence in developing countries, especially for psychological and physical symptoms; significant intra-Asian variation in vasomotor symptoms; hypertension and obesity as the most common comorbidities; joint pain, urinary incontinence, and vasomotor symptoms as the most incident complaints; and positive associations with cardiovascular disease in the psychological (depression and insomnia) and physical (joint pain) domains. Conclusion: This study highlights the global burden of menopausal symptoms, with significant differences across continents. The findings call for more inclusive research on underrepresented groups (particularly in Africa) and further investigation into drivers of this marked global heterogeneity in prevalence of menopausal symptoms and their comorbidities, incidence and outcomes.

16.
arXiv (CS.AI) 2026-06-19

Overcoming Labelled Data Scarcity for Defect Classification in Scanning Tunneling Microscopy

arXiv:2506.01678v2 Announce Type: replace-cross Abstract: Scanning tunnelling microscopy (STM) is a powerful technique for imaging surfaces with atomic resolution, providing insight into physical and chemical processes at the level of single atoms and molecules. A regular task of STM image analysis is the identification and labelling of features of interest against a uniform background. Performing this manually is a labour-intensive task, requiring significant human effort. To reduce this burden, we propose an automated approach to the segmentation of STM images that uses both few-shot learning and unsupervised learning. Our technique offers greater flexibility compared to previous supervised methods; it removes the requirement for large manually annotated datasets and is thus easier to adapt to an unseen surface while still maintaining a high accuracy. We demonstrate the effectiveness of our approach by using it to recognise atomic features on three distinct surfaces: Si(001), Ge(001), and TiO$_2$(110), including adsorbed AsH$_3$ molecules on the silicon and germanium surfaces. Our model exhibits strong generalisation capabilities, and following initial training, can be adapted to unseen surfaces with as few as one additional labelled data point. This work is a significant step towards efficient and material-agnostic, automatic segmentation of STM images.

17.
arXiv (CS.AI) 2026-06-18

Conflict-Aware Retriever Editing for Knowledge Injection Attacks on LLM-Based RAG Systems

arXiv:2606.18310v1 Announce Type: cross Abstract: Injecting malicious knowledge into retrieval-augmented generation (RAG) systems can manipulate retrieved evidence and mislead downstream generation, posing a serious security threat for AI applications. Existing RAG injection attacks mainly rely on manipulating external knowledge bases, such as crafting malicious corpus. However, the synthetic text crafted by such data-centric methods could be detectable, leading to the failure of attacks. Beyond corpus manipulation, open-source retrievers are increasingly exposing RAG systems to model-centric attacks. In this paper, we propose conflict-aware retriever editing, i.e., CAREATTACK, a model-centric retriever attack framework for malicious knowledge injection in RAG. Specifically, CAREATTACK consists two stages of conflict-aware retriever editing and attack-preserving anchor repair. Conflict-aware retriever editing adapts efficient closed-form parameter editing to the dense retrieval model, promoting malicious knowledge above benign competing passages and resolving potential parameter conflicts through graph-based conflict detection and parameter editing projection. Then, attack-preserving anchor repair performs lightweight calibration on the edited retriever to further eliminate the impact on non-target prompts while preserving the attack effectiveness for target prompts. We instantiate CAREATTACK on Qwen3-Embedding-0.6B and BGE-M3, and conduct evaluation on three benchmark datasets. Experimental results demonstrate our method substantially promote malicious passages into the retrieved knowledge of RAG systems and can perform attacks for batches of target prompts and passages, given the access of retrieval model parameters. Since most RAG systems are built upon open-source retrieval models, this work reveals a practical attack surface in RAG systems. Codes are public accessible at https://anonymous.4open.science/r/CareAttack-3F1C.

18.
arXiv (quant-ph) 2026-06-17

Cumulant expansion approach to the decay dynamics of interacting Mössbauer nuclei after strong impulsive excitation

arXiv:2510.00970v2 Announce Type: replace Abstract: Recent progress in accelerator-based x-ray sources brings higher excitation of ensembles of Mössbauer nuclei closer to experimental feasibility. Yet, a theoretical modeling of the decay dynamics of the interacting nuclear ensemble after the impulsive excitation is still an open challenge. Here, we derive a set of nonlinear equations which is capable of efficiently modeling large nuclear ensembles for arbitrary degrees of excitation. As key signature for higher excitation, we identify a non-linear time-evolution of the nuclear dipole phase, which can be tuned via the scattering geometry, and interferometrically be measured. Furthermore, we identify interesting finite-size effects in the nuclear dynamics of small ensembles. Our results provide important guidance for future experiments aiming at the non-linear excitation of nuclei. We further envision the exploration of finite size-effects in Mössbauer spectroscopy with highest spatial resolution, i.e., small sample volumes.

19.
medRxiv (Medicine) 2026-06-22

Leishmaniasis on YouTube: a critical appraisal of the quality, reliability, and transparency of educational content

Background: Leishmaniasis is a neglected tropical disease of significant global public health importance, for which accurate information is essential to support prevention and early care-seeking, particularly in endemic, resource-limited settings. YouTube is a widely used source of health information, but the quality and reliability of leishmaniasis-related content have not been evaluated. We aimed to assess the quality, reliability, and transparency of English-language YouTube videos on leishmaniasis. Methods: We conducted a cross-sectional analysis of YouTube videos retrieved via the YouTube Data API on 15 June 2026 using the terms "leishmaniasis," "cutaneous leishmaniasis," and "visceral leishmaniasis." After applying eligibility criteria and screening the 150 most-viewed eligible videos, 48 videos were included. Two reviewers independently assessed each video using the modified DISCERN (mDISCERN) tool, the Global Quality Score (GQS), and the JAMA benchmark criteria, with disagreements resolved by consensus. Inter-rater agreement was assessed using the intraclass correlation coefficient (ICC), and associations were examined using Spearman's rank correlation. Results: Of 402 videos retrieved, 48 met the inclusion criteria. The median GQS was 3.00 (IQR 2.00-4.00) and median mDISCERN was 3.00 (IQR 2.38-4.50), indicating moderate quality and reliability, while the median JAMA score was 2.00 (IQR 1.00-2.00), reflecting limited transparency; no video met all four JAMA criteria. The overwhelming majority of videos (47/48, 97.9%) were of professional or institutional origin. Inter-rater agreement was good to excellent (ICC 0.883 for GQS, 0.896 for mDISCERN, 1.000 for JAMA). The instruments were strongly inter-correlated (mDISCERN-GQS rho = 0.841, p < 0.001). Quality scores did not correlate positively with views, likes, or video duration; comments correlated weakly and negatively with mDISCERN (rho = -0.337, p = 0.031) and JAMA (rho = -0.381, p = 0.014). Conclusions: YouTube videos on leishmaniasis are of moderate quality and reliability but limited transparency, and are produced almost exclusively by professional sources. Video popularity, length, and age were not indicators of quality. There is a need for experts and institutions to produce clearly authored, well-sourced, and transparent educational content on this neglected tropical disease.

20.
arXiv (CS.LG) 2026-06-19

Entropy Estimation in Multi-Qutrit Systems via Variational and Classical Neural Networks

arXiv:2606.20504v1 Announce Type: cross Abstract: We present a systematic study of von Neumann entropy estimation in multi-qutrit quantum systems using two complementary approaches: variational quantum algorithms (VQAs) and classical convolutional neural networks (CNNs), evaluated using an ideal (noise-free) quantum simulator. For systems up to three qutrits, we construct and evaluate 11 hardware-efficient SU(3)-inspired ansatzes. A parameter sweep shows that estimation accuracy is primarily determined by the number of trainable parameters, provided sufficient entanglement is present. Based on this study, we fix the parameter count to approximately 120 for subsequent experiments, observing that increasing entangling-gate counts beyond a threshold yields only marginal improvements. For larger systems (two to five qutrits), we use a CNN trained on measurement outcomes from tensor-product mutually unbiased bases. The model achieves accurate and stable predictions and exhibits a systematic improvement in performance with system size, with the highest errors for two-qutrit systems and the lowest for five-qutrit systems. Notably, using only 12.5% of the measurements required for full state tomography is sufficient to reach 90th-percentile absolute errors of approximately 0.13-0.16 nats for both four- and five-qutrit systems. The CNN model is also robust to shot noise and generalizes well to out-of-distribution states. Overall, within the simulated settings studied here, our results indicate a transition in practical methods: VQAs are effective for small systems, while CNN-based estimators offer improved scalability and robustness for larger qutrit systems.

21.
arXiv (CS.LG) 2026-06-16

Scale-Invariant Neural Network Optimization: Norm Geometry and Heavy-Tailed Noise

arXiv:2605.18528v3 Announce Type: replace-cross Abstract: A growing lesson from neural network optimization is that optimizer design should respect how the model is parametrized. The layerwise input-output structure of neural networks motivates scale-invariant optimizers, such as Muon and Scion, whose updates also support hyperparameter transfer. At the same time, stochastic gradient noise in deep learning is often far from sub-Gaussian and may exhibit heavy tails. These observations have shaped recent algorithmic principles for training neural networks, yet their joint theoretical consequences are underexplored. In particular, it remains unclear what dimension dependence is unavoidable for gradient-based methods given the problem class is defined by input-output norm and under heavy-tailed noise, and whether higher-order smoothness can accelerate training. We study these questions through nonconvex smooth stochastic optimization over $\mathbb R^{m\times n}$ equipped with general norms and under $p^\mathrm{th}$-moment heavy-tailed noise, where the goal is to achieve an $\epsilon$-stationary point in the dual norm. Our first contribution is a dimension-dependent lower bound: when $\frac{\max\{m,n\}}{(\min\{m,n\})^2}$ is large enough, any gradient-based method requires $\Omega(\min\{m, n\}\epsilon^{-\frac{3p-2}{p-1}})$ oracles for the problem class defined by the spectral norm, which is a common input-output norm. We prove that a scale-invariant Scion method with the spectral norm can achieve the matching upper bound of $O(\min\{m, n\}\epsilon^{-\frac{3p-2}{p-1}})$. To exploit higher-order smoothness, we propose a transported Scion method and improve the bound to $O(\min\{m, n\}\epsilon^{-\frac{5p-3}{2p-2}})$ when the Hessian is Lipschitz. Finally, we incorporate heuristics into our transported method and evaluate it across multiple architectures and model sizes, demonstrating its flexibility and compatibility with neural network training.

22.
arXiv (CS.LG) 2026-06-19

BLISS: A Lightweight Bilevel Influence Scoring Method for Data Selection in Language Model Pretraining

arXiv:2510.06048v5 Announce Type: replace Abstract: Effective data selection is essential for pretraining large language models (LLMs), enhancing efficiency and improving generalization to downstream tasks. However, existing approaches often require leveraging external pretrained models, making it difficult to disentangle the effects of data selection from those of the external pretrained models. In addition, they often overlook the long-term impact of selected data if the model is trained to convergence, primarily due to the prohibitive cost of full-scale LLM pretraining. In this paper, we introduce BLISS (BileveL Influence Scoring method for data Selection): a lightweight data selection method that operates entirely from scratch, without relying on any external pretrained oracle models, while explicitly accounting for the long-term impact of selected data. BLISS leverages a small proxy model as a surrogate for the LLM and employs a score model to estimate the long-term influence of training samples if the proxy model is trained to convergence. We formulate data selection as a bilevel optimization problem, where the upper-level objective optimizes the score model to assign importance weights to training samples, ensuring that minimizing the lower-level objective (i.e., training the proxy model over the weighted training loss until convergence) leads to best validation performance. Once optimized, the trained score model predicts influence scores for the dataset, enabling efficient selection of high-quality samples for LLM pretraining. We validate BLISS by pretraining 410M/1B/2.8B Pythia and LLaMA-0.5B models on selected subsets of the C4 dataset. Notably, under the 1B model setting, BLISS achieves $1.7\times$ speedup in reaching the same performance as the state-of-the-art method, demonstrating superior performance across multiple downstream tasks.

23.
medRxiv (Medicine) 2026-06-10

Global and local genetic overlap among ME/CFS, irritable bowel syndrome and psychiatric traits: a hypothesis-generating analysis

作者:

Background. Myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) and irritable bowel syndrome (IBS) frequently co-occur following infection, yet shared genetic architecture at the locus level has not been systematically characterised. Aims. To estimate global and local genetic correlations between ME/CFS (including infection-onset subgroup), IBS, major depressive disorder (MDD) and loneliness/isolation, and characterise ME/CFS cell-type heritability enrichment. Method. GWAS summary statistics: DecodeME (15,579 ME/CFS; 9,738 infection-onset), FinnGen R9 (9,296 IBS), PGC MDD Wave 2 (45,396) and UK Biobank loneliness (N=455,364). LDSC for global correlations; LAVA for local correlations across 2,495 loci; MAGMA for cell-type enrichment (Descartes Human atlas); coloc.abf for colocalisation. Results. All pairwise global correlations were significant after Bonferroni correction, including ME/CFS-all-MDD (rg=0.598, 95% CI 0.46-0.74) and ME/CFS-all-IBS (rg=0.573, 0.39-0.75). Of 4,232 local tests, 16 reached FDR

24.
arXiv (CS.CL) 2026-06-16

Koshur Diacritizer: A Byte-Level Sequence-to-Sequence Model for Kashmiri Diacritic Restoration

Kashmiri, an Indo-Aryan language written in a modified Perso-Arabic script, frequently omits diacritic marks in digital text, creating ambiguity and challenging downstream NLP applications. We present Koshur Diacritizer, a ByT5-small byte-level sequence-to-sequence model for restoring diacritics in Kashmiri text. To support this task, we release a publicly available dataset of 23.7k aligned undiacritized diacritized Kashmiri sentence pairs. The proposed framework combines script-aware normalization, alignment validation, and skeleton-preserving inference to ensure reliable restoration while maintaining the original base-letter sequence. Experimental results on a held-out test set achieve a DERm of 0.2012 and a WER of 0.2159. Additionally, evaluation by a native Kashmiri linguistic expert yields a mean accuracy of 77.5%. The dataset, model, and source code are publicly released to provide a reproducible baseline for Kashmiri diacritic restoration and future low-resource language research.

25.
bioRxiv (Bioinfo) 2026-06-11

HalluDesign-NA: Extending HalluDesign for De Novo Nucleic Acid Design

AlphaFold3 has revolutionized the prediction of biomolecular structures and interactions, including atomic-level modeling of nucleic acids. However, the de novo design of structured and functional nucleic acids remains a significant challenge. Here, we extend our HalluDesign framework to nucleic acid design by integrating NA-MPNN for nucleic acid sequence optimization and design. This new framework, HalluDesign-NA, enables iterative sequence-structure co-optimization, facilitating the de novo design of nucleic acids. Computational benchmarking across ssDNA, ssRNA, and aptamer design tasks demonstrates consistent improvements in confidence scores (pLDDT, ipTM), supporting the feasibility of de novo nucleic acid design under various constraints, such as sequence length, symmetry, and protein structure context. We anticipate that HalluDesign-NA will accelerate the de novo design of functional nucleic acids for applications in biotechnology and medicine. The source code for HalluDesign-NA is available at https://github.com/MinchaoFang/HalluDesign_NA.