Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CL) 2026-06-17

EnvRL: Learn from Environment Dynamics in Agentic Reinforcement Learning

Reinforcement learning (RL) has emerged as a powerful paradigm for training Large Language Models (LLMs) as agents. However, conventional RL methods for long-horizon agentic tasks often struggle with sparse outcome rewards. Intuitively, this overlooks the rich environment dynamics information contained in rollout interaction trajectories. We argue that the interaction experience inherently serves as an implicit supervision signal, reveals the underlying transition mechanisms of the environment, and enables the agent to construct a more accurate internal model of the environment.. Therefore, in this work, we investigate how to leverage this additional signal to improve policy learning. Specifically, we propose EnvRL, a framework that incorporates environment dynamics learning into agentic RL via two auxiliary objectives: state prediction and inverse dynamics. By jointly optimizing with the primary RL objective, we encourage the agent to internalize environment dynamics from its own interaction experience. Extensive experiments on two long-horizon agentic benchmarks demonstrate that EnvRL achieves significant improvements on success-rates over RL-only baselines, e.g., when trained with GRPO, lifting Qwen-2.5-1.5B-Instruct from 72.8% to 77.4% on ALFWorld, and from 56.8% to 67.0% on WebShop.

02.
arXiv (CS.CV) 2026-06-17

MoonSplat: Monocular Online Gaussian Splatting with Sim(3) Global Optimization

Online 3D reconstruction from monocular image sequences is a challenging and ongoing research topic. 3D Gaussian Splatting (3DGS), leveraging its high-quality real-time rendering capability, empowers online 3D reconstruction to represent dense scenes with enhanced expressiveness, and thus holds great promise for a wide range of applications such as robotics and AR/VR. However, existing online 3DGS methods still suffer from some key challenges: fragile camera pose estimation due to the lack of global optimization, and low optimization efficiency in large-scale or long-sequence scenarios. To address these issues, we propose a robust and efficient online voxelized 3DGS reconstruction framework integrated with global $Sim(3)$ optimization, which enables reliable camera tracking and efficient global loop closure for both camera poses and voxelized 3DGS. To accelerate the convergence of the voxelized 3DGS, we further introduce a color residual learning strategy, which not only boosts optimization speed but also enhances rendering quality. Extensive experiments on diverse indoor and outdoor datasets demonstrate that our method achieves state-of-the-art performance in both camera pose estimation accuracy and rendering quality, while retaining real-time efficiency. Additionally, we develop and deploy a real-world UAV-based active reconstruction system grounded on our proposed method, validating its robustness and generalizability for practical online 3D reconstruction tasks. Our code and data are available at https://github.com/TrickyGo/MoonSplat.

03.
arXiv (CS.CV) 2026-06-17

Beyond MACs: Hardware Efficient Architecture Design for Vision Backbones

Vision backbone networks play a central role in modern computer vision. Enhancing their efficiency directly benefits a wide range of downstream applications. To measure efficiency, many publications rely on MACs (Multiply Accumulate operations) as a predictor of execution time. In this paper, we experimentally demonstrate the shortcomings of such a metric, especially in the context of edge devices. By contrasting the MAC count and execution time of common architectural design elements, we identify key factors for efficient execution and provide insights to optimize backbone design. Based on these insights, we present LowFormer, a novel vision backbone family. LowFormer features a streamlined macro and micro design that includes Lowtention, a lightweight alternative to Multi-Head Self-Attention. Lowtention not only proves more efficient, but also enables superior results on ImageNet. Additionally, we present an edge GPU version of LowFormer, that can further improve upon its baseline's speed on edge GPU and desktop GPU. We demonstrate LowFormer's wide applicability by evaluating it on smaller image classification datasets, as well as adapting it to several downstream tasks, such as object detection, semantic segmentation, image retrieval, and visual object tracking. LowFormer models consistently achieve remarkable speed-ups across various hardware platforms compared to recent state-of-the-art backbones. Code and models are available at https://github.com/altair199797/LowFormer/blob/main/Beyond_MACs.md.

04.
arXiv (CS.CV) 2026-06-16

Scribby: A Multi-Level LLM Framework for Semantic Video Analysis

As video content continues to expand across educational platforms, recorded lectures, and live-streamed entertainment, the need for efficient and structured analysis of long-form footage has increased [1]. Although many existing AI programs provide high-level video summaries based on AI-generated transcripts [2,3,4,5], these approaches are often limited to coarse overviews and lack detailed analysis of a video's structure, thematic progression, and semantic relationships, all of which are required for comprehensive video analysis. This paper proposes an LLM-based video summarization framework that balances macro-level comprehension with micro-level semantic analysis [6,12,13]. The first stage of the process indexes the video at a micro level by (1) analyzing the full transcript, (2) analyzing individual transcript sentences, and (3) grouping these sentences by semantic similarity using an LLM as a judge [6,13]. Contextual continuity is retained during sentence-level processing by incorporating both the global transcript analysis and adjacent sentence information into each evaluation prompt. This framework establishes a foundation for video analysis tools that visualize semantic chunking and semantic matching through relevance-based heatmaps. Limitations and future expansions of the framework are also discussed.

05.
arXiv (CS.LG) 2026-06-19

Folded Transport MCMC: Eliminating Label Switching by Sampling on a Fundamental Domain

作者:

arXiv:2606.04307v2 Announce Type: replace Abstract: In Bayesian mixture models and other exchangeable-component models, the posterior is invariant under permutation of component labels, creating m! equivalent modes-the label-switching problem. Standard MCMC methods either mix poorly across these modes or rely on post-hoc relabelling that cannot guarantee the sampler has converged. We propose Folded Transport MCMC (FolT-MCMC), which eliminates label switching before sampling by restricting the Markov chain to a fundamental domain-a sorted or reflected subspace containing exactly one representative from each symmetric mode. The proposal is a learned normalising flow whose density is symmetrised over the group orbits, ensuring correct targeting on the reduced space. We show that this construction preserves a computable convergence diagnostic based on the oscillation of the log-density ratio, and that the diagnostic becomes sharper on the fundamental domain whenever the original-space flow under-covers one or more symmetric modes. Experiments on Gaussian mixtures (d=2-20), label-switching targets (up to 24 equivalent modes), a standard Bayesian three-component mixture posterior, and real accelerometer data from a supertall building show improvement ratios of 2x to 145x, with the folded diagnostic stable across dimensions while the unfolded diagnostic collapses.

06.
arXiv (CS.CV) 2026-06-16

Timestep Rescheduling in Diffusion Inversion

Diffusion inversion, which maps images back to the Gaussian latent space of a diffusion model, is a critical task for image reconstruction and editing. While DDIM enables fast deterministic inversion, it inherently introduces deviations that accumulate into noticeable inversion errors. Existing methods often address this by solving a fixed-point problem but largely overlook how the selection of the diffusion timestep in the noise scheduler influences inversion fidelity. In this work, we reveal that the deviation scale in diffusion inversion is strongly dependent on the timestep size, and exhibits a parabolic trend, with larger errors concentrated at both small and large timesteps. Based on this finding, we propose a simple yet effective nonuniform timestep scheduler that integrates a global rescaling with a local dynamic programming based rescheduling, enabling a strategic allocation of computational effort that minimizes the overall inversion error and preserves higher inversion accuracy. Our method serves as an off-the-shelf enhancement for existing inversion techniques and requires no extra parameters or computational overhead. Through extensive experiments, we verify that integrating our scheduler consistently boosts the performance of existing inversion methods, achieving superior results in image reconstruction and editing.

07.
arXiv (CS.AI) 2026-06-12

The Challenges of Balancing AI Compliance and Technological Innovations in Critical Sectors: A Systematic Literature Review

arXiv:2606.12423v1 Announce Type: cross Abstract: The rapid integration of artificial intelligence (AI) into critical infrastructure including healthcare, finance, energy, and defense, offers transformative benefits but also conflicts with evolving regulatory and governance frameworks. This paper presents a systematic literature review (SLR) to examine the challenges of balancing AI compliance and technological innovation across critical infrastructure sectors. The review follows established SLR guidelines to extract and synthesize insights from peer-reviewed articles, report, and institutional sources published between 2020-2025. The study identifies three interrelated challenges: fragmented regulations, excessive compliance burdens for smaller to medium enterprises (SMEs), and misaligned governance models. To address these challenges, the study highlights practical governance strategies, including risk-tiered regulation, compliance by design, and explainable AI, to support scalable and trustworthy AI deployment in critical sectors. Key contributions include a concise mapping of core AI-governance challenges and a conceptual diagram illustrating their overlap, as well as actionable strategies for policymakers and practitioner to harmonize oversight with innovation.

09.
medRxiv (Medicine) 2026-06-22

Demographic Calibration Gaps in Breast Cancer Risk Prediction: Introducing the Demographic Calibration Gap Score

作者:

ABSTRACT: Most breast cancer prediction studies skip calibration reporting entirely. Fewer still examine calibration by demographic subgroup. Predicted probabilities that are systematically off for specific racial or gender groups produce biased clinical decisions, and aggregate statistics will not catch that. Objective: To introduce the Demographic Calibration Gap Score (DCGS), a metric that measures how much calibration error varies across demographic subgroups, and to show how it performs across five classifiers, four calibration conditions, and two datasets. Methods: Five classifiers were trained on the Wisconsin Diagnostic Breast Cancer dataset (n=569) and evaluated on a breast cancer cohort from MIMIC-IV (n=1,316). Three global calibration methods were applied: no calibration, Platt scaling, and isotonic regression. A fourth condition, subgroup-targeted Platt scaling, was applied to the MIMIC cohort. DCGS was computed as across racial and gender subgroups, with 95% bootstrap confidence intervals. Conformal prediction coverage and Demographic Coverage Gap (DCG) were reported. Results: On Wisconsin, all five models achieved AUROC above 0.98 and ECE below 0.12. Performance fell sharply on the MIMIC external cohort: AUROC dropped to 0.45-0.57 for base and globally calibrated variants, confirming distributional shift. DCGS exceeded the 0.05 clinical significance threshold in 28 of 40 model-calibration combinations on the race axis. Neither global Platt nor isotonic calibration reliably reduced DCGS below that threshold. Conformal coverage collapsed to roughly 25% on MIMIC, and racial DCG exceeded 0.15 for all 20 model-variant combinations. Conclusions: Reducing population-level ECE through global recalibration does not reliably close demographic calibration gaps. DCGS gives researchers a direct, standardized way to detect and report those disparities. Code and the DCGS computation library are released as open-source Python under the MIT License.

10.
arXiv (quant-ph) 2026-06-12

Toward quantum-noise-limited interferometric measurements of optical nonlinearity in vacuum

arXiv:2602.10896v2 Announce Type: replace-cross Abstract: Quantum Electrodynamics predicts that the vacuum must behave as a nonlinear optical medium: the vacuum optical index should increase when it is stressed by intense electromagnetic fields. The DeLLight (Deflection of Light by Light) project aims to measure it by using intense and ultra-short laser pulses. The experiment uses a Sagnac interferometer to amplify the tiny deflection signal of a low-intensity probe pulse crossing the vacuum refractive-index gradient produced by an external high-intensity pump pulse. The measurement of the amplified signal by a CCD camera requires a high spatial resolution, which is limited by the ultimate quantum noise of the CCD. However, interferometric phase noise induced by the mechanical vibrations of the interferometer is also amplified and degrades spatial resolution. To overcome this, we propose a new method named High-Frequency Phase Noise Suppression (HFPNS), based on the addition of a delayed replica (5 ns) of the probe pulse. The delayed pulse, which is not affected by the pump but is subject to the same vibration noise, enables offline subtraction of correlated phase noise. In this work, we present an experimental proof-of-concept on a prototype interferometer operating with a limited amplification factor ($\mathcal{A}\simeq25$), about 10 times smaller than the required value of the final experiment. We have succeeded in reducing phase noise by a factor of 40, resulting in a residual noise level 2.3 times higher than the expected quantum noise. The residual noise is linked to delay-line instabilities and incident beam pointing fluctuations present during these tests. This result validates HFPNS as a robust method for future quantum-noise-limited interferometric measurements of vacuum optical nonlinearity, though additional stabilization and higher interferometric amplification are still needed.

11.
arXiv (CS.CL) 2026-06-18

The Personalization Trap: How User Memory Alters Emotional Reasoning in LLMs

When an AI assistant remembers that Sarah is a single mother working two jobs, does it interpret her stress differently than if she were a wealthy executive? As personalized AI systems increasingly incorporate long-term user memory, understanding how this memory shapes emotional reasoning is critical. We investigate how user memory affects emotional intelligence in large language models (LLMs) by evaluating 15 models on human-validated emotional intelligence tests. We find that identical scenarios paired with different user profiles produce systematically divergent emotional interpretations. Across validated user-independent emotional scenarios and diverse user profiles, systematic biases emerged in several high-performing LLMs where advantaged profiles received more accurate emotional interpretations. Moreover, LLMs demonstrate significant disparities across demographic factors in emotion reasoning and supportive recommendations tasks, indicating that personalization mechanisms can embed social hierarchies into models' emotional reasoning. These results highlight a key challenge for memory-enhanced AI: systems designed for personalization may reinforce social inequalities. To mitigate these disparities, we curate a general-purpose preference dataset designed to reduce demographic profiles' influence on emotional understanding.

12.
arXiv (CS.AI) 2026-06-16

Resilient Consensus in Agentic AI

arXiv:2606.15024v1 Announce Type: cross Abstract: Large language model (LLM) agents are increasingly deployed in multi-agent systems where they must coordinate and agree on shared decisions. We ask whether classical resilient consensus theory, developed for deterministic agents, transfers to LLM agents that may behave adversarially. Framing LLM agreement as a Byzantine consensus game, we run controlled experiments on complete and general communication graphs. We find that prompted LLM agents fail to reach agreement that is achievable in principle: consensus can fail even in settings where classical theory guarantees that a convergent algorithm exists, and this failure persists across temperatures and horizons. At the same time, wrapping the agents with classical resilient consensus filters improves agreement. The benefit of filtering depends on how much robustness the underlying topology already provides. Our results suggest that classical resilient consensus theory is a useful lens for the safety of agentic AI.

13.
bioRxiv (Bioinfo) 2026-06-11

GeroQubit: a lightweight, honesty-first de-novo design platform for geroscience-native small molecules with calibrated uncertainty

作者:

Computational molecule generation has outpaced its own credibility. We present GeroQubit, a GPU-free de-novo design platform that organizes candidates along a target x tissue x hallmark model and reports every signal alongside its measured baseline. We treat our tissue aging-signature readout as a mechanistic structural prior that we explicitly disclose is not validated against lifespan, and we surface efficacy only through a structure-to-lifespan k-NN whose weak but real signal (leave-one-out rho ~ 0.145) is wrapped in empirically-calibrated conformal intervals (90% target, 90.3% measured coverage). On a held-out retrospective recovery of ~1,940 ChEMBL binders against decoys, the score reaches ROC-AUC 0.945 with ~20x enrichment at 1% (BEDROC 0.91) and survives a scaffold-disjoint split - yet we report that it collapses to near-random (AUC 0.62) on genuinely novel chemotypes. Molecules are assembled reaction-first, so every candidate carries a verified synthetic route and atom-level synthon provenance; ADMET is handled as a multi-objective Pareto problem. We frame the disclosed weak signals and the hard-case failures not as flaws but as the honest, decision-useful output the field's own critics demand.

14.
arXiv (CS.LG) 2026-06-18

Knockoffs-based False Discovery Rate Control and Simplification for Deep Neural Networks

arXiv:2606.04404v2 Announce Type: replace-cross Abstract: The deep neural network is a widely used framework in machine learning that has been widely applied in various fields. However, deep neural networks often involve a large number of parameters and inputs, many of which may be irrelevant to the goal or true output. These parameters and input variables not only increase computational complexity, but also contribute to additional computational cost. One solution to this problem is knockoff methods, which have proven successful in controlling false discovery rates in high-dimensional regression. Building on the knockoff methods and using the regularised neural network, this paper proposes three variable screening methods under the condition of controlling false discovery rates: one layer filter, multiple layers filter, and variable weight aggregation filter. In comparison with existing algorithms, we find that our algorithms show satisfactory performance.

15.
arXiv (CS.AI) 2026-06-19

VitalAgent: A Tool-Augmented Agent for Reactive and Proactive Physiological Monitoring over Wearable Health Data

arXiv:2605.29483v2 Announce Type: replace Abstract: Wearable devices enable continuous monitoring of physiological signals such as ECG and PPG, but existing mHealth systems are largely limited to task-specific prediction pipelines or reactive question answering over static summaries. They lack the ability to support temporal reasoning, persistent physiological context, and proactive monitoring over long-term signal streams. We propose VitalAgent, a tool-augmented agentic framework for ECG/PPG-based mHealth that supports both reactive question answering and proactive monitoring. VitalAgent is built on a longitudinal physiological memory and a tool-augmented reasoning interface that enables dynamic computation over raw signals. We further introduce VitalBench, a longitudinal physiological monitoring benchmark dataset comprising 1,862 QA pairs for reactive question answering and 90.2 hours of continuous ECG/PPG recordings for proactive monitoring, covering cardiac, physical activity, and stress-related tasks. Experiments demonstrate that VitalAgent achieves over 25% improvement over prompt-based and ReAct baselines in reactive evaluation and supports proactive alert monitoring over long-term physiological signals, highlighting the importance of dynamic tool use and long-term physiological monitoring.

16.
medRxiv (Medicine) 2026-06-19

"Us with them": Co-designing a caesarean section consent and debriefing intervention in West Cameroon

Background Women-centred maternity care is a rights issue that determines the use of services. Such care ensures responsiveness to womens needs which is enacted through shared decision-making, review and response. In the West Region of Cameroon, informed consent (IC) and Debriefing for caesarean section (c-section) have been shown to be suboptimal or absent. This paper describes the participatory design of a quality-improvement hospital-based intervention. Methods From February to May 2025, we conducted a co-design process with three groups of stakeholders: 59 post c-section women and community representatives, 78 frontline c-section providers, and 29 directors of public and private hospitals. We followed four phases: planning, conducting, evaluating, and reporting. The conduct phase comprised five all-day workshops with post c-section women and community representatives, followed by five all-day workshops with the c-section providers. Finally, we held an 11th workshop with the hospital directors to scrutinize suggested interventions, evaluate their feasibility, and establish a consensus on their components. We described the intervention using the TIDieR (Template for Intervention Description and Replication) checklist. We documented the co-design process, using open-ended narratives to delineate interventions, and carried out real-time synthesis on visual aids (whiteboards and flipcharts). Intervention feasibility was quantified using a structured ad hoc matrix, while insights on facilitators and barriers were captured through qualitative free-text entries. We coupled data collection with constant comparison and triangulation through contemporaneous field notes, photographic documentation, and thematic mapping of stakeholders perceptions and interactive dynamics. Results Participants perspectives on the co-design were positive, and their motivation were very high although less than 50% reported previous involvement in co-design processes. More than 80% of participants found rated the co-design process as either good or very good. The final intervention comprised four components: (i) an in-service training; (ii) a standard operating procedure including a harmonised consent form and debriefing checklist; (ii) systematic supportive supervision, monitoring & evaluation; and (iv) a routine clinical audit. Each group of stakeholders upheld specific dimensions of the consent and debrief intervention. Post c-section women and community members emphasized emotional support, written discharge advice after debriefing, and zero tolerance of suboptimal consent and debriefing practices. Frontline c-section providers insisted on robust documentation for medico-legal protection. Hospitals Directors emphasized capacity-building and cultural friendliness. All the groups supported womans autonomous decision making. The intervention feasibility was rated high or very high by hospital directors except for the financial, infrastructural and technical domains. Conclusion This co-design process yielded a context-specific, multi-component intervention that was well accepted and deemed feasible across stakeholders. It provides a methodological approach to strengthening informed consent and debriefing as core elements of women-centred, accountable maternity care, and warrants implementation.

17.
medRxiv (Medicine) 2026-06-16

Infections and suicide and self-harm: a population-based matched cohort study

Background Infections have been associated with adverse mental health outcomes, including suicide, but evidence beyond severe or central nervous system infections is limited. We investigated associations between a range of acute infections and subsequent suicide/self-harm outcomes. Methods We conducted six infection-specific matched cohort studies using English primary care records from the Clinical Practice Research Datalink Aurum (2007-2024), linked to hospital admissions and mortality data. Adults ([≥]18 years) with a primary care record of infection (gastroenteritis, lower respiratory tract [LRTI], skin/soft-tissue [SSTI], urinary tract [UTI], sepsis, meningitis/encephalitis [positive control]) were matched (age, sex, practice, calendar period) to up to five comparators without infection. We estimated hazard ratios (HRs) for suicide/self-harm outcomes using Cox regression, stratified by matched set and implicitly adjusting for matching factors, with additional adjustment for deprivation, lifestyle factors, and comorbidities. We examined whether associations varied over time, by infection severity, antimicrobial treatment, sex, and prior mental health conditions. Findings Cohorts ranged from 18,192 individuals with meningitis/encephalitis (matched to 90,915 without) to 398,099 with SSTI (matched to 1,743,747). After adjustment, individuals with infection had a higher hazard of suicide/self-harm outcomes than comparators across all cohorts: sepsis (HR 1.79, 95% CI 1.65-1.93), gastroenteritis (1.62, 1.55-1.70), meningitis/encephalitis (1.56, 1.32-1.84), UTI (1.41, 1.33-1.50), SSTI (1.37, 1.31-1.43), and LRTI (1.37, 1.31-1.44). Risk was highest in the year post-infection, attenuating over time, and was higher among severe infections and those without prior mental health conditions. Interpretation Common acute infections recorded in primary care are associated with increased risk of suicide and self-harm, particularly following severe infections and in the year post-infection. Findings support suicide risk monitoring following acute infection, particularly among individuals without prior mental health conditions, and highlight infection prevention as a potentially modifiable strategy in vulnerable populations. Funding Wellcome and La Caixa. Copyright This work is licensed under a Creative Commons Attribution (CC BY) licence.

18.
arXiv (CS.CL) 2026-06-12

SkillChain: Closing the Loop on Skill Evolution for Image-Based E-Commerce AI Assistants

Image-based AI assistants are now deployed at production scale on e-commerce platforms, where a single uploaded image can trigger fundamentally different user intents: product search, style recommendation, visual encyclopedia, or utility tool calls, each demanding its own response format, tool invocation, and domain knowledge. Without per-intent behavioral constraints, LLM-based systems conflate these heterogeneous modes and fall short of domain quality standards, while the breadth and dynamism of the intent space render manual engineering infeasible. To address this, we present SkillChain, which closes the production feedback loop on Skill evolution, automating the lifecycle of Skills through three stages: Skill Creator for bootstrapping from task specs and trajectories, Route Optimizer for routing alignment, and Body Refiner for iterative Skill Body refinement via dual-path LLM-Judge evaluation. Deployed on a production-scale e-commerce image assistant, SkillChain substantially improves aggregate response quality, with the strongest gains on structural compliance and content quality; a one-week online A/B experiment further confirms significant gains in user engagement, content consumption, and long-term retention.

19.
medRxiv (Medicine) 2026-06-22

Mapping abstraction and metacognition onto distinct transdiagnostic symptom profiles

Transdiagnostic psychiatric research on reward-guided learning has largely focused on simple associative processes, leaving it unclear whether or how higher-level processes are disrupted. Here, we studied how abstraction, the ability to extract relevant features from complex information, and metacognition, the ability to monitor and evaluate one's own mental processes, map onto specific transdiagnostic dimensions. Using an online sample (N = 249), we examined associations between these processes and three cross-culturally robust transdiagnostic dimensions derived from a large existing dataset (N = 19,505): Compulsive hypersensitivity, Social withdrawal, and Addictive behaviours. Computational modelling of an abstract representation learning task with confidence judgments revealed that Compulsive hypersensitivity was negatively associated with both abstraction ability (pboot = 0.003) and metacognitive sensitivity (pboot = 0.005), while Social withdrawal was positively associated with metacognitive sensitivity alone (pboot = 0.002). Moreover, transdiagnostic dimensions revealed more coherent associations with higher-order cognition than symptom-level analyses, highlighting the added value of examining psychopathology at the factor rather than the symptom level. These findings portray a hierarchical view of cognitive dysfunctions in psychopathology and point to representational and metacognitive processes as potential targets for transdiagnostic intervention.

20.
arXiv (CS.LG) 2026-06-16

Adaptive Kernel Density Estimation with Pre-training

arXiv:2605.13092v2 Announce Type: replace-cross Abstract: Density estimation in high-dimensional settings is an important and challenging statistical problem.Traditional methods based on kernel smoothing are inefficient in high dimensions due to the difficulties in specifying appropriate location-adaptive kernels. In this work, we introduce pre-training, a key idea behind many cutting-edge AI technologies, to the context of non-parametric density estimation. By establishing a pre-trained neural network that can recommend an appropriate location-adaptive kernel for each sample point, efficient density estimation with adaptive kernels is achieved in high dimensions. A wide range of numerical experiments show that this strategy is highly effective for improving density-estimation accuracy, when the target distribution is close to the distribution family for pre-training. When the target distribution is substantially different from the pre-training distribution family, the benefit from the proposed pre-training strategy may be diluted, but can be reactivated by an additional fine-tuning procedure.

21.
arXiv (CS.CL) 2026-06-16

Cross-lingual Embedding Clustering for Hierarchical Softmax in Low-Resource Multilingual Speech Recognition

We present a novel approach centered on the decoding stage of Automatic Speech Recognition (ASR) that enhances multilingual performance, especially for low-resource languages. It utilizes a cross-lingual embedding clustering method to construct a hierarchical Softmax (H-Softmax) decoder, which enables similar tokens across different languages to share similar decoder representations. It addresses the limitations of the previous Huffman-based H-Softmax method, which relied on shallow features in token similarity assessments. Through experiments on a downsampled dataset of 15 languages, we demonstrate the effectiveness of our approach in improving low-resource multilingual ASR accuracy.

22.
arXiv (CS.CV) 2026-06-19

Fast Human Attention Prediction for Fixation-guided Active Perception in Autonomous Navigation

Human visual attention relies on structured scanpaths to efficiently process scenes, yet instilling this behavior into robot autonomy is in its infancy and hindered by the high,computational costs of existing predictive models. To address this, we introduce GazeLNN, a computationally lightweight,scanpath prediction model that leverages Liquid Neural Networks as its recurrent engine and employs MobileNetV3 for feature extraction. Operating auto-regressively, the architecture predicts sequential fixation heatmaps conditioned on the current visual stimulus and fixation history. Despite requiring only 0.61 GFLOPs, GazeLNN achieves state-of-the-art performance on the MIT Low Resolution dataset achieving 0.47 ScanMatch score. It outperforms existing recurrent baselines across diverse evaluation metrics, while reducing computational costs by 99.40% and accelerating inference by up to six times. To investigate the role of human attention modeling in robot autonomy and demonstrate the practical utility of this highly efficient architecture, we integrate GazeLNN into an active camera-robot control policy trained via Reinforcement Learning. This integration enables human-fixation-guided perception during autonomous navigation, validated through successful real-world deployments on an aerial robot.

23.
bioRxiv (Bioinfo) 2026-06-18

segSHAPE: RNA secondary structure prediction from nanopore direct RNA sequencing

RNAs adopt complex structures that regulate key biological processes, making accurate structure prediction essential. Chemical probing coupled with Nanopore direct RNA sequencing (DRS) offers a route to single-molecule structural inference, but current tools are limited by inaccurate signal-to-sequence alignment, which degrades modification-rate estimation and downstream structure prediction. Here we introduce segSHAPE for RNA secondary structure prediction from Nanopore DRS data (both RNA002 and RNA004 chemistries), a probe-agnostic framework that improves signal alignment using prior information of basecalling and per-read signal baseline shift correction, learns position-specific k-mer raw signal parameters, and estimates per-nucleotide modification rates with an unsupervised anomaly detector. On three public RNA002 DRS datasets spanning different chemical probes (AcIm, NAI-N3) and RNAs from 421 to 1552 nt, segSHAPE achieves the highest F1 score and Matthews correlation coefficient (MCC) on all RNAs, exceeding the strongest baseline by 3.4 to 5.8 percentage points in MCC. It additionally captures the ligand-induced conformational change of the thiamine pyrophosphate (TPP) riboswitch RNA directly from RNA002 DRS data using the DEPC probe. On a public RNA004 DRS dataset, segSHAPE improves over the sm-PORE-cupine baseline by 17 ROC-AUC points in modification rate estimation and by 6.7 MCC points in structure prediction. These results establish segSHAPE as a unified, probe-agnostic pipeline for RNA structure prediction from Nanopore DRS data.

24.
arXiv (CS.LG) 2026-06-17

Manifold GCN: Diffusion-based Convolutional Neural Network for Manifold-valued Graphs

arXiv:2401.14381v3 Announce Type: replace Abstract: We propose two graph neural network layers for graphs with features in a Riemannian manifold. First, based on a manifold-valued graph diffusion equation, we construct a diffusion layer that can be applied to an arbitrary number of nodes and graph connectivity patterns. Second, we model a tangent multilayer perceptron by transferring ideas from the vector neuron framework to our general setting. Both layers are equivariant under node permutations and the feature manifold's isometries. These properties have led to a beneficial inductive bias in many deep-learning tasks. Furthermore, they enable novel, more flexible feature designs. Numerical examples on synthetic data and an Alzheimer's classification application on triangle meshes of the right hippocampus demonstrate the usefulness of our new layers: While they apply to a much broader class of problems, they outperform task-specific state-of-the-art networks.

25.
Nature Medicine 2026-06-15

Activity-dependent adaptive deep brain stimulation improves gait in Parkinson’s disease

Parkinson’s disease leads to a spectrum of locomotor deficits that vary in severity with the nature of daily activities and the fluctuating physiology of patients. Many of these deficits remain inadequately addressed by existing deep brain stimulation therapies that rely on activity-agnostic parameters optimized for cardinal motor symptoms. By contrast, therapies embedding activity-specific parameters have the potential to better address the entire range of symptoms. Here we expose physiological principles that enable real-time decoding of ongoing locomotor activities across motor fluctuations from the neural dynamics of the subthalamic nucleus. This decoding steered activity-dependent adaptations of deep brain stimulation therapies that improved locomotor deficits while preserving efficacy for cardinal motor symptoms across activities of daily living. Our activity-dependent framework provides a blueprint for next-generation neuromodulation therapies that continuously select parameters optimized to the behavioral context and fluctuating physiology of each patient. ClinicalTrials.gov registration NCT06791902 . Neural decoding algorithms that leverage physiological principles of locomotor encoding support activity-dependent deep brain stimulation therapies that improve locomotor deficits in people with Parkinson’s disease.