arXiv (CS.LG) 2026-06-19 12:00 DOI: arXiv:2606.20411

Direct Advantage Estimation for Scalable and Sample-efficient Deep Reinforcement Learning

Abstract

Direct Advantage Estimation (DAE) has been shown to improve the sample efficiency of deep reinforcement learning algorithms. However, its reliance on full environment observability limits its applicability in realistic settings, and its requirement to model transition probabilities incurs substantial computational overhead for high-dimensional observations. In the present work, we address both limitations. First, we extend the theoretical framework of DAE to partially observable domains with minimal modifications. Second, we reduce its computational complexity by introducing discrete latent dynamics models that efficiently approximate transition probabilities. We evaluate our approach on the Arcade Learning Environment and find that DAE scales effectively with function approximator capacity while retaining high sample efficiency.
