arXiv (CS.LG) 2026-06-25 12:00 DOI: arXiv:2601.23075

RN-D: Discretized Categorical Actors for On-Policy Reinforcement Learning

Abstract

arXiv:2601.23075v2 (replacement)

On-policy Reinforcement Learning (RL) remains a dominant paradigm for continuous control, yet standard implementations rely on Gaussian actors and relatively shallow MLP policies, which often leads to brittle optimization when gradients are noisy and policy updates must be conservative. In this paper, we revisit actor policy representation as a first-class design choice for on-policy RL. We study discretized categorical actors, which represent each action dimension as a distribution over discrete bins and induce a policy objective analogous to the classification cross-entropy loss. Building on architectural advances from supervised learning, we further pair discretized categorical actors with regularized networks, yielding RN-D. Across diverse continuous-control benchmarks, we show that simply replacing the standard Gaussian actor with our proposed actor substantially improves performance, achieving state-of-the-art results within on-policy RL. We release our code at https://github.com/alwaysbyx/RND-RL.
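To make the idea of a discretized categorical actor concrete, the following is a minimal sketch, not the authors' implementation. It assumes evenly spaced bin centers over a bounded action range, independent action dimensions, and a plain advantage-weighted log-likelihood surrogate; the function names (`bin_centers`, `sample_action`, `log_prob`, `surrogate_loss`) are hypothetical, and the regularized network that would produce the logits, as well as any clipping used by a specific on-policy algorithm, are omitted.

```python
import math
import random


def log_softmax(logits):
    # Numerically stable log-softmax over the bins of one action dimension.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def bin_centers(low, high, num_bins):
    # Evenly spaced bin centers on [low, high] (an assumed discretization).
    step = (high - low) / (num_bins - 1)
    return [low + i * step for i in range(num_bins)]


def sample_action(logits_per_dim, centers, rng):
    # Draw one bin per action dimension and map it to a continuous value.
    indices = []
    for logits in logits_per_dim:
        probs = [math.exp(lp) for lp in log_softmax(logits)]
        indices.append(rng.choices(range(len(centers)), weights=probs, k=1)[0])
    return indices, [centers[i] for i in indices]


def log_prob(logits_per_dim, indices):
    # Joint log-probability, treating action dimensions as independent.
    return sum(log_softmax(l)[i] for l, i in zip(logits_per_dim, indices))


def surrogate_loss(logits_per_dim, indices, advantage):
    # Advantage-weighted negative log-likelihood. With advantage == 1 this is
    # the cross-entropy between the chosen bins and the predicted distribution.
    return -advantage * log_prob(logits_per_dim, indices)


if __name__ == "__main__":
    centers = bin_centers(-1.0, 1.0, 5)
    logits = [[0.0] * 5, [0.0] * 5]  # hypothetical actor output, 2 action dims
    idx, action = sample_action(logits, centers, random.Random(0))
    print(idx, action, surrogate_loss(logits, idx, advantage=1.0))
```

In this sketch the per-dimension loss has the same form as a classification cross-entropy with the sampled bin as the label, scaled by the advantage, which is the analogy the abstract draws.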
