arXiv (cs.LG) 2026-06-19 12:00 DOI: arXiv:2606.20107

Quantile of Means: A Bonus-Free Ensemble Method for Minimax Optimal Reinforcement Learning

Abstract

arXiv:2606.20107v1 (announce type: new)

Optimal Reinforcement Learning (RL) algorithms typically rely on carefully constructed count-based uncertainty estimates to drive exploration. Although theoretically sound, such estimates are hard to compute in practical settings and therefore offer limited insight for designing exploration heuristics. Meanwhile, ensembling has emerged as a practical approach, but remains without theoretical justification. Building on a recent ensemble-based method for Multi-Armed Bandits, we propose a quantile-based ensemble method for finite-horizon Markov Decision Processes (MDPs). Our simple count-free approach achieves optimal variance-dependent regret bounds, providing theoretical grounding for ensemble-based exploration in RL.
