arXiv (CS.LG) 2026-06-19 12:00 DOI: arXiv:2606.20107

Quantile of Means: A Bonus-Free Ensemble Method for Minimax Optimal Reinforcement Learning

Abstract

Optimal Reinforcement Learning (RL) algorithms typically rely on carefully constructed count-based uncertainty estimates to drive exploration. Although theoretically sound, such estimates are hard to compute in practical settings and therefore offer limited insight for designing exploration heuristics. Meanwhile, ensembling has emerged as a practical approach, but remains without theoretical justification. Building on a recent ensemble-based method for Multi-Armed Bandits, we propose a quantile-based ensemble method for finite-horizon Markov Decision Processes (MDPs). Our simple count-free approach achieves optimal variance-dependent regret bounds, providing theoretical grounding for ensemble-based exploration in RL.
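The abstract does not spell out the algorithm, so the following is only a rough sketch of what a "quantile of means" exploration index could look like in the simpler Multi-Armed Bandit setting it builds on. Everything here is an assumption made for illustration, not the paper's method: the bootstrap resampling used to form ensemble members, the function names `quantile_of_means_index` and `run_bandit`, and the parameters `n_members` and `q` are all hypothetical. The point is only to show how an upper quantile over ensemble means can stand in for an explicit count-based bonus.

```python
import numpy as np


def quantile_of_means_index(rewards, n_members=20, q=0.9, rng=None):
    """Upper quantile of ensemble means for one arm (illustrative assumption)."""
    rng = np.random.default_rng() if rng is None else rng
    rewards = np.asarray(rewards, dtype=float)
    n = len(rewards)
    # Each hypothetical ensemble member sees a bootstrap resample of the data.
    idx = rng.integers(0, n, size=(n_members, n))
    member_means = rewards[idx].mean(axis=1)
    # No count-based bonus is added: optimism comes only from the spread
    # of the member means, which narrows as more rewards are observed.
    return float(np.quantile(member_means, q))


def run_bandit(true_means, horizon=2000, seed=0):
    """Play a Bernoulli bandit greedily with respect to the quantile index."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    history = [[] for _ in range(k)]
    pulls = np.zeros(k, dtype=int)
    for t in range(horizon):
        if t < k:
            arm = t  # pull every arm once so each index is defined
        else:
            indices = [
                quantile_of_means_index(history[a], rng=rng) for a in range(k)
            ]
            arm = int(np.argmax(indices))
        reward = rng.binomial(1, true_means[arm])
        history[arm].append(reward)
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(run_bandit([0.2, 0.5, 0.8]))
```

A plain bootstrap like this is known to under-explore in some cases (for example, an arm whose first reward is 0 has every member mean equal to 0), which is one reason a carefully designed ensemble and quantile level would be needed for any regret guarantee. The sketch makes no claim about the bounds stated in the abstract.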
