arXiv (CS.AI) 2026-06-16 12:00 DOI: arXiv:2606.16995

When in Doubt, Plan It Out: Committed Small Language Model Deliberation for Reactive Reinforcement Learning

Abstract

Reinforcement Learning (RL) policies often degrade in unfamiliar environments because they lack explicit deliberation. We propose Plan, Align, Commit, Think (PACT), a hybrid architecture that combines a fast, reactive RL policy with a slow, deliberative Small Language Model (SLM) planner. PACT invokes the SLM asynchronously to generate and validate candidate action plans. Once a plan is verified through simulation as safe, feasible, and complete, it is executed directly, bypassing the RL policy without retraining or modifying it. Evaluated on three FrozenLake configurations of increasing difficulty, PACT outperforms all baselines while relying on a 2B-parameter SLM backbone, suggesting that deliberative planning and reactive execution are more powerful in concert than either is alone in these settings.
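The verify-then-commit idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a deterministic (non-slippery) FrozenLake grid, the standard 4x4 layout and action encoding used by Gymnasium, and hypothetical names (`verify_plan`, `CommittedController`, `offer_plan`). The SLM call is left out entirely; a candidate plan is just a list of actions handed in from outside.

```python
from typing import Callable, List, Tuple

Pos = Tuple[int, int]

# Standard 4x4 FrozenLake layout: S=start, F=frozen, H=hole, G=goal.
GRID = ["SFFF", "FHFH", "FFFH", "HFFG"]

# Action encoding as in Gymnasium's FrozenLake: 0=left, 1=down, 2=right, 3=up.
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}


def step(pos: Pos, action: int) -> Pos:
    """Deterministic transition (assumption); moving off-grid leaves the agent in place."""
    dr, dc = MOVES[action]
    r = min(max(pos[0] + dr, 0), len(GRID) - 1)
    c = min(max(pos[1] + dc, 0), len(GRID[0]) - 1)
    return (r, c)


def verify_plan(start: Pos, plan: List[int]) -> bool:
    """Simulate a candidate plan and accept it only if it is
    feasible (all actions valid), safe (never enters a hole),
    and complete (reaches the goal)."""
    pos = start
    for action in plan:
        if action not in MOVES:
            return False  # infeasible: unknown action
        pos = step(pos, action)
        tile = GRID[pos[0]][pos[1]]
        if tile == "H":
            return False  # unsafe: fell into a hole
        if tile == "G":
            return True  # complete: goal reached
    return False  # plan ended before reaching the goal


class CommittedController:
    """Hypothetical wrapper: executes a verified plan if one is committed,
    otherwise defers to the unmodified reactive policy."""

    def __init__(self, policy: Callable[[Pos], int]) -> None:
        self.policy = policy
        self.committed: List[int] = []

    def offer_plan(self, pos: Pos, plan: List[int]) -> bool:
        """Commit to the plan only if simulation verifies it from `pos`."""
        if verify_plan(pos, plan):
            self.committed = list(plan)
            return True
        return False

    def act(self, pos: Pos) -> int:
        if self.committed:
            return self.committed.pop(0)  # bypass the RL policy
        return self.policy(pos)  # reactive fallback
```

In this sketch, a rejected plan leaves the controller on the reactive policy, while an accepted plan is replayed action by action until it is exhausted. How PACT schedules asynchronous SLM calls, or handles stochastic transitions, is not specified in the abstract and is not modeled here.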
