arXiv (CS.CL) 2026-06-16 12:00 DOI: arXiv:2602.05060

StagePilot: Stage-Level Planning for Long-Horizon Dialogue Simulation in Cybergrooming

Abstract

Cybergrooming is an evolving threat to youth that calls for proactive educational interventions. We address this need by modeling dialogue progression as a structured planning problem over stage-wise interactions. We propose StagePilot, a dialogue framework that separates stage-level planning from response generation: the model first selects the next stage under constrained transitions, then generates a response conditioned on that stage, enabling coherent and realistic progression. Stage-level policies are learned from offline data with reinforcement learning, optimizing for both emotional alignment and goal-consistent progression. Experiments show that StagePilot generates more structured, coherent dialogue trajectories and reduces conversational stagnation compared to baselines. Notably, the IQL+AWAC variant reaches the final stage more often while maintaining over 70% positive or neutral responses, yielding a 43% relative improvement.
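
The separation of stage-level planning from response generation can be illustrated with a minimal sketch. This is not the authors' implementation: the stage labels (`S0` to `S3`), the transition table `ALLOWED` (stay or advance by one stage), the score dictionary standing in for a learned policy, and the placeholder generator are all assumptions made for illustration, since the abstract specifies none of them.

```python
from typing import Dict, List, Tuple

# Hypothetical stage labels; the abstract does not name the stages.
STAGES: List[str] = ["S0", "S1", "S2", "S3"]

# Assumed constraint: from each stage the planner may stay or advance one stage.
ALLOWED: Dict[str, List[str]] = {
    "S0": ["S0", "S1"],
    "S1": ["S1", "S2"],
    "S2": ["S2", "S3"],
    "S3": ["S3"],
}


def select_next_stage(current: str, scores: Dict[str, float]) -> str:
    """Pick the highest-scoring stage among those reachable from `current`.

    `scores` stands in for the output of a learned stage-level policy.
    Stages outside the allowed set are ignored, whatever their score.
    """
    candidates = ALLOWED[current]
    return max(candidates, key=lambda s: scores.get(s, float("-inf")))


def generate_response(stage: str, history: List[str]) -> str:
    """Placeholder for a stage-conditioned generator (e.g. a language model)."""
    return f"[{stage}] reply to turn {len(history)}"


def step(current: str, scores: Dict[str, float], history: List[str]) -> Tuple[str, str]:
    """One dialogue turn: plan the stage first, then generate conditioned on it."""
    next_stage = select_next_stage(current, scores)
    reply = generate_response(next_stage, history)
    return next_stage, reply
```

In this sketch the constraint is enforced by restricting the candidate set before taking the maximum, so a high score for an unreachable stage cannot cause a skipped transition. A learned policy would replace the hand-written `scores` dictionary.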
