
arXiv (cs.CL), 2026-06-16, ID: arXiv:2601.17421

Oops, Wait: Discourse Tokens Matter in Reasoning Model

Authors: Jaehui Hwang, Byeongho Heo, Sangdoo Yun, Dongyoon Han

Recent studies suggest that even data-efficient post-training on a small set (≈1K) of reasoning trajectories can induce non-trivial reasoning capabilities in large language models. Such training corpora often contain iconic tokens such as "wait", "so", and "alternatively", which appear frequently in reasoning trajectories and may play a role in this process. This paper characterizes observable token-level patterns in post-training. It also presents a case study of how data-efficient supervised fine-tuning (SFT) differs from, and falls short of, large-scale post-training. To this end, we first identify tokens that correlate with correct answers along reasoning trajectories across models and training setups. We then focus on the distribution and functional roles of the "wait" token, primarily to compare a model trained in a data-efficient manner with its counterpart. We find that discourse tokens are associated with correctness and with jumps in reasoning accuracy, even under data-efficient SFT. This suggests that data-efficient SFT can partially reproduce discourse-token patterns that mimic meaningful reasoning behavior. However, these patterns are less aligned with high-confidence answer transitions than those produced by large-scale post-training.
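The abstract's first step is to identify tokens that correlate with correct answers. The second is to look at how the "wait" token is distributed. A minimal sketch can illustrate both under stated assumptions:

- Each trajectory is a (text, is_correct) pair.
- A plain lowercase word split stands in for the model's subword tokenizer.
- "Correlation" is measured with the phi coefficient, i.e. the Pearson correlation of two binary variables.

The helper names (`words`, `phi`, `token_correlations`, `wait_counts`) and the toy trajectories are hypothetical and were not taken from the paper. The paper's actual analysis may differ.

```python
import math
import re

# Discourse tokens named in the abstract; the full set studied in the paper may differ.
DISCOURSE_TOKENS = ("wait", "so", "alternatively")


def words(text: str) -> list[str]:
    """Lowercase word split; a hypothetical stand-in for a real subword tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def phi(present: list[bool], correct: list[bool]) -> float:
    """Phi coefficient: Pearson correlation between two binary variables."""
    n11 = sum(p and c for p, c in zip(present, correct))
    n10 = sum(p and not c for p, c in zip(present, correct))
    n01 = sum((not p) and c for p, c in zip(present, correct))
    n00 = sum((not p) and (not c) for p, c in zip(present, correct))
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # Return 0.0 when a marginal is empty (correlation undefined).
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


def token_correlations(trajectories, tokens=DISCOURSE_TOKENS) -> dict[str, float]:
    """Map each token to the phi correlation between its presence and answer correctness."""
    correct = [ok for _, ok in trajectories]
    result = {}
    for tok in tokens:
        present = [tok in words(text) for text, _ in trajectories]
        result[tok] = phi(present, correct)
    return result


def wait_counts(trajectories) -> list[int]:
    """Per-trajectory count of the word "wait" (its raw distribution)."""
    return [words(text).count("wait") for text, _ in trajectories]


# Hypothetical toy trajectories (text, is_correct); NOT data from the paper.
toy = [
    ("Let x = 3. Wait, recheck: x = 4. So the answer is 4.", True),
    ("The answer is 7.", False),
    ("Try factoring. Alternatively, use the quadratic formula. Answer: 2.", True),
    ("Wait, that is wrong. Wait, fix it. So 10.", True),
    ("So the result is 5.", False),
]
print(token_correlations(toy))
print(wait_counts(toy))  # → [1, 0, 0, 2, 0]
```

A positive value means the token tends to appear in trajectories that end with a correct answer. Presence is measured per trajectory rather than per position. A study along the lines described above would presumably aggregate many sampled trajectories per model and training setup, and might track where in the trajectory each token occurs.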
