arXiv (CS.CL) 2026-06-24 12:00 DOI: arXiv:2606.23948

Layer-wise Probing of wav2vec 2.0 and Whisper for Consonant Cluster Reduction in African American English

Abstract

Self-supervised and supervised speech models are increasingly probed to determine which linguistic information their internal representations encode, and at what level of abstraction. One underexplored phenomenon is consonant cluster reduction (CCR) in African American English (AAE), a widespread phonological process and a source of disparities in automatic speech recognition (ASR). To examine how CCR is represented, we conduct speaker-independent, layer-wise probing of wav2vec2-base and Whisper-small on two tasks: segmental reduction detection and segmental restoration of underlying cluster identity. Both models distinguish reduced from canonical forms with high accuracy. Crucially, reduced segments retain cues to their underlying stops, indicating that CCR is encoded as structured, gradient phonological variation rather than simple segmental deletion. These results show that modern speech models encode AAE CCR patterns in a phonologically structured way.
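
To make the phrase "speaker-independent layer-wise probing" concrete, here is a minimal sketch of the general setup for the reduction-detection task. The abstract does not state the probe architecture, the feature extraction, or the split procedure, so everything here is an assumption: the function names are hypothetical, a nearest-centroid classifier stands in for whatever probe was actually used, and synthetic arrays stand in for the hidden states of wav2vec2-base or Whisper-small.

```python
import numpy as np


def make_synthetic_features(n_layers=4, n_speakers=8, per_speaker=20, dim=16, seed=0):
    """Toy stand-in for per-layer segment embeddings (NOT real model outputs)."""
    rng = np.random.default_rng(seed)
    n = n_speakers * per_speaker
    speakers = np.repeat(np.arange(n_speakers), per_speaker)
    labels = rng.integers(0, 2, size=n)  # 1 = reduced, 0 = canonical
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    # Assumption for the toy data only: class separation grows with depth.
    strengths = np.linspace(0.0, 6.0, n_layers)
    features = np.stack([
        rng.normal(size=(n, dim)) + s * labels[:, None] * direction
        for s in strengths
    ])  # shape: (n_layers, n, dim)
    return features, labels, speakers


def speaker_independent_split(speakers, test_fraction=0.25, seed=0):
    """Return boolean train/test masks such that no speaker appears in both."""
    rng = np.random.default_rng(seed)
    unique = rng.permutation(np.unique(speakers))
    n_test = max(1, int(round(len(unique) * test_fraction)))
    test_mask = np.isin(speakers, unique[:n_test])
    return ~test_mask, test_mask


def nearest_centroid_accuracy(x_train, y_train, x_test, y_test):
    """Fit class centroids on the training set and score the held-out set."""
    classes = np.unique(y_train)
    centroids = np.stack([x_train[y_train == c].mean(axis=0) for c in classes])
    # Squared Euclidean distance from each test vector to each class centroid.
    dists = ((x_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    pred = classes[dists.argmin(axis=1)]
    return float((pred == y_test).mean())


def probe_layers(features, labels, speakers, **split_kwargs):
    """Train one probe per layer, all sharing the same held-out speakers."""
    train, test = speaker_independent_split(speakers, **split_kwargs)
    return [
        nearest_centroid_accuracy(layer[train], labels[train], layer[test], labels[test])
        for layer in features
    ]


if __name__ == "__main__":
    feats, labs, spks = make_synthetic_features()
    for i, acc in enumerate(probe_layers(feats, labs, spks)):
        print(f"layer {i}: held-out accuracy = {acc:.2f}")
```

Holding out whole speakers, rather than random segments, keeps a probe from scoring well by memorizing speaker identity. The restoration task could reuse the same loop with underlying-stop labels in place of the binary reduced/canonical labels, though the abstract does not describe how that task was set up.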
