arXiv (CS.CL) 2026-06-24 12:00 DOI: arXiv:2606.24387

AutoSpecNER: A Fine-Grained Named Entity Recognition Dataset for Vehicle Specification Extraction

Abstract

Vehicle advertisements contain rich specification information, but automotive NER resources remain limited. We introduce AutoSpecNER, an expert-annotated dataset for fine-grained entity recognition in vehicle listings. The dataset includes 659 advertisements from a popular car-selling website, with over 10,000 entities annotated across 15 categories, including MODEL, ENGINE_SPEC, and BATTERY_CAPACITY. Annotation quality was validated through inter-annotator agreement, achieving an average score of 91.5%. We benchmark rule-based extraction, fine-tuned transformer encoders, and large language models. DeBERTa achieves the best performance with a 90% micro-F1 score, outperforming the rule-based baseline (43%) and the strongest large language model (77.8%).
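
The reported comparison is in micro-F1, which pools true positives, false positives, and false negatives across all entity categories before computing precision and recall. The sketch below illustrates that computation; it is not the authors' evaluation script. The abstract does not state the matching criterion, so exact-match scoring on `(start, end, label)` spans is an assumption here. The function name `micro_f1` and the toy spans are hypothetical.

```python
def micro_f1(gold, pred):
    """Entity-level micro-F1 under exact span-and-label matching (assumed).

    gold, pred: lists with one set per document; each set holds
    (start, end, label) tuples.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # spans predicted with the correct boundaries and label
        fp += len(p - g)   # predicted spans with no exact gold match
        fn += len(g - p)   # gold spans the system missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data only: two listings, labels taken from the categories named in the abstract.
gold = [
    {(0, 2, "MODEL"), (3, 5, "ENGINE_SPEC")},
    {(1, 2, "BATTERY_CAPACITY")},
]
pred = [
    {(0, 2, "MODEL"), (3, 4, "ENGINE_SPEC")},  # second span has a wrong boundary
    {(1, 2, "BATTERY_CAPACITY")},
]
score = micro_f1(gold, pred)
print(f"micro-F1 = {score:.3f}")
```

Because counts are pooled before averaging, frequent categories weigh more heavily in micro-F1 than rare ones; a macro-averaged score would treat all 15 categories equally.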
