← 返回大厅
arXiv (CS.CL) 2026-06-16 12:00 DOI: arXiv:2412.21036

GePBench: Evaluating Fundamental Geometric Perception for Multimodal Large Language Models

摘要 / Abstract

Geometric shapes play important roles in both physical world and human cognition. While multimodal large language models (MLLMs) have made significant advancements in visual understanding, their abilities to recognize geometric shapes and their spatial relationships, which we term geometric perception, are not explicitly and systematically explored. To address this gap, we introduce GePBench, a novel benchmark specifically designed to assess the geometric perception capabilities of MLLMs. Our extensive evaluations reveal that even the current state-of-the-art MLLMs exhibit significant deficiencies in geometric perception tasks. Furthermore, we show that models trained with GePBench data demonstrate considerable improvements on a wide range of downstream tasks, highlighting the critical role of geometric perception in enabling advanced multimodal applications. Our code and datasets are available at \href{https://github.com/Changhao-Xiang/GePBench}{https://github.com/Changhao-Xiang/GePBench}.

同行评议区

登录学者账户后即可在此处发表评述或点赞。

立即登录

暂无评议记录。