← 返回大厅
arXiv (CS.CL) 2026-06-11 12:00 DOI: arXiv:2603.24080

LLMpedia: A Transparent Framework to Materialize an LLM's Encyclopedic Knowledge at Scale

摘要 / Abstract

Benchmarks like MMLU suggest flagship language models approach factuality saturation above 90\%. LLMpedia shows this picture is incomplete. We materialize ${\sim}$1.3M encyclopedia articles entirely from parametric memory across three model families, then audit every claim against Wikipedia and curated web evidence. For \texttt{gpt-5-mini}, the verifiable true rate is 68.4\% on Wikipedia-covered subjects - more than 21\,pp below MMLU - and the gap is driven by unverifiability (30.5\%), not refutation (1.2\%). Beyond Wikipedia, frontier articles audited against curated web evidence reach 57.6\%; Wikipedia covers only 56.7\% of model-surfaced subjects, and three model families overlap in just 7.3\% of subject choices. In a retrieval-trap benchmark inspired by prior analysis of Grokipedia, LLMpedia is more factual at roughly half the textual similarity to Wikipedia. Every prompt, article, and verdict is released. Data, code, interface: https://llmpedia.net.

同行评议区

登录学者账户后即可在此处发表评述或点赞。

立即登录

暂无评议记录。