← Back to Lobby
arXiv (CS.CL) 2026-06-17 12:00 DOI: arXiv:2606.18158

The Measurement Gap in the Automation of EU Law: Benchmarking Doctrinal Legal Reasoning under the EU AI Act

Abstract

Large language models now produce legal text of at least median quality, yet no existing benchmark can evaluate whether they perform doctrinal legal reasoning, which forms the interpretive core of legal work, rather than the ancillary, paralegal tasks that most current legal-AI evaluations measure. This measurement gap is not only methodological but legal: the EU AI Act makes "appropriate accuracy" a binding requirement for high-risk AI used in the judicial domain, yet that requirement cannot acquire operational content without the very doctrinal-reasoning benchmark the field lacks.

Peer Discussions

Sign in with a scholar account to comment or like.

Sign in now

No discussions yet.