General-purpose chatbots outperform clinical AI tools on physicians’ real-world questions

Author: Unknown

Abstract

Specialized clinical AI tools are entering medical practice with little independent testing. In a head-to-head evaluation on two public benchmarks and on real questions from physicians, three general-purpose frontier large language models outperformed two leading clinical AI tools, which in turn performed no better than Google Search's AI Overview.
