bioRxiv (Bioinfo) 2026-06-15 00:00 DOI: HASH:2333a5a77da9fe0fe4ecd1a020ec0f4a

Biological meaning in protein embedding space is resolution-dependent

Abstract

Protein language model embeddings are increasingly used to organise biological sequences, yet how biological meaning is encoded within embedding neighbourhoods remains poorly understood. Using two independent hierarchical enzyme systems, carbohydrate-active enzymes and peptidases, we investigated how biological interpretation changes across embedding organisations aligned to different levels of biological hierarchy. Different embedding organisations give rise to distinct neighbourhood semantics. When aligned to membership-boundary resolution, embeddings robustly separated artefacts and unrelated proteins from members of the target category. In contrast, embeddings aligned to functional-grouping resolution maintained compositional neighbourhood structure for multi-domain proteins spanning more than one functional or catalytic group. Finally, embeddings aligned to local-family resolution recovered compact family-like neighbourhoods, including families withheld from training, while weakening broader membership-boundary and functional-grouping relationships. Moreover, embeddings optimised toward the same level of biological organisation retain different biological relationships depending on the optimisation trajectory employed. Together, our results show that proximity in protein embedding space has no fixed biological interpretation. Instead, biological meaning emerges across embedding resolutions through selective preservation of different forms of biological organisation.
