bioRxiv (Bioinfo)
2026-06-23 00:00
DOI:
Learning interpretable structural similarity from tandem mass spectra for small molecule analog discovery
Authors:
Abstract
Analog discovery remains a central bottleneck in mass spectrometry-based untargeted metabolomics, as conventional spectral similarity scores poorly reflect molecular structure. We introduce SIMBA, a transformer-based model that infers two interpretable graph-based distances, maximum common edge subgraph and substructure edit distance, directly from tandem mass spectra. SIMBA consistently retrieves structurally closer analogs than existing methods, enabling structure-aware small molecule identification beyond exact spectral matching.
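
To illustrate why conventional spectral similarity can diverge from structural similarity, the following is a minimal sketch of a binned cosine score, one common form of spectral similarity. It is not SIMBA and is not taken from the paper. The function names, the 1 Da bin width, and the toy peak lists are assumptions made for illustration only. The second spectrum mimics an analog whose fragments are all shifted by roughly 14 Da, so no peaks fall in shared bins and the score drops to zero even though the fragmentation pattern is the same.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (hypothetical helper)."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=1.0):
    """Cosine similarity between two binned spectra, in [0, 1]."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    # Only bins present in both spectra contribute to the dot product.
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy (m/z, intensity) peak lists, invented for illustration.
query = [(91.05, 100.0), (119.05, 50.0), (147.04, 25.0)]
# Same intensity pattern, every fragment shifted by about 14 Da.
shifted_analog = [(105.07, 100.0), (133.07, 50.0), (161.06, 25.0)]

score_self = cosine_similarity(query, query)
score_analog = cosine_similarity(query, shifted_analog)
print(f"self: {score_self:.3f}, shifted analog: {score_analog:.3f}")
```

Scores of this kind reward exact peak overlap, which is why a structure-aware target such as a graph-based distance can rank analogs differently from a purely spectral comparison.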