<p><b>Background</b>: Molecular fingerprints are essential
cheminformatics tools for virtual screening and mapping chemical space. Among
the different types of fingerprints, substructure fingerprints perform best for
small molecules such as drugs, while atom-pair fingerprints are preferable for
large molecules such as peptides. However, no available fingerprint achieves
good performance on both classes of molecules.</p>
<p><b>Results</b>: Here we set out to design a new fingerprint suitable for both small and
large molecules by combining substructure and atom-pair concepts. Our quest
resulted in a new fingerprint called MinHashed atom-pair fingerprint up to a
diameter of four bonds (MAP4). In this fingerprint the circular substructures
with radii of <i>r</i> = 1 and <i>r</i> = 2 bonds around each atom in an
atom-pair are written as two pairs of SMILES, each pair being combined with the
topological distance separating the two central atoms. These so-called
atom-pair molecular shingles are hashed, and the resulting set of hashes is
MinHashed to form the MAP4 fingerprint. MAP4 significantly outperforms all
other fingerprints on an extended benchmark that combines the Riniker and
Landrum small-molecule benchmark with a peptide benchmark in which BLAST
analogs must be recovered from among either scrambled or point-mutation analogs. MAP4 furthermore
produces well-organized chemical space tree-maps (TMAPs) for databases as
diverse as DrugBank, ChEMBL, SwissProt and the Human Metabolome Database
(HMDB), and differentiates between all metabolites in HMDB, over 70% of which
are indistinguishable from their nearest neighbor using substructure
fingerprints.</p>
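<p>The shingle, hash, and MinHash steps described above can be illustrated with a minimal sketch. It is not the reference implementation: the environment strings are hand-written placeholders rather than SMILES extracted from a real molecule, and the function names, the <code>|</code> separator, the use of SHA-1, the 32-bit truncation, and the linear-hash permutation scheme are all assumptions made for illustration.</p>

```python
import hashlib
import random

# Assumed constants for this sketch (not taken from the paper).
PRIME = 2**61 - 1      # modulus for the random linear hash functions
HASH_RANGE = 2**32     # shingle hashes are truncated to 32 bits


def make_shingle(env_a, env_b, distance):
    """Combine two circular-environment strings with their topological distance.

    The two strings are sorted so that the pair (a, b) and the pair (b, a)
    yield the same shingle.
    """
    first, second = sorted((env_a, env_b))
    return f"{first}|{distance}|{second}"


def hash_shingle(shingle):
    """Map a shingle string to a 32-bit integer (SHA-1 is an assumption here)."""
    digest = hashlib.sha1(shingle.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little")


def fingerprint(shingles, n_perm=128, seed=42):
    """MinHash a collection of shingles into a vector of n_perm integers."""
    hashes = {hash_shingle(s) for s in shingles}
    if not hashes:
        raise ValueError("at least one shingle is required")
    rng = random.Random(seed)  # fixed seed: the same hash functions for every molecule
    params = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n_perm)]
    # For each random linear hash function, keep the minimum value over all shingles.
    return [min(((a * h + b) % PRIME) % HASH_RANGE for h in hashes) for a, b in params]


def estimated_jaccard(fp_a, fp_b):
    """Estimate set similarity as the fraction of matching MinHash positions."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    return sum(x == y for x, y in zip(fp_a, fp_b)) / len(fp_a)


# Placeholder environments: (r = 1 string, r = 2 string) per atom, plus a distance.
# In a real pipeline these would come from a cheminformatics toolkit.
pairs = [
    (("CO", "CCO"), ("CN", "CCN"), 3),
    (("CO", "CCO"), ("CC", "CCC"), 2),
]
shingles = [
    make_shingle(a[r], b[r], dist)   # one shingle per radius for each atom pair
    for a, b, dist in pairs
    for r in range(2)
]
fp = fingerprint(shingles)
print(len(fp), estimated_jaccard(fp, fp))
```

<p>Because each atom pair contributes one shingle per radius, the size of the shingle set grows with the number of atom pairs, while the MinHash step keeps the final fingerprint at a fixed length regardless of molecule size.</p>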
<p><b>Conclusion</b>: MAP4 is a new
molecular fingerprint suitable for drugs, biomolecules, and the metabolome and
can be adopted as a universal fingerprint to describe and search chemical
space. The source code is available at <a href="https://github.com/reymond-group/map4">https://github.com/reymond-group/map4</a>, and interactive MAP4
similarity search tools and TMAPs for various databases are accessible at <a href="http://map-search.gdb.tools/">http://map-search.gdb.tools/</a> and <a href="http://tm.gdb.tools/map4/">http://tm.gdb.tools/map4/</a>.</p>