Molecular similarity search can rapidly retrieve structurally similar analogues of given query compounds from chemical databases. However, an exhaustive similarity search over a large compound database is computationally expensive. Although recent indexing algorithms can greatly speed up the search, they are not readily applicable to molecular similarity search because they lack an implementation of the Tanimoto similarity metric. In this paper, we first implement code in Python or C++ that enables Tanimoto similarity search with several recent indexing algorithms, such as Hnsw and Onng. Moreover, there is increasing interest in the computational community in developing robust benchmarking systems to assess the performance of computational algorithms. Here, we provide a benchmark to evaluate the molecular similarity search performance of these recent indexing algorithms. To avoid potential package dependency issues, two separate benchmarks are built on currently popular container technologies, Docker and Singularity. Singularity is a relatively new container framework designed specifically for high-performance computing (HPC) platforms; it requires neither privileged permissions nor a separate daemon process. Both benchmarks are extensible and can incorporate new indexing algorithms, benchmarking data sets, and customized parameter settings. Our results demonstrate that graph-based methods, such as Hnsw and Onng, consistently achieve the best trade-off between search effectiveness and search efficiency. The source code of the entire benchmark system can be downloaded from https://github.uconn.edu/mldrugdiscovery/MssBenchmark.
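For readers unfamiliar with the metric, the following is a minimal sketch of the Tanimoto similarity and of the exhaustive search that the indexing algorithms aim to approximate. It is not the code from the benchmark. The function names `tanimoto` and `exhaustive_search`, the 8-bit fingerprints, and the convention of returning 0.0 for two empty fingerprints are all assumptions made for illustration. For binary fingerprints A and B, the similarity is the number of bits set in both divided by the number of bits set in either.

```python
import heapq


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two binary fingerprints stored as Python ints."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # assumed convention when both fingerprints are empty
    return bin(a & b).count("1") / union


def exhaustive_search(query: int, database: list, k: int = 3) -> list:
    """Score every database fingerprint and return the k best (similarity, index) pairs."""
    scored = ((tanimoto(query, fp), i) for i, fp in enumerate(database))
    return heapq.nlargest(k, scored)


# Hypothetical 8-bit fingerprints; real molecular fingerprints are much longer.
database = [0b10110010, 0b10110011, 0b01001100, 0b11110000]
query = 0b10110010

hits = exhaustive_search(query, database, k=2)
for similarity, index in hits:
    print(f"compound {index}: Tanimoto = {similarity:.2f}")
```

The exhaustive search scores every compound in the database for each query, so its cost grows linearly with database size. Indexing algorithms reduce this cost by examining only a subset of candidates.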