2020
DOI: 10.1021/acs.jcim.0c00393

Benchmark on Indexing Algorithms for Accelerating Molecular Similarity Search

Abstract: Structurally similar analogues of given query compounds can be rapidly retrieved from chemical databases by molecular similarity search approaches. However, the computational cost of an exhaustive similarity search over a large compound database is high. Although the latest indexing algorithms can greatly speed up the search process, they are not readily applicable to molecular similarity search problems because they lack an implementation of the Tanimoto similarity metric. In this paper, w…
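The abstract turns on two ideas: the Tanimoto metric and the cost of exhaustive search. Below is a minimal sketch of both, assuming fingerprints are binary bit vectors packed into Python ints. The helper names `popcount`, `tanimoto`, and `exhaustive_search` and the toy 4-bit fingerprints are hypothetical and not taken from the paper. For two fingerprints with a and b on-bits, c of them shared, the Tanimoto coefficient is c / (a + b - c).

```python
import heapq


def popcount(x: int) -> int:
    """Number of set bits in a non-negative int (portable across Python versions)."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity c / (a + b - c) of two bit fingerprints stored as ints."""
    common = popcount(a & b)  # c: on-bits shared by both fingerprints
    union = popcount(a | b)   # a + b - c: on-bits set in either fingerprint
    # Convention chosen for this sketch: two empty fingerprints score 0.0.
    return common / union if union else 0.0


def exhaustive_search(query: int, database: list, k: int) -> list:
    """Brute-force top-k: scores every entry, so cost grows linearly with len(database)."""
    scored = ((tanimoto(query, fp), idx) for idx, fp in enumerate(database))
    return heapq.nlargest(k, scored)  # (similarity, index) pairs, best first


db = [0b1011, 0b1111, 0b0001, 0b1010]  # toy 4-bit fingerprints (hypothetical)
print(exhaustive_search(0b1011, db, k=2))  # [(1.0, 0), (0.75, 1)]
```

Every query touches every database entry; this linear per-query cost is what indexing algorithms try to avoid.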

Cited by 4 publications (4 citation statements)
References 60 publications
“…The number of layers l of each node is chosen at random, and the probability distribution decays exponentially. By combining Word2vec embedding and HNSW, high speed and accuracy are achieved, which reduces the false-negative ratio and facilitates compound identification. …”
Section: Results and Discussion (mentioning; confidence: 99%)
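The first sentence of this statement paraphrases how HNSW assigns layers. Below is a minimal sketch, assuming the rule from Malkov and Yashunin's original HNSW paper: a node's top layer is l = floor(-ln(U) * m_L), with U uniform on (0, 1] and m_L = 1/ln(M). This gives P(l >= k) = M^-k. The value M = 16 and the function name `random_level` are illustrative assumptions, not details from the citing paper.

```python
import math
import random


def random_level(m_l: float, rng: random.Random) -> int:
    """Draw a node's top layer as floor(-ln(U) * m_l), with U ~ Uniform(0, 1]."""
    u = 1.0 - rng.random()  # rng.random() is in [0, 1); shift to (0, 1] so log(u) is finite
    return int(-math.log(u) * m_l)  # the argument is >= 0, so int() acts as floor


M = 16  # hypothetical neighbour budget per node
m_l = 1.0 / math.log(M)  # level normalization suggested in the HNSW paper
rng = random.Random(0)
levels = [random_level(m_l, rng) for _ in range(100_000)]
for k in range(4):
    frac = sum(lvl >= k for lvl in levels) / len(levels)
    print(f"P(level >= {k}) ~ {frac:.5f}   theory M^-k = {M ** -k:.5f}")
```

Each layer holds roughly 1/M of the nodes in the layer below it. This is the exponential decay the quoted sentence describes, and it keeps the upper layers sparse enough for coarse, fast routing.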
“…To evaluate the potential mechanism of the effects of selenium on Fu tea, we utilized an untargeted metabolomics strategy with selenium-enriched Fu tea (four final concentrations: 0.0, 0.6, 1.2, and 2.4 mg kg⁻¹) (Figure ). Compound identification matching was performed by analyzing the mass spectral signature data set of selenium-enriched Fu tea obtained by UHPLC-Q-Orbitrap HRMS–MS/MS in combination with a multimillion computer library. Extended connectivity fingerprints of f-CHEMBL and f-NIST molecules served as inputs to neural electron–ionization mass spectrometry (NEIMS) to generate predicted spectra, which were embedded by Word2vec spectral embedding and hierarchical navigable small world (HNSW) graphs to accomplish the spectral matching with high speed and high accuracy. …”
Section: Results and Discussion (mentioning; confidence: 99%)
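The quoted pipeline relies on HNSW graphs for fast spectral matching. As an illustration only, below is a sketch of the best-first search HNSW performs inside a single graph layer, modeled on the SEARCH-LAYER routine of the original HNSW paper. The Euclidean distance, the dict-based graph layout, the toy chain graph, and the name `search_layer` are assumptions made for the example. The study's Word2vec spectral embeddings are not reproduced, and an embedding-based application might use cosine distance instead.

```python
import heapq
import math


def search_layer(query, entry, vectors, graph, ef):
    """Best-first search on one proximity-graph layer (cf. SEARCH-LAYER in HNSW).

    vectors: dict node -> tuple of floats; graph: dict node -> list of neighbour ids.
    Returns up to ef (distance, node) pairs, nearest first.
    """
    def dist(node):
        return math.dist(query, vectors[node])

    d0 = dist(entry)
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap: closest unexplored node on top
    results = [(-d0, entry)]    # max-heap via negation: worst kept result on top
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:
            break  # best remaining candidate is worse than the worst result: stop
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist(nb)
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst result
    return sorted((-nd, n) for nd, n in results)


vectors = {i: (float(i), 0.0) for i in range(6)}  # toy points on a line
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}  # chain links
print(search_layer((4.2, 0.0), entry=0, vectors=vectors, graph=graph, ef=2))  # nodes 4 and 5, nearest first
```

The `ef` parameter trades speed for accuracy: a larger candidate list explores more of the graph before the stopping test fires. A full HNSW index runs this routine layer by layer, from the sparse top layer down to layer 0, using each layer's result as the entry point for the next.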
“…A notable advancement in this domain has been the widespread adoption of learning-based embedding models, which leverage high-dimensional vector representations to enable effective and efficient analysis and search of unstructured data [37,61]. High-dimensional Vector Similarity Search (HVSS) is a critical challenge in many domains, such as databases [25,68], information retrieval [28,32], recommendation systems [19,54], scientific computing [51,78], and large language models (LLMs) [7,12,44]. The computational complexity associated with exact query answering in HVSS has spurred recent research efforts toward developing approximate search methods [25,49,68].…”
Section: Introduction (mentioning; confidence: 99%)
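The last sentence contrasts exact and approximate query answering. Below is a minimal sketch, assuming Euclidean distance and an in-memory list of vectors. Exact k-NN by linear scan costs N distance computations of O(d) each per query. Recall@k, a common accuracy measure not named in the quoted text, compares an approximate method's answers against that exact result. The function names `exact_knn` and `recall_at_k` and the toy data are hypothetical.

```python
import heapq
import math


def exact_knn(query, vectors, k):
    """Exact k nearest neighbours by linear scan: len(vectors) distance evaluations."""
    pairs = ((math.dist(query, v), i) for i, v in enumerate(vectors))
    return [i for _, i in heapq.nsmallest(k, pairs)]  # indices, nearest first


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact k nearest neighbours that the approximate result contains."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


vectors = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (0.5, 0.0)]  # toy data
truth = exact_knn((0.0, 0.1), vectors, k=2)
print(truth, recall_at_k([0, 1], truth))  # [0, 3] 0.5
```

An approximate index is typically judged by its speed-recall trade-off: how much it cuts query time relative to the linear scan while keeping recall close to 1.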