The availability of chemical libraries containing millions of compounds makes identifying lead compounds very difficult. Identifying these compounds is a central step of the drug discovery process, and hierarchical clustering algorithms are used for that purpose. One of the most popular hierarchical clustering algorithms, used in many drug discovery applications, is the Ward clustering algorithm. A main problem with previous implementations of the Ward algorithm is their inability to handle large data sets within reasonable time and memory limits. In this paper, OpenCL is used to implement the Ward algorithm. The first two steps of Ward, (1) proximity matrix computation and (2) finding the minimum distance, are modified to run in parallel. Four subsets of the National Cancer Institute (NCI) dataset are used; the smallest subset contains 500 compounds and the largest contains 10,000 compounds. The results show that parallel proximity matrix computation saves 92% of the time for the smallest subset and 99% for the largest. The parallel minimum-distance search saves 76% of the time for the smallest subset and 99% for the largest.
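The two steps named above can be illustrated with a minimal serial sketch in Python. It is not the paper's OpenCL implementation: the squared Euclidean distance, the function names `proximity_matrix` and `find_min_pair`, and the toy coordinates are all assumptions made for illustration. The sketch is meant only to show why both steps lend themselves to parallel execution: every matrix entry is computed independently of the others, and the minimum search is a reduction over those entries.

```python
from itertools import combinations


def proximity_matrix(points):
    """Step 1: pairwise distances between all items.

    Assumption: squared Euclidean distance; the abstract does not
    state which measure the paper uses.
    """
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    # Each (i, j) entry depends only on points[i] and points[j],
    # so a parallel version could assign one entry per work-item.
    for i, j in combinations(range(n), 2):
        d = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
        dist[i][j] = dist[j][i] = d
    return dist


def find_min_pair(dist):
    """Step 2: return the index pair (i, j), i < j, with the smallest distance.

    Requires at least two items. A parallel version would typically
    replace this linear scan with a reduction.
    """
    n = len(dist)
    return min(combinations(range(n), 2), key=lambda p: dist[p[0]][p[1]])


# Hypothetical toy data: four items described by two features each.
points = [(0.0, 0.0), (3.0, 4.0), (0.0, 1.0), (6.0, 8.0)]
dist = proximity_matrix(points)
pair = find_min_pair(dist)
```

In a full agglomerative run, the pair returned by step 2 would be merged and the matrix updated before the search repeats, so the cost of both steps grows quickly with the number of compounds.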