Data clustering, an unsupervised machine learning technique, plays a critical role in drug discovery in chemoinformatics. Over the past decades, researchers have developed numerous clustering algorithms that are well suited to analyzing large, high-dimensional chemical datasets. One application is lead compound selection, the process of identifying a chemical compound that helps treat a disease and leads to the development of a new drug. Quantitative structure-property relationship (QSPR) analysis applies clustering algorithms to the structural descriptors of chemical compounds to identify compounds with similar properties. Quantitative structure-activity relationship (QSAR) analysis uses cluster analysis to identify empirical relationships between chemical structure and biological activity among similar compounds. Chemists also use cluster analysis to control the acute toxicity of chemical compounds during drug discovery. Given these numerous applications, this paper proposes an improved clustering algorithm, ImpClust, that clusters similar compounds based on their chemical composition. Five benchmark datasets are used to evaluate the performance of ImpClust, and the experimental results are compared with those of five commonly used clustering algorithms. Five cluster validation indexes (DI-Index, COP-Index, DB-Index, CH-Index, and Silhouette Index) are used to evaluate the clusters formed by the different clustering algorithms.
The experimental findings show that, compared with the five existing clustering techniques, the proposed ImpClust algorithm achieves significantly higher scores on the Silhouette Index, DI-Index, and CH-Index, for which higher values indicate better clustering, and significantly lower scores on the COP-Index and DB-Index, for which lower values indicate better clustering.
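
To illustrate why a high DI-Index and a low DB-Index both indicate better clustering, the following minimal sketch computes the Dunn index (DI) and the Davies-Bouldin index (DB) for two partitions of the same toy points. It is not the ImpClust algorithm and does not use the benchmark datasets; the 2-D points, the function names, and the choice of Euclidean distance are assumptions made only for this illustration.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def dunn_index(clusters):
    """Smallest between-cluster distance divided by largest cluster diameter.

    Higher is better: clusters are compact and far apart.
    Assumes at least two clusters, each with at least two points.
    """
    min_separation = min(
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    max_diameter = max(
        dist(p, q) for c in clusters for p, q in combinations(c, 2)
    )
    return min_separation / max_diameter


def davies_bouldin_index(clusters):
    """Mean, over clusters, of the worst scatter-to-separation ratio.

    Lower is better: small within-cluster scatter, distant centroids.
    """
    centers = [centroid(c) for c in clusters]
    scatter = [
        sum(dist(p, m) for p in c) / len(c) for c, m in zip(clusters, centers)
    ]
    k = len(clusters)
    worst_ratios = [
        max(
            (scatter[i] + scatter[j]) / dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
        for i in range(k)
    ]
    return sum(worst_ratios) / k


# Hypothetical toy data: two well-separated groups of three points each.
good_partition = [
    [(0, 0), (0, 1), (1, 0)],
    [(10, 10), (10, 11), (11, 10)],
]
# The same six points, deliberately assigned to mixed clusters.
poor_partition = [
    [(0, 0), (10, 10), (1, 0)],
    [(0, 1), (10, 11), (11, 10)],
]

for name, partition in [("good", good_partition), ("poor", poor_partition)]:
    print(name, dunn_index(partition), davies_bouldin_index(partition))
```

On this toy data the well-separated partition yields the larger Dunn index and the smaller Davies-Bouldin index, which is the same direction of comparison used above to assess ImpClust against the existing techniques.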