PFClust: a novel parameter free clustering algorithm

Mavridis, Lazaros; Nath, Neetika; Mitchell, John B. O.

doi:10.1186/1471-2105-14-213

Cited by 16 publications

(20 citation statements)

References 27 publications


“…Figure 2 shows an example family from ChEMBL, one of the Androgen Receptor families (ChEMBL1871), with a number of different clusters of compounds. Splitting such a family into smaller groups based on ligand structure will allow us to identify the different sets of ligands; therefore PFClust [24] (brief description in Additional file 1) was applied to all the filtered ChEMBL families. We selected the PFClust algorithm because it is a parameter free clustering algorithm and does not require any kind of parameter tuning.…”

Section: Methods (mentioning)

confidence: 99%

Predicting the protein targets for athletic performance-enhancing substances

Mavridis, Mitchell. J Cheminform, 2013. Self-citation.

Background: The World Anti-Doping Agency (WADA) publishes the Prohibited List, a manually compiled international standard of substances and methods prohibited in-competition, out-of-competition and in particular sports. It would be ideal to be able to identify all substances that have one or more performance-enhancing pharmacological actions in an automated, fast and cost effective way. Here, we use experimental data derived from the ChEMBL database (~7,000,000 activity records for 1,300,000 compounds) to build a database model that takes into account both structure and experimental information, and use this database to predict both on-target and off-target interactions between these molecules and targets relevant to doping in sport.

Results: The ChEMBL database was screened and eight well populated categories of activities (Ki, Kd, EC50, ED50, activity, potency, inhibition and IC50) were used for a rule-based filtering process to define the labels “active” or “inactive”. The “active” compounds for each of the ChEMBL families were thereby defined and these populated our bioactivity-based filtered families. A structure-based clustering step was subsequently performed in order to split families with more than one distinct chemical scaffold. This produced refined families, whose members share both a common chemical scaffold and bioactivity against a common target in ChEMBL.

Conclusions: We have used the Parzen-Rosenblatt machine learning approach to test whether compounds in ChEMBL can be correctly predicted to belong to their appropriate refined families. Validation tests using the refined families gave a significant increase in predictivity compared with the filtered or with the original families. Out of 61,660 queries in our Monte Carlo cross-validation, belonging to 19,639 refined families, 41,300 (66.98%) had the parent family as the top prediction and 53,797 (87.25%) had the parent family in the top four hits. Having thus validated our approach, we used it to identify the protein targets associated with the WADA prohibited classes. For compounds where we do not have experimental data, we use their computed patterns of interaction with protein targets to make predictions of bioactivity. We hope that other groups will test these predictions experimentally in the future.


“…This leads to a set of refined families, each consisting of a group of molecules, which share similar chemical structure and bioactivity. The refined families of the ChEMBL dataset will allow us to identify the different sets of ligands [56,58,61].…”

Section: Filtered and Refined Families Of The Chembl Dataset (mentioning)

confidence: 99%

“…Using these position vectors for each compound, we calculated the Euclidean distances between the resulting points and a similarity matrix was created. Finally, we clustered the vectors using PFClust [61].…”

Section: Identifying the Off-targets Of The Novel Multipotent Compounds (mentioning)

confidence: 99%
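The distance-to-similarity step described in the statement above can be sketched roughly as follows. This is only an illustration, not the cited authors' code: the function names are hypothetical, the example vectors are made up, and the conversion `s = 1 / (1 + d)` is an assumed choice, since the quoted text does not say how the similarity matrix was derived from the Euclidean distances.

```python
import math


def euclidean_distance_matrix(vectors):
    """Return the symmetric matrix of pairwise Euclidean distances."""
    n = len(vectors)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(vectors[i], vectors[j])  # Python 3.8+
            dist[i][j] = dist[j][i] = d
    return dist


def similarity_matrix(dist):
    """Map distances to similarities in (0, 1].

    Assumption: s = 1 / (1 + d). The source does not specify the mapping.
    """
    return [[1.0 / (1.0 + d) for d in row] for row in dist]


# Made-up position vectors for three compounds.
vectors = [(0.0, 0.0), (3.0, 4.0), (0.0, 1.0)]
dist = euclidean_distance_matrix(vectors)
sim = similarity_matrix(dist)
```

A matrix of this kind is what a similarity-based clustering method such as PFClust would take as input; the clustering step itself is not reproduced here.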

“…This process generates bioactivity based filtered families. Our recently developed PFClust clustering [61] was applied to all the filtered ChEMBL families, which subdivided each family into smaller groups based both on ligand structure and their proven activity on a given protein target [56,58]. The compounds were clustered on the basis of their chemical structures, described by circular fingerprints (CFPs) [57].…”

Section: Filtered and Refined Families Of The Chembl Dataset (mentioning)

confidence: 99%


Predicting targets of compounds against neurological diseases using cheminformatic methodology

Nikolic, Mavridis, Aguilera, et al. J Comput Aided Mol Des, 2014. Self-citation.

Recently developed multi-targeted ligands are novel drug candidates able to interact with monoamine oxidase A and B; acetylcholinesterase and butyrylcholinesterase; or with histamine N-methyltransferase and histamine H3-receptor (H3R). These proteins are drug targets in the treatment of depression, Alzheimer's disease, obsessive disorders, and Parkinson's disease. A probabilistic method, the Parzen-Rosenblatt window approach, was used to build a "predictor" model using data collected from the ChEMBL database. The model can be used to predict both the primary pharmaceutical target and off-targets of a compound based on its structure. Molecular structures were represented based on the circular fingerprint methodology. The same approach was used to build a "predictor" model from the DrugBank dataset to determine the main pharmacological groups of the compound. The study of off-target interactions is now recognised as crucial to the understanding of both drug action and toxicology. Primary pharmaceutical targets and off-targets for the novel multi-target ligands were examined by use of the developed cheminformatic method. Several multi-target ligands were selected for further study, as compounds with possible additional beneficial pharmacological activities. The cheminformatic target identifications were in agreement with four 3D-QSAR (H3R/D1R/D2R/5-HT2aR) models and with in vitro assays for serotonin 5-HT1a and 5-HT2a receptor binding of the most promising ligand (71/MBA-VEG8).
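As a rough illustration of the Parzen-Rosenblatt window idea mentioned in this abstract, the sketch below scores a query point against labelled examples with a kernel density estimate per class and ranks the classes by score. It is a generic textbook-style sketch under stated assumptions, not the cited authors' model: the Gaussian kernel on Euclidean distance, the bandwidth `h`, the function names, and the toy data are all assumptions made here (the abstract says only that molecules were represented by circular fingerprints).

```python
import math


def gaussian_kernel(x, y, h):
    """Unnormalised Gaussian window centred on y with bandwidth h (assumed kernel)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * h * h))


def parzen_scores(query, labelled, h=1.0):
    """Average kernel value of the query against each class's examples.

    The normalising constant is omitted; it is identical for every class
    when h and the dimension are shared, so the ranking is unaffected.
    """
    sums, counts = {}, {}
    for point, label in labelled:
        sums[label] = sums.get(label, 0.0) + gaussian_kernel(query, point, h)
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def rank_classes(query, labelled, h=1.0):
    """Return class labels ordered from most to least likely."""
    scores = parzen_scores(query, labelled, h)
    return sorted(scores, key=scores.get, reverse=True)


# Toy two-dimensional data with two hypothetical classes.
training = [((0.0, 0.0), "A"), ((0.0, 1.0), "A"),
            ((5.0, 5.0), "B"), ((6.0, 5.0), "B")]
ranking_near_a = rank_classes((0.2, 0.3), training)
ranking_near_b = rank_classes((5.5, 5.0), training)
```

Ranking classes rather than returning a single label mirrors the way the abstract speaks of a primary target plus off-targets, although how the cited work turns scores into predictions is not described here.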


“…Moreover, the abovementioned evaluation indices are distance-based measures; therefore, they can only evaluate the qualities of spherical clusters and cannot be used for arbitrary-shaped clusters. In [15], Mavridis et al. proposed the algorithm PFClust (Parameter Free Clustering). The term "parameter free" means that the algorithm can automatically determine the number of clusters without requiring any user-defined parameters.…”

Section: Introduction (mentioning)

confidence: 99%

A New Clustering Algorithm and Its Application in Assessing the Quality of Underground Water

Vovan, Nguyen-Hai, Tat-Hong, et al. Scientific Programming, 2020.

Cluster analysis, which partitions a dataset into groups so that similar elements are assigned to the same group and dissimilar elements are assigned to different ones, has been widely studied and applied in various fields. The two challenging tasks in clustering are determining the suitable number of clusters and generating clusters of arbitrary shapes. This paper proposes a new concept of “epsilon radius neighbors” which plays an essential role in the cluster-forming process, thereby determining both the number of clusters and the shape of clusters automatically. Based on “epsilon radius neighbors,” a new clustering algorithm is proposed in which the epsilon radius value is adapted to the characteristics of each cluster in the current partition. Recently, clustering has been widely applied in environmental applications, including underground water quality monitoring. However, the existing studies have simply applied conventional clustering techniques, in which the abovementioned two challenging tasks have not yet been solved. Therefore, in this paper, the proposed clustering algorithm is applied in assessing the underground water quality in Phu My Town, Ba Ria-Vung Tau Province, Vietnam. The experimental results on benchmark datasets demonstrate the effectiveness of the proposed algorithm. For the quality of underground water, the new algorithm results in four clusters with different characteristics. Through this application, we found that the new algorithm might provide valuable reference information for underground water management.
