Heuristics for Similarity Searching of Chemical Graphs Using a Maximum Common Edge Subgraph Algorithm

Raymond, John W.; Gardiner, Eleanor J.; Willett, Peter

doi:10.1021/ci010381f

Cited by 127 publications (150 citation statements, 149 classified as mentioning)

References 39 publications

“…[1][2][3][4] The work of the chemoinformatics research group in Sheffield has always had a strong algorithmic and methodological focus, this reflecting our location in an informatics, rather than a chemical, academic department. We have thus drawn extensively on computational techniques from, e.g., graph theory, [5][6] cluster analysis, [7][8] image processing [9][10] and combinatorial optimisation [11][12] inter alia to design and implement a wide range of chemoinformatics applications. Lynch and Willett [1] and Bishop et al [4] have described Sheffield work in chemoinformatics for the periods 1965-1985 and 1986-2002, respectively.…”

Section: Introduction; classified as mentioning (confidence: 99%)

Gillet, Holliday, Willett. Chemoinformatics at the University of Sheffield 2002–2014. Molecular Informatics, 2015. Self-citation.


“…Frequent patterns play a critical role in many data mining tasks as they can be used among other to derive association rules [1], act as composite features for classification algorithms [14,56,63,51,22,50,15], cluster the (graph) transactions [1,48,35,36,49,24], and help in determining the similarity between graphs [54,23,42,59,9,49,13,60,66]. Within the context of graphs, the most widely used definition of a pattern is that of a connected subgraph [8,68,32,29,69,30,44] and is the definition that we will use in this paper.…”

Section: Introduction; classified as mentioning (confidence: 99%)

Kuramochi, Karypis. Finding Frequent Patterns in a Large Sparse Graph. Data Mining and Knowledge Discovery, 2005.

This paper presents two algorithms based on the horizontal and vertical pattern discovery paradigms that find the connected subgraphs that have a sufficient number of edge-disjoint embeddings in a single large undirected labeled sparse graph. These algorithms use three different methods, based on approximate and exact maximum independent set computations, to determine the number of edge-disjoint embeddings of a subgraph, and use this number to prune infrequent subgraphs. Experimental evaluation on real datasets from various domains shows that both algorithms achieve good performance, scale well to sparse input graphs with more than 100,000 vertices, and significantly outperform previously developed algorithms.
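The counting step described in this abstract can be illustrated on a toy case. In this minimal sketch, which is an assumption rather than the paper's code, each embedding of a pattern is listed as its set of host-graph edges; embeddings that share an edge are joined in an overlap graph, and the size of a maximum independent set in that graph is the number of edge-disjoint embeddings. The names `overlap_graph`, `greedy_mis` (a minimum-degree greedy approximation) and `exact_mis_size` (plain branching, exponential time) are hypothetical, and neither routine is claimed to be one of the paper's three methods.

```python
from itertools import combinations

def overlap_graph(embeddings):
    """Adjacency sets in which two embeddings are adjacent iff they share a host edge."""
    # Store each edge as a frozenset so that (u, v) and (v, u) compare equal
    # in an undirected graph.
    edge_sets = [{frozenset(e) for e in emb} for emb in embeddings]
    adj = {i: set() for i in range(len(embeddings))}
    for i, j in combinations(range(len(embeddings)), 2):
        if edge_sets[i] & edge_sets[j]:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def greedy_mis(adj):
    """Approximate maximum independent set: repeatedly keep a minimum-degree
    vertex and delete it together with its neighbours."""
    remaining = {v: set(ns) for v, ns in adj.items()}
    chosen = []
    while remaining:
        v = min(remaining, key=lambda x: (len(remaining[x]), x))
        chosen.append(v)
        dropped = remaining[v] | {v}
        for u in dropped:
            remaining.pop(u, None)
        for ns in remaining.values():
            ns -= dropped
    return chosen

def exact_mis_size(adj):
    """Exact maximum independent set size by branching; exponential time,
    so only suitable for small overlap graphs."""
    def solve(vertices):
        if not vertices:
            return 0
        v = min(vertices)
        rest = vertices - {v}
        # Either leave v out, or keep v and discard all of its neighbours.
        return max(solve(rest), 1 + solve(rest - adj[v]))
    return solve(frozenset(adj))

# Toy host graph: the path a-b-c-d-e. The pattern is a two-edge path,
# which has three overlapping embeddings in the host.
embeddings = [
    [("a", "b"), ("b", "c")],
    [("b", "c"), ("c", "d")],
    [("c", "d"), ("d", "e")],
]
adj = overlap_graph(embeddings)
print(len(greedy_mis(adj)), exact_mis_size(adj))  # 2 2
```

Because any independent set the greedy routine returns is no larger than a maximum one, it gives a lower bound on the true edge-disjoint count; the exact routine is only practical when a pattern has few embeddings.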


“…This provides a natural way of calculating the degree of similarity between a pair of molecules but the NP-complete nature of the maximum common subgraph isomorphism problem has ruled out the large-scale use of MCS-based similarities. We have recently described a new MCS algorithm, called RASCAL, that is sufficiently rapid in execution to permit graph-based similarity searching of large chemical databases [16,17] and that seems to provide a viable complement, or even an alternative, to existing, fingerprint-based approaches to virtual screening [18].…”

Section: Introduction; classified as mentioning (confidence: 99%)

Raymond, Blankley, Willett. Comparison of chemical clustering methods using graph- and fingerprint-based similarity measures. Journal of Molecular Graphics and Modelling, 2003. Self-citation.

This paper compares several published methods for clustering chemical structures, using both fingerprint-based and graph-based similarity measures. The clusterings from each method were compared to determine the degree of cluster overlap. Each method was also evaluated on how well it grouped structures into clusters possessing a non-trivial substructural commonality. The methods that employ adjustable parameters were tested to determine the stability of each parameter for datasets of varying size and composition. Our experiments suggest that both fingerprint-based and graph-based similarity measures can be used effectively for generating chemical clusterings; it is also suggested that the CAST method, proposed recently for the clustering of gene expression patterns, may prove effective for the clustering of 2D chemical structures.
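The quoted introduction and this abstract both rest on an MCS-based similarity score, and the abstract names CAST as a clustering option. As a minimal sketch, assuming the atom and bond counts of the maximum common edge subgraph (MCES) are already available from some MCES program (finding the MCES itself is not shown), the hypothetical helper `mces_similarity` computes a Johnson-style coefficient, (|V(Gc)| + |E(Gc)|)^2 / ((|V(G1)| + |E(G1)|)(|V(G2)| + |E(G2)|)), of the kind used with RASCAL; treat the exact formula as an assumption rather than this paper's definition. The hypothetical `cast` function follows the usual add/remove outline of CAST; the unit self-similarity convention, the tie-breaking and the `max_rounds` cap are choices made here, not taken from the paper.

```python
def mces_similarity(v1, e1, v2, e2, vc, ec):
    """Johnson-style similarity from the atom (v) and bond (e) counts of two
    molecular graphs and of their maximum common edge subgraph (vc, ec)."""
    return (vc + ec) ** 2 / ((v1 + e1) * (v2 + e2))

def cast(sim, t, max_rounds=100):
    """CAST clustering on a symmetric similarity matrix with sim[i][i] == 1.

    An element joins the open cluster while its summed similarity to the
    cluster (its affinity) is at least t * cluster size, and leaves when
    its affinity falls below that bound.
    """
    if not 0 < t < 1:
        raise ValueError("affinity threshold t must lie strictly between 0 and 1")
    n = len(sim)
    unassigned = set(range(n))
    clusters = []
    while unassigned:
        open_cluster = set()
        aff = [0.0] * n  # affinity of every element to the open cluster
        for _ in range(max_rounds):  # cap guards against add/remove oscillation
            changed = False
            # ADD: take the unassigned element with the highest affinity.
            while unassigned:
                u = max(unassigned, key=lambda x: (aff[x], -x))
                if aff[u] < t * len(open_cluster):
                    break
                unassigned.remove(u)
                open_cluster.add(u)
                for x in range(n):
                    aff[x] += sim[x][u]
                changed = True
            # REMOVE: drop the member with the lowest affinity.
            while open_cluster:
                v = min(open_cluster, key=lambda x: (aff[x], x))
                if aff[v] >= t * len(open_cluster):
                    break
                open_cluster.remove(v)
                unassigned.add(v)
                for x in range(n):
                    aff[x] -= sim[x][v]
                changed = True
            if not changed:
                break
        clusters.append(sorted(open_cluster))
    return clusters

# Benzene (6 atoms, 6 bonds) vs toluene (7 atoms, 7 bonds), hydrogens suppressed;
# their MCES is the benzene ring (6 atoms, 6 bonds).
print(round(mces_similarity(6, 6, 7, 7, 6, 6), 3))  # 0.857

# Illustrative similarity matrix (made-up values) with two obvious groups.
S = [
    [1.0, 0.9, 0.8, 0.1, 0.0],
    [0.9, 1.0, 0.85, 0.0, 0.1],
    [0.8, 0.85, 1.0, 0.1, 0.1],
    [0.1, 0.0, 0.1, 1.0, 0.9],
    [0.0, 0.1, 0.1, 0.9, 1.0],
]
print(cast(S, t=0.5))  # [[0, 1, 2], [3, 4]]
```

Raising `t` demands tighter clusters and tends to split them, while lowering it merges more structures, so `t` is the kind of adjustable parameter whose stability such a comparison would need to examine.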
