2009
DOI: 10.1007/s10822-009-9285-0

Analysis and use of fragment-occurrence data in similarity-based virtual screening

Abstract: Current systems for similarity-based virtual screening use similarity measures in which all the fragments in a fingerprint contribute equally to the calculation of structural similarity. This paper discusses the weighting of fragments on the basis of their frequencies of occurrence in molecules. Extensive experiments with sets of active molecules from the MDL Drug Data Report and the World of Molecular Bioactivity databases, using fingerprints encoding Tripos holograms, Pipeline Pilot ECFC_4 circular substruct…

Cited by 28 publications (31 citation statements)
References: 52 publications

“…As another example, Duan et al note that fingerprints can often be implemented in multiple ways, with their extensive comparison of similarity methods for virtual screening involving 11 different parameterisations of the atoms involved in each substructural fragment encoded in a fingerprint [22]; the comparison here has used two popular representations (ECFP4 and FCFP4) in the Pipeline Pilot software to exemplify the use of alternative approaches to atom-typing. Other factors that may affect the effectiveness of fingerprint implementations include: the length of the fingerprint that is used, especially if hashing techniques are employed that can result in substantial numbers of collisions [21]; and whether incidence or occurrence data is used, i.e., whether the fingerprint encodes merely the presence of a fragment, its frequency of occurrence, or some standardised form of the latter [24].…”
Section: Methods; citation type: mentioning
confidence: 99%
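
The distinction drawn in the statement above between incidence and occurrence data, and the effect of fingerprint length on hash collisions, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the fragment strings, the use of CRC32 as the hash, and the function names `fold`, `to_incidence` and `count_collisions` are hypothetical and are not taken from the cited papers or from any fingerprinting package.

```python
import zlib


def fold(fragments, n_bits):
    """Hash fragment strings into a fixed-length occurrence (count) vector.

    CRC32 is used only because it is deterministic across runs; real
    fingerprinting software uses its own hashing scheme.
    """
    counts = [0] * n_bits
    for frag in fragments:
        idx = zlib.crc32(frag.encode("utf-8")) % n_bits
        counts[idx] += 1
    return counts


def to_incidence(counts):
    """Reduce occurrence data to incidence data (presence or absence only)."""
    return [1 if c > 0 else 0 for c in counts]


def count_collisions(fragments, n_bits):
    """Number of distinct fragments that share a position with another one."""
    distinct = set(fragments)
    occupied = {zlib.crc32(f.encode("utf-8")) % n_bits for f in distinct}
    return len(distinct) - len(occupied)


# Hypothetical fragment list for one molecule; the ring fragment occurs twice.
fragments = ["c1ccccc1", "c1ccccc1", "C=O", "N"]
occurrence = fold(fragments, 1024)
incidence = to_incidence(occurrence)
```

Shortening the fingerprint (a smaller `n_bits`) can only keep or increase the value returned by `count_collisions`, which is the length effect the statement refers to.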
“…In conventional, unweighted fingerprints x_i = 1 experiments using sets of bioactive molecules from the MDL Drug Data Report (MDDR) and World of Molecular Bioactivity (WOMBAT) databases. [23][24] The first type of weighting, frequency weighting, is based on the assumption that a fragment that occurs several times in a molecule should make a greater contribution to the overall degree of similarity than if it occurs just once, and that this contribution should be still greater if that fragment also occurs multiple times in the molecule with which it is being compared. Arif et al considered several different ways of using the occurrence information, as detailed in the left-hand side of Table 1, and concluded that the best screening results were obtained by using the square root of the occurrence frequencies.…”
Section: Similarity-based Virtual Screening; citation type: mentioning
confidence: 99%
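
The square-root frequency weighting described in the statement above can be sketched as follows. This is an illustration under stated assumptions, not the procedure of Arif et al.: the statement does not say which similarity coefficient was used, so the continuous form of the Tanimoto coefficient is assumed here, and the fragment names, counts and function names are hypothetical.

```python
import math


def sqrt_weights(counts):
    """Replace each raw occurrence count with its square root."""
    return {frag: math.sqrt(n) for frag, n in counts.items()}


def tanimoto(x, y):
    """Continuous Tanimoto coefficient for two sparse fragment vectors.

    T = sum(x_i * y_i) / (sum(x_i^2) + sum(y_i^2) - sum(x_i * y_i))
    """
    dot = sum(x[f] * y[f] for f in x.keys() & y.keys())
    xx = sum(v * v for v in x.values())
    yy = sum(v * v for v in y.values())
    denom = xx + yy - dot
    return dot / denom if denom else 0.0


# Hypothetical occurrence counts for two molecules.
mol_a = {"benzene": 4, "amide": 1}
mol_b = {"benzene": 1, "amide": 1}

raw_similarity = tanimoto(mol_a, mol_b)
sqrt_similarity = tanimoto(sqrt_weights(mol_a), sqrt_weights(mol_b))
```

With raw counts, the four occurrences of the generic ring fragment in `mol_a` dominate the squared terms; taking square roots reduces that effect, so `sqrt_similarity` is higher than `raw_similarity` for this pair.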
“…Arif et al considered several different ways of using the occurrence information, as detailed in the left-hand side of Table 1, and concluded that the best screening results were obtained by using the square root of the occurrence frequencies. [23] The effect of this scheme is to lessen the contribution of the more generic fragments that can occur relatively frequently within molecules, and that can thus yield high values if raw occurrence counts are used without some form of normalisation. Turning to the second type of weighting, inverse frequency weighting, the basic assumption here is that two molecules that share an infrequently occurring feature (such as a rare heterocycle) should be considered as being more similar to each other than if they share a feature (such as a benzene ring) that occurs very frequently throughout the database that is being searched.…”
Section: Similarity-based Virtual Screening; citation type: mentioning
confidence: 99%
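
Inverse frequency weighting, as characterised in the statement above, can be sketched in the same spirit. The statement gives no formula, so the logarithmic weight log(N / n_f), borrowed from inverse document frequency in text retrieval, is an assumption, as are the toy database, the fragment names and the function names.

```python
import math
from collections import Counter


def inverse_frequency_weights(database):
    """Weight each fragment by log(N / n_f).

    N is the number of molecules in the database and n_f is the number of
    molecules that contain fragment f at least once.
    """
    n_molecules = len(database)
    containing = Counter(f for mol in database for f in set(mol))
    return {f: math.log(n_molecules / n) for f, n in containing.items()}


def weighted_tanimoto(a, b, weights):
    """Tanimoto coefficient on fragment sets, each fragment scaled by its weight."""
    a, b = set(a), set(b)
    shared = sum(weights.get(f, 0.0) for f in a & b)
    total = sum(weights.get(f, 0.0) for f in a | b)
    return shared / total if total else 0.0


# Hypothetical database: every molecule contains the common ring fragment.
database = [
    {"benzene", "amide"},
    {"benzene", "tetrazole"},
    {"benzene", "ester"},
    {"benzene", "tetrazole", "amide"},
]
weights = inverse_frequency_weights(database)
```

In this toy database the ring fragment occurs in every molecule and so receives a weight of zero: two molecules that share only that fragment score 0, whereas two that also share the rarer heterocycle score higher.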
“…However, we are going to use only the MDDR database in our experiment. The data is said to be qualitative in MDDR database and a molecule in it is said to be inactive if it is not in the case, whereby the molecule is not exhibiting any specific activity [17].…”
Section: A Chemical Database; citation type: mentioning
confidence: 99%