2009
DOI: 10.2174/138620709788167962

Performance of Machine Learning Methods for Ligand-Based Virtual Screening

Abstract: Computational screening of compound databases has become increasingly popular in pharmaceutical research. This review focuses on the evaluation of ligand-based virtual screening using active compounds as templates in the context of drug discovery. Ligand-based screening techniques are based on comparative molecular similarity analysis of compounds with known and unknown activity. We provide an overview of publications that have evaluated different machine learning methods, such as support vector machines, deci…

Cited by 42 publications
(34 citation statements)
“…The advantage is that such representation is highly storage-efficient, and the time-consuming operation of comparison of two molecular graphs reduces to a highly time-efficient operation of bitstring comparison. There exists a wide variety of molecular fingerprinting methods which mainly differ in the type of topologies and physico-chemical features they encode [14][15][16][17]. Usually the entire molecule is not encoded all at once, instead it is fragmented into small parts called fragments (not necessarily disjunctive), and these fragments are encoded one by one.…”
Section: Structural Elements Representation and Storage
mentioning
confidence: 99%
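
The bitstring comparison described in the statement above can be illustrated with a minimal sketch. The quoted text does not name a similarity measure, so the Tanimoto coefficient used here is an assumption, as are the two 8-bit fingerprints and the convention that two empty fingerprints count as identical.

```python
def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient of two fingerprints stored as integer bitstrings.

    Assumption: the quoted statement does not specify a coefficient;
    Tanimoto is used here only as a common choice for bit fingerprints.
    """
    common = bin(fp_a & fp_b).count("1")   # bits set in both fingerprints
    union = bin(fp_a | fp_b).count("1")    # bits set in at least one
    # Assumed convention: two empty fingerprints are treated as identical.
    return common / union if union else 1.0


# Hypothetical 8-bit fingerprints; each bit marks the presence of one fragment.
query = 0b10110010
candidate = 0b10010110

similarity = tanimoto(query, candidate)
print(f"Tanimoto similarity: {similarity:.2f}")
```

Because the comparison is a pair of bitwise operations and two bit counts, it avoids any graph matching between the molecules, which is the time saving the statement refers to.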
“…The GA seeks to identify those weights that produce the best possible ranking of the molecules in a dataset, and hence to estimate an upper-bound to the effectiveness of virtual screening possible using the substructural analysis approach. The basic idea is illustrated in Figure 1 using a training-set containing three molecules M1-3, each of which is represented by a fingerprint encoding the presence or absence of five fragments F1-5.…”
Section: The Genetic Algorithm
mentioning
confidence: 99%
“…An initial population of possible solutions is generated with the initial weights W1-W5 being assigned by a random-number generator that has been primed in this simple example to generate integer weights in the range 0-10. In the example, the population contains six chromosomes, C1-6, and the initial population is shown in Figure 1b. Each chromosome is then used to compute the sum-of-weights for each molecule, as shown in Figure 1c.…”
Section: The Genetic Algorithm
mentioning
confidence: 99%
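
The initialization and scoring steps described in the two statements above can be sketched as follows. This is an illustration only: the fingerprints are hypothetical (the bit patterns of Figure 1 are not reproduced in the quoted text), the function names are invented for this sketch, and the later GA steps (fitness evaluation, selection, crossover, mutation) are not shown because the statements do not describe them.

```python
import random

# Hypothetical fingerprints for three molecules: presence (1) or absence (0)
# of five fragments F1-F5. The real values from Figure 1 are not available.
fingerprints = {
    "M1": [1, 0, 1, 1, 0],
    "M2": [0, 1, 1, 0, 0],
    "M3": [1, 1, 0, 0, 1],
}


def random_chromosome(n_fragments, rng):
    """One candidate solution: an integer weight in 0-10 for each fragment."""
    return [rng.randint(0, 10) for _ in range(n_fragments)]


def sum_of_weights(fingerprint, weights):
    """Score a molecule by adding the weights of the fragments it contains."""
    return sum(w for bit, w in zip(fingerprint, weights) if bit)


def score_population(population, fingerprints):
    """Sum-of-weights for every molecule under every chromosome."""
    return [
        {name: sum_of_weights(fp, chromosome) for name, fp in fingerprints.items()}
        for chromosome in population
    ]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
population = [random_chromosome(5, rng) for _ in range(6)]  # six chromosomes
scores = score_population(population, fingerprints)

for chromosome, molecule_scores in zip(population, scores):
    # Rank molecules from highest to lowest score under this chromosome.
    ranking = sorted(molecule_scores, key=molecule_scores.get, reverse=True)
    print(chromosome, molecule_scores, ranking)
```

In a full GA, each chromosome's ranking would then be compared against the known activities to give a fitness value that drives the search for better weights.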
“…Such publicly available reference sets may serve as a valuable basis for method benchmarking. [17] As a personal note on benchmarking of virtual screening methods, it is arguable whether a reported improvement of a few percent in prediction accuracy or 'hit' enrichment is actually meaningful for prospective applications of a machine-learning model, keeping a certain degree of unavoidable overall data inaccuracy in mind. A further concern addresses the relevance of artificially 'idealized' benchmarking data for practical drug discovery, [15] as in reality all data sets (e.g., screening compound pools, supplier catalogues, virtual combinatorial libraries) are biased, and bioactive compounds have the capacity to bind to multiple targets.…”
mentioning
confidence: 99%