2019
DOI: 10.3390/molecules24112115

Error Tolerance of Machine Learning Algorithms across Contemporary Biological Targets

Abstract: Machine learning continues to make strident advances in the prediction of desired properties concerning drug development. Problematically, the efficacy of machine learning in these arenas is reliant upon highly accurate and abundant data. These two limitations, high accuracy and abundance, are often taken together; however, insight into the dataset accuracy limitation of contemporary machine learning algorithms may yield insight into whether non-bench experimental sources of data may be used to generate useful…

Cited by 14 publications (28 citation statements)
References 49 publications

Citation statements (ordered by relevance):
“…29,30 Additionally, the Bayesian classifier has the additional property of being highly noise tolerant as compared to other machine learning methods. 31 Another advantage of using the Bayesian classifier applied in FRESH is that it uses 2D extended connectivity fingerprints (ECFP) making it suitable for use on large data sets due to its rapid speed. 32 The Bayesian classifier is therefore applied ahead of more computationally expensive methods like docking and binding free energy calculations in the task-stream of FRESH.…”
Section: Results
confidence: 99%
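The fast Bayesian pre-filter described in this excerpt can be illustrated with a short sketch. This is not the FRESH implementation; it is a minimal example, assuming RDKit for Morgan (ECFP-style) fingerprints and scikit-learn's BernoulliNB, with placeholder SMILES strings and activity labels.

```python
# Minimal sketch (not the FRESH pipeline): a naive Bayes activity classifier
# over ECFP-style fingerprints, used as a cheap filter ahead of docking or
# binding free energy calculations. SMILES and labels are placeholders.
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.naive_bayes import BernoulliNB

def ecfp_bits(smiles, radius=2, n_bits=2048):
    """Morgan fingerprint (ECFP4-like) as a 0/1 NumPy vector."""
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    return np.array(list(fp), dtype=np.uint8)

# Placeholder training set: (SMILES, 1 = active, 0 = inactive)
train = [("CCO", 0), ("c1ccccc1O", 1), ("CC(=O)Oc1ccccc1C(=O)O", 1), ("CCCC", 0)]
X = np.vstack([ecfp_bits(s) for s, _ in train])
y = np.array([label for _, label in train])

clf = BernoulliNB().fit(X, y)                       # fast to train, noise tolerant
p_active = clf.predict_proba(ecfp_bits("c1ccccc1N").reshape(1, -1))[0, 1]
print(f"P(active) = {p_active:.2f}")                # rank compounds before costly steps
```

Because both fingerprint generation and naive Bayes training scale roughly linearly with the number of compounds, this kind of classifier remains practical on datasets far too large for per-compound docking.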
“…Information for the particular protein of interest might be limited, resulting in not much-extrapolated data. Free Energy Perturbation method is a platform where biological information regarding the protein is generated based on computational screening [ 61 ]. Data gathered from this method is utilized for training algorithms; however, not all the information is collected from a wet lab, rather computer-generated prediction is utilized.…”
Section: Limitations
confidence: 99%
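As an illustration of how such computer-generated training labels arise, the Zwanzig exponential-averaging relation at the core of free energy perturbation can be sketched in a few lines. The sampled energy differences below are synthetic placeholders; in a real FEP workflow they would come from molecular dynamics sampling of the reference state.

```python
# Sketch of the Zwanzig/FEP estimator: dF = -kT * ln< exp(-(U1 - U0)/kT) >_0.
# The per-sample energy differences are synthetic placeholders, not MD output.
import numpy as np

def fep_zwanzig(delta_u, kT=0.593):
    """Free energy difference (same units as delta_u) from samples dU = U1 - U0."""
    x = -np.asarray(delta_u, dtype=float) / kT
    # log-sum-exp form of the exponential average, for numerical stability
    return -kT * (np.logaddexp.reduce(x) - np.log(x.size))

rng = np.random.default_rng(0)
delta_u = rng.normal(loc=1.2, scale=0.5, size=5000)   # placeholder samples, kcal/mol
print(f"Estimated dF: {fep_zwanzig(delta_u):.3f} kcal/mol")
```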
“…The accuracy of the training data might be lower than anticipated. Even though algorithms discussed in this review have a higher threshold for minimizing errors, there are still some categorical errors from training sets [ 61 ].…”
Section: Limitations
confidence: 99%
“…Nevertheless, there are trade-offs as unrepresentative data can lead to 'over-fitting' and poor generalisability, which is when a ML algorithm performs very well on the training data but poorly on other data. ML is also typically understood to be tolerant of errors in the data, with some approaches showing acceptable performance despite error rates up to 39 percent [1,2]. ML methods usually require substantial calculation, but this can be managed through careful selection of algorithms and setups [3].…”
Section: Introduction
confidence: 99%
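The error tolerance referred to above can be probed with a simple experiment: corrupt an increasing fraction of training labels and track held-out accuracy. The sketch below uses a synthetic binary dataset and a naive Bayes model rather than the datasets or protocol of the cited study, so the numbers are illustrative only.

```python
# Sketch of a label-noise tolerance check (not the cited study's protocol):
# flip a growing fraction of training labels and watch held-out accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

X, y = make_classification(n_samples=2000, n_features=64, random_state=0)
X = (X > 0).astype(np.uint8)                      # binarise, fingerprint-like input
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

rng = np.random.default_rng(0)
for noise in (0.0, 0.1, 0.2, 0.3, 0.4):
    y_noisy = y_tr.copy()
    flip = rng.random(y_noisy.size) < noise       # corrupt this fraction of labels
    y_noisy[flip] = 1 - y_noisy[flip]
    acc = BernoulliNB().fit(X_tr, y_noisy).score(X_te, y_te)
    print(f"label noise {noise:.0%}: held-out accuracy {acc:.3f}")
```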
“…Broadly, ML uses computational methods and algorithms to learn to perform tasks, such as categorisation, decision making or anomaly detection through experience and without explicit instruction. ML is most effective in situations where non-computational means or conventional algorithms are impractical or impossible, such as when the data are vast, complex, highly variable and/or full of errors [1,2]. Thus, ML is useful for analysing natural language, images, or other types of complex and messy data that are now available in ever-growing and impractically large volumes.…”
confidence: 99%