2019
DOI: 10.1007/s11306-019-1612-4

A comparative evaluation of the generalised predictive ability of eight machine learning algorithms across ten clinical metabolomics data sets for binary classification

Abstract: Introduction: Metabolomics is increasingly being used in the clinical setting for disease diagnosis, prognosis and risk prediction. Machine learning algorithms are particularly important in the construction of multivariate metabolite prediction. Historically, partial least squares (PLS) regression has been the gold standard for binary classification. Nonlinear machine learning methods such as random forests (RF), kernel support vector machines (SVM) and artificial neural networks (ANN) may be more suited to mode…

Cited by 138 publications (137 citation statements)
References 50 publications
“…With more substantial amounts of data being produced by these multiplex assays, machine learning tools facilitate reproducible and understandable models of prediction (classification and regression) [77]. These techniques take an entire metabolic snapshot of the metabolome and range from several to hundreds of analytes, to classify the sample in order to arrive at a diagnosis [54,56].…”
Section: Mass Spectrometry Cheminformatics and Machine Learning (mentioning)
confidence: 99%
“…Second, model outcomes and resulting interpretations can be affected by the quality of the input data. We have previously shown that PLS and ANNs show similar predictive ability, when using the same input data, and that sample size is an important determinant of model stability (Mendez et al 2019c). However, to our knowledge, an extensive comparison of different data cleaning (Broadhurst et al 2018), pre-treatment (van den Berg et al 2006), and imputation (Di Guida et al 2016; Do et al 2018) procedure options has not been performed for ANNs.…”
Section: Discussion (mentioning)
confidence: 99%
“…While true effectiveness of a model can only be assessed using test data (Westerhuis et al 2008; Xia et al 2013), for small data sets it is dangerous to use a single random data split as the only means of model evaluation, as the random test data set may not accurately represent the training data set (Mendez et al 2019c). An alternative is to use bootstrap resampling.…”
Section: PLS-DA Evaluation (mentioning)
confidence: 99%
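
The bootstrap resampling mentioned in the statement above can be sketched roughly as follows. This is a minimal illustration, not the procedure used in the cited papers. The nearest-centroid classifier stands in for PLS-DA, and the function names, parameters, and toy data are all assumptions made for this example. Each round draws samples with replacement for training and scores the model on the samples left out (the "out-of-bag" set), so the evaluation does not rest on one random split.

```python
import random
from statistics import mean


def nearest_centroid_fit(X, y):
    """Return per-class mean vectors (a simple stand-in classifier, assumed here)."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    """Predict the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def bootstrap_oob_accuracy(X, y, n_boot=200, seed=0):
    """Return one out-of-bag accuracy per usable bootstrap round."""
    rng = random.Random(seed)
    n = len(X)
    scores = []
    for _ in range(n_boot):
        # Draw n training indices with replacement.
        in_bag = [rng.randrange(n) for _ in range(n)]
        in_bag_set = set(in_bag)
        # Samples never drawn form the out-of-bag test set for this round.
        oob = [i for i in range(n) if i not in in_bag_set]
        # Skip degenerate rounds: no test samples, or only one class in training.
        if not oob or len({y[i] for i in in_bag}) < 2:
            continue
        model = nearest_centroid_fit([X[i] for i in in_bag],
                                     [y[i] for i in in_bag])
        correct = sum(nearest_centroid_predict(model, X[i]) == y[i] for i in oob)
        scores.append(correct / len(oob))
    return scores


# Toy two-class data (hypothetical): two well-separated Gaussian clusters.
data_rng = random.Random(1)
X = ([[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(20)]
     + [[data_rng.gauss(10, 1), data_rng.gauss(10, 1)] for _ in range(20)])
y = [0] * 20 + [1] * 20

scores = bootstrap_oob_accuracy(X, y)
print("rounds used:", len(scores))
print("mean out-of-bag accuracy:", mean(scores))
```

The spread of `scores` across rounds gives a sense of how stable the estimate is for a small data set, which a single train/test split cannot show.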