A comparative evaluation of the generalised predictive ability of eight machine learning algorithms across ten clinical metabolomics data sets for binary classification

Mendez, Kevin; Reinke, Stacey N.; Broadhurst, David

doi:10.1007/s11306-019-1612-4

Cited by 138 publications

(137 citation statements)

References 50 publications

Supporting

Mentioning

132

Contrasting


“…With more substantial amounts of data being produced by these multiplex assays, machine learning tools facilitate reproducible and understandable models of prediction (classification and regression) [77]. These techniques take an entire metabolic snapshot of the metabolome and range from several to hundreds of analytes, to classify the sample in order to arrive at a diagnosis [54,56].…”

Section: Mass Spectrometry Cheminformatics and Machine Learning (mentioning)

confidence: 99%

Metabolomics to Improve the Diagnostic Efficiency of Inborn Errors of Metabolism

Mordaunt

Cox

Fuller

2020

IJMS


Early diagnosis of inborn errors of metabolism (IEM)—a large group of congenital disorders—is critical, given that many respond well to targeted therapy. Newborn screening programs successfully capture a proportion of patients enabling early recognition and prompt initiation of therapy. For others, the heterogeneity in clinical presentation often confuses diagnosis with more common conditions. In the absence of family history and following clinical suspicion, the laboratory diagnosis typically begins with broad screening tests to circumscribe specialised metabolite and/or enzyme assays to identify the specific IEM. Confirmation of the biochemical diagnosis is usually achieved by identifying pathogenic genetic variants that will also enable cascade testing for family members. Unsurprisingly, this diagnostic trajectory is too often a protracted and lengthy process resulting in delays in diagnosis and, importantly, therapeutic intervention for these rare conditions is also postponed. Implementation of mass spectrometry technologies coupled with the expanding field of metabolomics is changing the landscape of diagnosing IEM as numerous metabolites, as well as enzymes, can now be measured collectively on a single mass spectrometry-based platform. As the biochemical consequences of impaired metabolism continue to be elucidated, the measurement of secondary metabolites common across groups of IEM will facilitate algorithms to further increase the efficiency of diagnosis.


“…Second, model outcomes and resulting interpretations can be affected by the quality of the input data. We have previously shown that PLS and ANNs show similar predictive ability, when using the same input data, and that sample size is an important determinant of model stability (Mendez et al 2019c). However, to our knowledge, an extensive comparison of different data cleaning (Broadhurst et al 2018), pre-treatment (van den Berg et al 2006), and imputation (Di Guida et al 2016; Do et al 2018) procedure options has not been performed for ANNs.…”

Section: Discussion (mentioning)

confidence: 99%

“…While true effectiveness of a model can only be assessed using test data (Westerhuis et al 2008; Xia et al 2013), for small data sets it is dangerous to use a single random data split as the only means of model evaluation, as the random test data set may not accurately represent the training data set (Mendez et al 2019c). An alternative is to use bootstrap resampling.…”

Section: PLS-DA Evaluation (mentioning)

confidence: 99%
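The bootstrap idea in the statement above can be illustrated with a minimal sketch. It assumes out-of-bag evaluation: each bootstrap sample is drawn with replacement for training, and the rows left out are used for testing. The nearest-centroid classifier, the synthetic data, and the function names `nearest_centroid_predict` and `bootstrap_oob_accuracy` are hypothetical stand-ins chosen for brevity; they are not the PLS-DA workflow of the cited authors.

```python
import numpy as np


def nearest_centroid_predict(X_train, y_train, X_test):
    """Assign each test row to the class with the closest training centroid."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    # Pairwise Euclidean distances, shape (n_test, n_classes)
    dists = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]


def bootstrap_oob_accuracy(X, y, n_boot=200, seed=0):
    """Return out-of-bag accuracies over n_boot bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n = len(y)
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # sample rows with replacement
        oob = np.setdiff1d(np.arange(n), idx)   # rows never drawn this round
        # Skip degenerate resamples (no held-out rows or only one class)
        if oob.size == 0 or np.unique(y[idx]).size < 2:
            continue
        pred = nearest_centroid_predict(X[idx], y[idx], X[oob])
        scores.append(np.mean(pred == y[oob]))
    return np.array(scores)


# Synthetic two-class data (assumption: 40 samples, 5 features)
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(0.0, 1.0, (20, 5)),
               data_rng.normal(1.5, 1.0, (20, 5))])
y = np.array([0] * 20 + [1] * 20)

scores = bootstrap_oob_accuracy(X, y)
low, high = np.percentile(scores, [2.5, 97.5])
print(f"mean OOB accuracy: {scores.mean():.3f}")
print(f"95% percentile interval: [{low:.3f}, {high:.3f}]")
```

Reporting an interval over many resamples, rather than one number from a single split, shows how much the estimate depends on which samples happen to be held out, which is the concern raised for small data sets.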

“…The computational libraries developed for this study require data to be converted to a standardised format using the tidy data framework (Wickham, 2014). This standardised format has been previously described (Mendez et al 2019b, 2019c), and allows for the efficient reuse of these workflows for other studies. To demonstrate this, we include the application of the identical workflows and visualisation techniques to a second previously published dataset (Ganna et al 2016) as a supplementary document.…”

Section: Datasets (mentioning)

confidence: 99%

“…TensorFlow and PyTorch), and the general success within industry and other research fields, the reintroduction of ANNs warrants renewed investigation. We recently showed that ANNs have similar predictive ability to PLS across multiple diverse metabolomics data sets (Mendez et al 2019c). However, within the domain of metabolomics, if ANNs are to become a truly viable alternative to PLS it will be necessary to develop similar standardised and robust methods for data visualisation, evaluation, and statistical inference (Mendez et al 2019a).…”

Section: Introduction (mentioning)

confidence: 99%


Migrating from partial least squares discriminant analysis to artificial neural networks: a comparison of functionally equivalent visualisation and feature contribution tools using jupyter notebooks

2020

Self Cite


Introduction: Metabolomics data is commonly modelled multivariately using partial least squares discriminant analysis (PLS-DA). Its success is primarily due to ease of interpretation, through projection to latent structures, and transparent assessment of feature importance using regression coefficients and Variable Importance in Projection scores. In recent years several non-linear machine learning (ML) methods have grown in popularity but with limited uptake essentially due to convoluted optimisation and interpretation. Artificial neural networks (ANNs) are a non-linear projection-based ML method that share a structural equivalence with PLS, and as such should be amenable to equivalent optimisation and interpretation methods.

Objectives: We hypothesise that standardised optimisation, visualisation, evaluation and statistical inference techniques commonly used by metabolomics researchers for PLS-DA can be migrated to a non-linear, single hidden layer, ANN.

Methods: We compared a standardised optimisation, visualisation, evaluation and statistical inference techniques workflow for PLS with the proposed ANN workflow. Both workflows were implemented in the Python programming language. All code and results have been made publicly available as Jupyter notebooks on GitHub.

Results: The migration of the PLS workflow to a non-linear, single hidden layer, ANN was successful. There was a similarity in significant metabolites determined using PLS model coefficients and ANN Connection Weight Approach.

Conclusion: We have shown that it is possible to migrate the standardised PLS-DA workflow to simple non-linear ANNs. This result opens the door for more widespread use and to the investigation of transparent interpretation of more complex ANN architectures.


Data‐Driven Compound Identification in Atmospheric Mass Spectrometry

Sandström,

Rissanen,

Rousu

et al. 2023

Advanced Science


Aerosol particles found in the atmosphere affect the climate and worsen air quality. To mitigate these adverse impacts, aerosol particle formation and aerosol chemistry in the atmosphere need to be better mapped out and understood. Currently, mass spectrometry is the single most important analytical technique in atmospheric chemistry and is used to track and identify compounds and processes. Large amounts of data are collected in each measurement of current time‐of‐flight and orbitrap mass spectrometers using modern rapid data acquisition practices. However, compound identification remains a major bottleneck during data analysis due to lacking reference libraries and analysis tools. Data‐driven compound identification approaches could alleviate the problem, yet remain rare to non‐existent in atmospheric science. In this perspective, the authors review the current state of data‐driven compound identification with mass spectrometry in atmospheric science and discuss current challenges and possible future steps toward a digital era for atmospheric mass spectrometry.
