2016
DOI: 10.3390/metabo6040040

A Conversation on Data Mining Strategies in LC-MS Untargeted Metabolomics: Pre-Processing and Pre-Treatment Steps

Abstract: Untargeted metabolomic studies generate information-rich, high-dimensional, and complex datasets that remain challenging to handle and fully exploit. Despite the remarkable progress in the development of tools and algorithms, the “exhaustive” extraction of information from these metabolomic datasets is still a non-trivial undertaking. A conversation on data mining strategies for a maximal information extraction from metabolomic data is needed. Using a liquid chromatography-mass spectrometry (LC-MS)-based untar…

Citations: cited by 61 publications (85 citation statements)
References: 78 publications
“…The cleaned data are less 'noisy' and contain mainly defined features. The latter refers to the 'relevant' ions/true signals (each with a unique m/z and Rt) that are extracted from the raw data, capturing as much usable information as possible. Furthermore, for semantic clarification, the term 'defined features' (specifically in this study) means recorded true MS signals, relevant ionised species (extracted from processing and data scrutiny steps – clean data matrices), which most likely represent 'detected' metabolites.…”
Section: Results (mentioning)
confidence: 99%
“…Before performing PCA, the data were mean‐centered (to put all variables on equal footing) and Pareto‐scaled to adjust for measurement errors. Nonlinear iterative partial least‐squares (NIPALS) algorithm (with a correction factor of 3.0) was the methodology used to handle the missing values, with a default threshold of 50%. A seven‐fold cross‐validation (CV) procedure was applied as a tuning method to build the models; and only the components producing an increase in the prediction ability of the model (R1 significant components) were retained.…”
Section: Results (mentioning)
confidence: 99%
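
The mean-centering and Pareto scaling mentioned in the statement above can be sketched as follows. This is a minimal illustration, not the citing authors' implementation: the function name `pareto_scale`, the samples-by-features layout, and the use of the sample standard deviation (`ddof=1`) are assumptions. Pareto scaling divides each mean-centered feature by the square root of its standard deviation; missing values (NaN) are simply ignored when computing the statistics here, which is not the NIPALS treatment the statement describes.

```python
import numpy as np

def pareto_scale(X):
    """Mean-center each column, then divide by the square root of its std.

    X is assumed to be a (samples x features) intensity matrix.
    NaN entries are ignored in the column statistics and stay NaN.
    """
    X = np.asarray(X, dtype=float)
    mean = np.nanmean(X, axis=0)          # per-feature mean
    std = np.nanstd(X, axis=0, ddof=1)    # per-feature sample std (assumption)
    scale = np.sqrt(std)
    scale[scale == 0] = 1.0               # leave constant features unscaled
    return (X - mean) / scale

# Hypothetical toy matrix: 3 samples, 2 features
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])
Z = pareto_scale(X)
```

Compared with unit-variance scaling, dividing by the square root of the standard deviation reduces the dominance of high-intensity features while keeping more of the original data structure.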
“…MVDA tools commonly used in food metabolomics studies include artificial neural networks (ANN), principal component analysis (PCA), orthogonal projection to latent structures-discriminant analysis (OPLS-DA), partial least square discriminant analysis (PLS-DA), principal component regression (PCR), hierarchical cluster analysis (HCA), canonical correlation analysis (CCA) and others [16,38,43]. Detailed strategies, algorithms and explanation on these MVDA techniques have been described in detail elsewhere [24,39,43–46].…”
Section: Data Analysis and Treatment (mentioning)
confidence: 99%
“…Handling these huge data would require an automated software for quantification and identification [24]. Pretreatment basically involves alignment, normalization, compound identification, centering, transformation, scaling, removing baseline artefacts and peak picking [16,24,38,39], in order to convert the raw data set into a form that can be utilized for subsequent analysis. Succeeding analysis of the cleaned data in food metabolomics studies are majorly done using different chemometric tools, to provide a description and understanding of the variations and/or similarities in the metabolites.…”
Section: Data Analysis and Treatment (mentioning)
confidence: 99%
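
Three of the pretreatment steps listed in the statement above (normalization, transformation, centering) could look like the sketch below. The specific choices are assumptions made for illustration only: total-intensity normalization, a log10 transform with a pseudo-count, and the hypothetical function name `pretreat`. The quoted text does not say which variants were used, and the sketch assumes every sample has a non-zero total intensity.

```python
import numpy as np

def pretreat(X, pseudo=1.0):
    """Normalize, log-transform, and mean-center a (samples x features) matrix.

    Assumed choices: total-intensity normalization, log10 with a
    pseudo-count, and column-wise mean-centering.
    """
    X = np.asarray(X, dtype=float)
    totals = X.sum(axis=1, keepdims=True)       # total signal per sample
    normalized = X / totals * totals.mean()     # rescale to a common total
    logged = np.log10(normalized + pseudo)      # compress the dynamic range
    return logged - logged.mean(axis=0)         # center each feature

# Hypothetical example: the second sample is the first at twice the amount
X = np.array([[100.0, 300.0],
              [200.0, 600.0]])
P = pretreat(X)
```

In this toy case the two samples differ only in overall intensity, so normalization makes them identical and every centered value is zero.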