2018
DOI: 10.1038/s41598-017-19120-0

Missing Value Imputation Approach for Mass Spectrometry-based Metabolomics Data

Abstract: Missing values are widespread in mass spectrometry (MS)-based metabolomics data. Various methods have been applied to handle them, but the choice of method can significantly affect downstream data analyses. Missing values are typically classified into three types: missing not at random (MNAR), missing at random (MAR), and missing completely at random (MCAR). Our study comprehensively compared eight imputation methods (zero, half minimum (HM), mean, median, random forest (RF), singular value decomposition (SVD), k…

Citations: cited by 463 publications (466 citation statements)
References: 30 publications
“…A typical metabolomic dataset may contain 20% of missing data in up to 80% of all variables. Missingness can occur from analytical, computational, or biological causes, but, regardless of origin, a principled strategy to deal with them is required. Features showing a high degree of missingness can, for example, be removed from the data matrix whereas remaining missing values can be imputed, i.e.…”
Section: Data Visualization Preprocessing and Analysis
Citation type: mentioning
confidence: 99%
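
The filtering step mentioned in this statement (removing features with a high degree of missingness before imputing the rest) could be sketched as follows. This is a minimal illustration, not the cited authors' procedure: the function name `drop_sparse_features`, the convention that rows are samples and columns are features, and the 0.5 cutoff are all assumptions made for the example.

```python
import numpy as np


def drop_sparse_features(X, max_missing_fraction=0.5):
    """Drop features (columns) whose share of missing values is too high.

    X is a samples-by-features array with NaN marking missing values.
    The 0.5 default is a hypothetical cutoff, not a value from the source.
    Returns the filtered matrix and the boolean mask of kept columns.
    """
    X = np.asarray(X, dtype=float)
    # Fraction of missing entries in each column.
    missing_fraction = np.isnan(X).mean(axis=0)
    keep = missing_fraction <= max_missing_fraction
    return X[:, keep], keep
```

The values that remain missing in the kept columns would then be passed to whichever imputation method is chosen.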
“…To simulate measurements missing due to instrument limits of detection, a percentage of the lowest abundance values for each metabolite were identified for potential removal, similar to cutoffs used in previous studies (Shah et al, 2017; Wei et al, 2018b). Of the values identified, the lowest 80% abundance values were removed (representing values below a hypothetical limit of detection).…”
Section: Methods
Citation type: mentioning
confidence: 99%
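
The two-stage removal described in this statement could look roughly like the sketch below. It rests on several assumptions not confirmed by the excerpt: the 80% removal is applied per metabolite rather than across the pooled candidates, the input matrix is complete before simulation, and the names `simulate_lod_missingness`, `candidate_fraction`, and `removal_fraction` are hypothetical. The default `candidate_fraction` of 0.3 is a placeholder, since the excerpt does not state the percentage used.

```python
import numpy as np


def simulate_lod_missingness(X, candidate_fraction=0.3, removal_fraction=0.8):
    """Simulate values missing below a limit of detection (MNAR).

    X is a complete samples-by-metabolites array. For each metabolite
    (column), the lowest `candidate_fraction` of values are candidates,
    and the lowest `removal_fraction` of those candidates are set to NaN.
    Both fractions are illustrative parameters, applied per column.
    """
    X = np.array(X, dtype=float)  # work on a copy
    n_samples, n_metabolites = X.shape
    n_candidates = int(np.floor(candidate_fraction * n_samples))
    n_remove = int(np.floor(removal_fraction * n_candidates))
    if n_remove == 0:
        return X
    # Row indices of each column sorted by ascending abundance.
    order = np.argsort(X, axis=0)
    rows = order[:n_remove, :]
    cols = np.broadcast_to(np.arange(n_metabolites), rows.shape)
    X[rows, cols] = np.nan
    return X
```

Because only the lowest values are removed, the resulting missingness depends on abundance, which is what distinguishes this simulation from removing entries completely at random.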
“…Importantly, a sample can be a neighbor only if it has a measured value for the metabolite missing in the target sample. (Alternatively, nearest neighbor metabolites can be found instead of samples (Armitage et al, 2015; Gromski et al, 2014; Wei et al, 2018b)). A weighted combination of the corresponding values for the missing metabolite in the nearest neighbors is used as the imputed value.…”
Section: Methods
Citation type: mentioning
confidence: 99%
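
A sample-wise nearest-neighbor imputation of the kind this statement describes might be sketched as below. The excerpt specifies only that neighbors must have a measured value for the missing metabolite and that a weighted combination of their values is used. The distance (root mean squared difference over metabolites observed in both samples), the inverse-distance weights, the default `k=3`, and the name `knn_impute` are assumptions made for this illustration.

```python
import numpy as np


def knn_impute(X, k=3):
    """Impute NaN entries from the k nearest samples (rows).

    A sample qualifies as a neighbor only if it has a measured value
    for the metabolite being imputed. Distance and weighting choices
    here are illustrative assumptions, not taken from the source.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    imputed = X.copy()
    for i, j in zip(*np.where(~observed)):
        candidates = []
        # Only samples with metabolite j measured are eligible.
        for other in np.where(observed[:, j])[0]:
            shared = observed[i] & observed[other]
            if not shared.any():
                continue  # no common metabolites to compare on
            diff = X[i, shared] - X[other, shared]
            dist = np.sqrt(np.mean(diff ** 2))
            candidates.append((dist, X[other, j]))
        if not candidates:
            continue  # no eligible neighbor, leave the value missing
        candidates.sort(key=lambda pair: pair[0])
        nearest = candidates[:k]
        dists = np.array([d for d, _ in nearest])
        values = np.array([v for _, v in nearest])
        # Inverse-distance weights; the small constant avoids division by zero.
        weights = 1.0 / (dists + 1e-12)
        imputed[i, j] = np.sum(weights * values) / np.sum(weights)
    return imputed
```

The metabolite-wise alternative mentioned in the statement would apply the same routine to the transposed matrix, so that neighbors are metabolites rather than samples.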