2022
DOI: 10.1186/s12859-022-04659-1

Mechanism-aware imputation: a two-step approach in handling missing values in metabolomics

Abstract: When analyzing large datasets from high-throughput technologies, researchers often encounter missing quantitative measurements, which are particularly frequent in metabolomics datasets. Metabolomics, the comprehensive profiling of metabolite abundances, are typically measured using mass spectrometry technologies that often introduce missingness via multiple mechanisms: (1) the metabolite signal may be smaller than the instrument limit of detection; (2) the conditions under which the data are collected and proc…
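
To make mechanism (1) concrete, here is a minimal sketch of how a limit of detection produces missing values. It is an illustration only, not the paper's method: the function name `simulate_missingness`, the log-normal abundances, the threshold of 3.0, and the extra random-dropout step (`mcar_rate`) are all assumptions chosen for the example. The random dropout stands in for a generic "missing completely at random" component and does not reconstruct the truncated mechanism (2).

```python
import random


def simulate_missingness(values, lod, mcar_rate, seed=0):
    """Return a copy of `values` with missing entries set to None.

    Values below `lod` are always dropped (left-censoring at the limit
    of detection). Each remaining value is dropped independently with
    probability `mcar_rate` (an assumed random-dropout component).
    """
    rng = random.Random(seed)
    out = []
    for v in values:
        if v < lod:
            out.append(None)          # below the limit of detection
        elif rng.random() < mcar_rate:
            out.append(None)          # random dropout, unrelated to v
        else:
            out.append(v)
    return out


# Hypothetical abundances for one metabolite across 1000 samples.
rng = random.Random(42)
abundances = [rng.lognormvariate(2.0, 1.0) for _ in range(1000)]

observed = simulate_missingness(abundances, lod=3.0, mcar_rate=0.05)
n_below = sum(1 for v in abundances if v < 3.0)
n_missing = sum(1 for v in observed if v is None)
print(f"below LOD: {n_below}, total missing: {n_missing}")
```

Because the two kinds of missing value look identical in the observed data (both are `None`), an imputation method has to infer which mechanism applies before choosing how to fill each gap.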

citations
Cited by 15 publications
(10 citation statements)
references
References 17 publications
“…There has been much discussion in the literature about the relevance of Donald Rubin's MAR, MCAR and MNAR classification for proteomics data [26,15,16,27,28,17,29,30]. Our analysis shows unambiguously that intensity is a strong predictor of the rate of missingness.…”
Section: Discussion (mentioning)
confidence: 53%
“…More generally, the slope of the DPC quantifies the amount of information that can be theoretically extracted from the missing value frequencies for estimating peptide intensities or assessing differential expression. Some authors have proposed classifying individual missing values as MAR or left-censored ( Lazar et al 2016 ; Wei et al 2018 ; Liu and Dongre 2021 ; Dekermanjian et al 2022 ) but our work shows that the same DPC can be applied to all values.…”
Section: Discussion (mentioning)
confidence: 92%
“…There has been much discussion in the literature about the relevance of Donald Rubin’s missing at random (MAR) and missing not at random (MNAR) classification for mass spectrometry data ( Karpievitch et al 2012 ; Webb-Robertson et al 2015 ; Lazar et al 2016 ; Wang et al 2020 ; Gardner and Freitas 2021 ; Liu and Dongre 2021 ; Dekermanjian et al 2022 ; Shen et al 2022 ). Our work shows that missing intensities are MNAR but that the dependence of missing value frequency on intensity is gradual.…”
Section: Discussion (mentioning)
confidence: 99%
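
The contrast drawn in the statement above, between a hard detection limit and a gradual dependence of missingness on intensity, can be sketched as two curves. This is an illustrative assumption, not the cited authors' model: the logistic form, the function names, and the parameter values (`intercept=-4.0`, `slope=2.0`, `lod=2.0`) are all hypothetical.

```python
import math


def detection_probability(intensity, intercept, slope):
    """Gradual model: probability a value is observed, as a logistic
    function of its (log-)intensity."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * intensity)))


def hard_threshold(intensity, lod):
    """Hard-limit model: observed if and only if intensity >= lod."""
    return 1.0 if intensity >= lod else 0.0


# Compare the two models on a small grid of hypothetical log-intensities.
for x in [0.0, 1.0, 2.0, 3.0, 4.0]:
    gradual = detection_probability(x, intercept=-4.0, slope=2.0)
    hard = hard_threshold(x, lod=2.0)
    print(f"intensity={x:.1f}  gradual={gradual:.3f}  hard={hard:.0f}")
```

Under the gradual curve, a value slightly above the nominal limit can still go missing and a value slightly below it can still be observed, which is why a per-value "censored or not" label is harder to justify than under the hard-limit model.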
“…It is also challenging to identify meaningful variables from the large pool. In this regard, ML approaches can be applied to reduce the feature size by projecting the high dimensional data to a lower dimensional space ( Hira and Gillies, 2015 ; Dekermanjian et al, 2022 ; Faquih et al, 2020 ). Some examples of such ML methods are non-negative matrix factorization (NMF), principal component analysis (PCA), t-distributed stochastic neighbour embedding (t-SNE), and autoencoder (a type of ANN), with the last two especially suited for analysing non-linear data.…”
Section: Modelling Approaches (mentioning)
confidence: 99%