2022
DOI: 10.1038/s41598-022-04938-0

Comparative assessment and novel strategy on methods for imputing proteomics data

Abstract: Missing values are a major issue in quantitative proteomics analysis. While many methods have been developed for imputing missing values in high-throughput proteomics data, a comparative assessment of imputation accuracy remains inconclusive, mainly because mechanisms contributing to true missing values are complex and existing evaluation methodologies are imperfect. Moreover, few studies have provided an outlook of future methodological development. We first re-evaluate the performance of eight representative…

Cited by 15 publications (24 citation statements)
References 36 publications

“…There has been much discussion in the literature about the relevance of Donald Rubin’s missing at random (MAR) and missing not at random (MNAR) classification for mass spectrometry data ( Karpievitch et al 2012 ; Webb-Robertson et al 2015 ; Lazar et al 2016 ; Wang et al 2020 ; Gardner and Freitas 2021 ; Liu and Dongre 2021 ; Dekermanjian et al 2022 ; Shen et al 2022 ). Our work shows that missing intensities are MNAR but that the dependence of missing value frequency on intensity is gradual.…”
Section: Discussion (mentioning)
confidence: 99%
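The point that missing intensities are MNAR, with a missing-value frequency that depends on intensity only gradually, can be illustrated with a short simulation. This is a minimal sketch under stated assumptions, not the cited authors' model: it assumes a hypothetical logistic dropout curve in which the probability that a log2 intensity is missing falls smoothly as intensity rises. The parameters `midpoint` and `slope` and the function names are illustrative; a very large `slope` approximates a hard detection threshold (pure left-censoring).

```python
import math
import random
from typing import List, Optional


def missing_probability(log_intensity: float, midpoint: float = 20.0, slope: float = 1.0) -> float:
    """Hypothetical logistic dropout curve: P(missing) = 1 / (1 + exp(slope * (x - midpoint))).

    P(missing) is 0.5 at `midpoint` and decreases smoothly as intensity rises.
    A small `slope` gives a gradual dependence; a very large `slope` approaches
    a hard detection threshold (left-censoring).
    """
    z = slope * (log_intensity - midpoint)
    # Numerically stable evaluation of 1 / (1 + exp(z)) that avoids overflow.
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def simulate_mnar(values: List[float], midpoint: float = 20.0, slope: float = 1.0,
                  seed: int = 0) -> List[Optional[float]]:
    """Replace each value with None with a probability that depends on the value itself (MNAR)."""
    rng = random.Random(seed)
    return [None if rng.random() < missing_probability(v, midpoint, slope) else v
            for v in values]


if __name__ == "__main__":
    rng = random.Random(1)
    x = [rng.gauss(22.0, 2.5) for _ in range(10_000)]  # simulated log2 intensities
    obs = simulate_mnar(x, midpoint=20.0, slope=1.0)
    # Missing-value frequency per 2-unit intensity bin declines gradually, not abruptly.
    for lo in range(14, 30, 2):
        in_bin = [o for v, o in zip(x, obs) if lo <= v < lo + 2]
        if in_bin:
            frac = sum(o is None for o in in_bin) / len(in_bin)
            print(f"[{lo}, {lo + 2}): missing fraction {frac:.2f}")
```

Lowering `slope` spreads the transition from "mostly missing" to "mostly observed" over a wider intensity range, which is one simple way to picture a gradual rather than threshold-like MNAR mechanism.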
“…For each missing variable numbered from 1 to M, RMSE or NRMSE is ranked among all imputation methods. The final value of SOR is the sum of total ranks for each missing variable [59, 117]. $$\mathit{SOR}=\sum_{i=1}^{M}\mathit{Rank}_{i}\left(\text{RMSE or NRMSE}\right)$$…”
Section: Imputation Methods (mentioning)
confidence: 99%
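A minimal sketch of the sum-of-ranks (SOR) calculation, assuming that rank 1 goes to the lowest RMSE or NRMSE on each missing variable, that tied errors share their average rank, and that a lower SOR therefore indicates a better method. The statement above does not specify ranking direction or tie handling, so both are assumptions; the function names, method names, and error values are hypothetical and purely illustrative, not taken from the cited studies.

```python
from typing import Dict, List


def ranks_with_ties(errors: List[float]) -> List[float]:
    """Rank errors in ascending order (rank 1 = smallest error).

    Tied errors share the average of the ranks they span (an assumption;
    the cited description does not specify tie handling).
    """
    order = sorted(range(len(errors)), key=lambda k: errors[k])
    ranks = [0.0] * len(errors)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted error equals the current one.
        while end + 1 < len(order) and errors[order[end + 1]] == errors[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1  # mean of the 1-based ranks pos+1 .. end+1
        for k in order[pos:end + 1]:
            ranks[k] = avg_rank
        pos = end + 1
    return ranks


def sum_of_ranks(errors_by_method: Dict[str, List[float]]) -> Dict[str, float]:
    """SOR for each method: sum over missing variables i = 1..M of Rank_i(RMSE or NRMSE)."""
    if not errors_by_method:
        return {}
    methods = list(errors_by_method)
    m = len(errors_by_method[methods[0]])
    if any(len(errs) != m for errs in errors_by_method.values()):
        raise ValueError("every method needs one error value per missing variable")
    sor = {name: 0.0 for name in methods}
    for i in range(m):
        column = [errors_by_method[name][i] for name in methods]  # all methods on variable i
        for name, rank in zip(methods, ranks_with_ties(column)):
            sor[name] += rank
    return sor


if __name__ == "__main__":
    # Hypothetical RMSE values for three methods on M = 3 missing variables.
    errors = {
        "method_A": [0.40, 0.45, 0.45],
        "method_B": [0.30, 0.35, 0.50],
        "method_C": [0.55, 0.25, 0.50],
    }
    print(sum_of_ranks(errors))  # {'method_A': 6.0, 'method_B': 5.5, 'method_C': 6.5}
```

Under these assumptions method_B has the lowest SOR. Because each missing variable distributes the ranks 1..n among the n methods, the SOR values always sum to M·n(n+1)/2 (here 3·6 = 18), so SOR compares methods relative to one another rather than on an absolute error scale.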
“…For each missing variable numbered from 1 to M, RMSE or NRMSE is ranked among all imputation methods. The final value of SOR is the sum of total ranks for each missing variable [59,117].…”
Section: Direct Evaluation (mentioning)
confidence: 99%
“…Differences in detectability can also be random and result in lower signal-to-noise ratios (e.g., in proteomics data [67]). Mathematical modeling of persistent detection biases has been proposed as a first step to identify where biases arise and to quantify them in metagenomic data [68], while latent variable modeling has been proposed for estimating missing values in proteomics data [69].…”
Section: Species Richness (S) (mentioning)
confidence: 99%