2021
DOI: 10.1038/s41598-021-81279-4

A comparative study of evaluating missing value imputation methods in label-free proteomics

Abstract: The presence of missing values (MVs) in label-free quantitative proteomics greatly reduces the completeness of data. Imputation has been widely utilized to handle MVs, and selection of the proper method is critical for the accuracy and reliability of imputation. Here we present a comparative study that evaluates the performance of seven popular imputation methods with a large-scale benchmark dataset and an immune cell dataset. Simulated MVs were incorporated into the complete part of each dataset with differen…

Cited by 86 publications (80 citation statements)
References 26 publications
“…As with any imputation tools, the accuracy will be limited by the correlation structures, and in general the number of features relative to the sample size. For these and other reasons, this tool is not designed for genomic imputation (Schurz et al, 2019) or for proteomics data (Jin et al, 2021), or other areas with well-understood biological correlation structures. However, the ease of use and seamless interface for using multiple imputation methods makes our approach a useful approach in a variety of analysis pipelines.…”
Section: Discussion (mentioning)
confidence: 99%
“…Many advanced analysis methods, such as machine learning, require a complete dataset, so imputing missing data enables researchers to apply statistical and computational association methods that would otherwise be unavailable. Missing data imputation methods are considered standard in areas such as genetic association (Schurz et al, 2019) and proteomics (Jin et al, 2021), where correlation structures are strong. For electronic health records, the need for imputation methods have more recently realized (Jazayeri et al, 2020), and the use of imputation shown to improve prediction accuracy (Beaulieu-Jones et al, 2017).…”
Section: Introduction (mentioning)
confidence: 99%
“…The sources of missing values range from tryptic miscleavages to ion suppression in the mass spectrometer, and improper MS/MS fragmentation [5]. Because LFQ data contain a relatively high percentage of missing values, multiple approaches for the imputation of missing values in proteomics data have been proposed [5][6][7][8]. However, there is no unified consensus on a best approach for imputing missing values in proteomics data and there has been little discussion about applying Multiple Imputation (MI) methods [5][6][7][8][9][10].…”
Section: Introduction (mentioning)
confidence: 99%
“…Because LFQ data contain a relatively high percentage of missing values, multiple approaches for the imputation of missing values in proteomics data have been proposed [5][6][7][8]. However, there is no unified consensus on a best approach for imputing missing values in proteomics data and there has been little discussion about applying Multiple Imputation (MI) methods [5][6][7][8][9][10]. MI methods are often implemented to account for the uncertainty in the prediction of the imputed values, whereas Single Imputation (SI) methods treat the predicted values as if they were true values in downstream association analysis.…”
Section: Introduction (mentioning)
confidence: 99%
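
The distinction the statement above draws between Single Imputation (SI) and Multiple Imputation (MI) can be illustrated with a minimal sketch. Nothing below comes from the cited papers: the function names, the toy intensities, and the choice to draw imputed values from a normal distribution fitted to the observed values are all assumptions made for illustration. A proper MI procedure would also draw the model parameters themselves; this sketch skips that step. The pooling step uses Rubin's rules (total variance = within-imputation variance + (1 + 1/m) × between-imputation variance).

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Pool m per-dataset estimates and their variances with Rubin's rules."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    within = u.mean()            # average within-imputation variance
    between = q.var(ddof=1)      # variance of the estimates across imputations
    total = within + (1.0 + 1.0 / m) * between
    return q.mean(), total

def mean_and_variance(values):
    """Sample mean and the usual variance of that mean, s^2 / n."""
    values = np.asarray(values, dtype=float)
    return values.mean(), values.var(ddof=1) / values.size

def single_imputation(x):
    """SI: fill gaps with the observed mean, then treat the fills as real data."""
    x = np.asarray(x, dtype=float)
    filled = np.where(np.isnan(x), np.nanmean(x), x)
    return mean_and_variance(filled)

def multiple_imputation(x, m=20, seed=0):
    """MI (simplified): fill gaps m times with random draws, then pool."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    missing = np.isnan(x)
    mu, sd = np.nanmean(x), np.nanstd(x, ddof=1)
    results = []
    for _ in range(m):
        filled = x.copy()
        # Hypothetical imputation model: normal draws around the observed mean.
        filled[missing] = rng.normal(mu, sd, size=missing.sum())
        results.append(mean_and_variance(filled))
    estimates, variances = zip(*results)
    return pool_rubin(estimates, variances)

# Toy log-intensities for one protein across six samples (made-up numbers).
intensities = [20.1, 19.8, np.nan, 21.0, np.nan, 20.5]
print("SI estimate, variance:", single_imputation(intensities))
print("MI estimate, variance:", multiple_imputation(intensities))
```

In this sketch the pooled MI variance is never smaller than the SI variance, because SI treats the filled-in means as if they were measured values, while MI carries the spread between imputed datasets into the final uncertainty.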
“…But to our best knowledge, in the metaproteomics context, the treatment of missing values mostly relies on imputation (Tang et al, 2020a). A large number of imputation methods for proteomics or metaproteomics have been proposed in literature (R package NAguideR, Jin et al (2021)), and can be classified in three categories (i) single value imputation, where missing intensities are replaced by the same value for all samples; (ii) global structure methods, in which imputation is based on correlations between the whole set of observations; (iii) local similarity imputation, based only on the most similar samples.…”
Section: Introduction (mentioning)
confidence: 99%
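
The three categories named in the statement above can be sketched with one toy representative each. This is an illustrative sketch only, not the implementation used by NAguideR or by any of the cited papers: the function names are hypothetical, the representatives chosen here (minimum-value fill, iterative low-rank SVD, and k-nearest-neighbour averaging) are assumptions, and rows are taken to be the observations that get compared. Each row is assumed to contain at least one observed value.

```python
import numpy as np

def single_value_impute(x, value=None):
    """(i) Single value: every gap gets the same constant (default: global minimum)."""
    x = np.asarray(x, dtype=float)
    fill = np.nanmin(x) if value is None else value
    return np.where(np.isnan(x), fill, x)

def global_structure_impute(x, rank=1, n_iter=100):
    """(ii) Global structure: iterative low-rank SVD fit over the whole matrix."""
    x = np.asarray(x, dtype=float)
    mask = np.isnan(x)
    # Start from row means, then alternate low-rank fit and refill.
    filled = np.where(mask, np.nanmean(x, axis=1, keepdims=True), x)
    for _ in range(n_iter):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        approx = (u[:, :rank] * s[:rank]) @ vt[:rank]
        filled = np.where(mask, approx, x)  # observed entries stay fixed
    return filled

def local_similarity_impute(x, k=2):
    """(iii) Local similarity: average the k most similar rows that have the value."""
    x = np.asarray(x, dtype=float)
    mask = np.isnan(x)
    out = x.copy()
    for i in np.where(mask.any(axis=1))[0]:
        dists = []
        for j in range(x.shape[0]):
            shared = ~mask[i] & ~mask[j]  # columns observed in both rows
            if j != i and shared.any():
                d = np.sqrt(np.mean((x[i, shared] - x[j, shared]) ** 2))
                dists.append((float(d), j))
        dists.sort()
        for c in np.where(mask[i])[0]:
            donors = [x[j, c] for _, j in dists if not mask[j, c]][:k]
            if donors:
                out[i, c] = np.mean(donors)
    return out

# Toy matrix: rows are observations, columns are measurements (made-up numbers).
data = np.array([[1.0, 2.0, 3.0],
                 [1.0, 2.0, np.nan],
                 [10.0, 20.0, 30.0]])
print(single_value_impute(data))
print(global_structure_impute(data))
print(local_similarity_impute(data, k=1))
```

On this toy matrix the single-value fill ignores the row's own pattern, while the global and local approaches both recover a value consistent with the proportional structure of the other rows.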