2019
DOI: 10.1186/s12859-019-3110-0

Random forest-based imputation outperforms other methods for imputing LC-MS metabolomics data: a comparative study

Abstract: Background: LC-MS technology makes it possible to measure the relative abundance of numerous molecular features of a sample in a single analysis. However, non-targeted metabolite profiling approaches in particular generate vast arrays of data that are prone to aberrations such as missing values. Whatever the reason for the missing values, a coherent and complete data matrix is always a prerequisite for accurate and reliable statistical analysis. Therefore, there is a need for proper imput…

Cited by 162 publications (149 citation statements)
References 27 publications
“…However, some studies reported that simpler methods such as mean or median replacement were as adequate as methods like kNN when imputation was followed by clustering of genetic data [26]. On the other hand, some have reported slightly better performance of random forest over kNN to impute metabolomics data [27].…”
Section: Discussion (mentioning)
confidence: 99%
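
The two simpler strategies contrasted in the statement above, median replacement and kNN imputation, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure used in any of the cited studies: the function names (`median_impute`, `knn_impute`), the toy `data` matrix, the choice of `k=2`, and the distance rule (root mean squared difference over features observed in both samples) are all hypothetical choices made for the example.

```python
from math import sqrt
from statistics import median


def median_impute(rows):
    """Replace each None with the median of the observed values in its column."""
    n_cols = len(rows[0])
    col_medians = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        col_medians.append(median(observed))
    return [
        [col_medians[j] if v is None else v for j, v in enumerate(r)]
        for r in rows
    ]


def _distance(a, b):
    """Root mean squared difference over features observed in both rows.

    Returns None when the two rows share no observed feature.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None
    return sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(rows, k=2):
    """Fill each None with the mean of that feature in the k nearest samples."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Only samples that actually observed feature j can donate a value.
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                d = _distance(row, other)
                if d is not None:
                    candidates.append((d, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [val for _, val in candidates[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


# Toy matrix (samples x features); None marks a missing intensity.
data = [
    [1.0, 10.0, 100.0],
    [1.1, 11.0, None],
    [5.0, 50.0, 500.0],
    [5.2, 52.0, 520.0],
    [0.9, 9.0, 90.0],
]

print(median_impute(data)[1][2])
print(knn_impute(data, k=2)[1][2])
```

On this toy matrix the column median is pulled toward the high-abundance samples, while the kNN estimate is drawn only from the two samples whose observed features resemble the incomplete one. That difference is the intuition behind preferring neighbour- or model-based imputation when samples fall into distinct groups.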
“…Missing data occur in metabolomics datasets for various reasons and managing this missingness is highly challenging [33]. Imputation is the procedure of replacing missing data with reasonable values using a priori knowledge or information available from the existing data.…”
Section: Imputation, Transformation, Normalization, and Scaling (mentioning)
confidence: 99%
“…Imputation is the procedure of replacing missing data with reasonable values using a priori knowledge or information available from the existing data. In this workflow, we perform random forest (RF)-based imputation using the missForest package [33,34], although several other procedures are available [35,36]. Data distributions can affect statistical analysis, especially for variance-based models [37].…”
Section: Imputation, Transformation, Normalization, and Scaling (mentioning)
confidence: 99%
“…Missing data occur in metabolomics datasets for various reasons and it is one of the most challenging computational processes in the metabolomics data preprocessing (28). Imputation is the procedure replacing the missing data with reasonable values using a priori knowledge or information available from the existing data.…”
Section: Imputation, Transformation, Normalization, and Scaling (mentioning)
confidence: 99%
“…Imputation is the procedure replacing the missing data with reasonable values using a priori knowledge or information available from the existing data. In this workflow, random forest (RF) imputation is performed in order to replace the missing values with the most appropriate estimate with the missForest package (28,29), although several other procedures for imputation are available (30,31).…”
Section: Imputation, Transformation, Normalization, and Scaling (mentioning)
confidence: 99%