2023
DOI: 10.1371/journal.pcbi.1010154

A real data-driven simulation strategy to select an imputation method for mixed-type trait data

Abstract: Missing observations in trait datasets pose an obstacle for analyses in myriad biological disciplines. Considering the mixed results of imputation, the wide variety of available methods, and the varied structure of real trait datasets, a framework for selecting a suitable imputation method is advantageous. We invoked a real data-driven simulation strategy to select an imputation method for a given mixed-type (categorical, count, continuous) target dataset. Candidate methods included mean/mode imputation, k-nea…


Citations: cited by 3 publications (3 citation statements)
References: 76 publications
“…Therefore, while our results agree with others that random forest models (as implemented by the missForest R function) are an accurate imputation method for trait data (Johnson et al, 2021), care should be taken to ensure use of imputation is appropriate. Our findings regarding the utility of imputation are only applicable to continuous trait imputation, as the efficacy of categorical traits imputation was not explored (although see May et al, 2023), and to large trait data sets on the scale of hundreds or thousands of species rather than tens. The utility of imputation in tackling missing and biased data has been shown to depend on the correlation between traits, and extent of phylogenetic autocorrelation (Clavel et al, 2015).…”
Section: Discussion (mentioning)
confidence: 99%
“…We tested two ways of dealing with the generated incomplete data sets: (1) removal of species with missing data (complete case analysis) and (2) filling data gaps through imputation. We used missForest imputation, implemented through the missForest (Stekhoven & Bühlmann, 2012), due to its demonstrated accuracy (Hong & Lynn, 2020; May et al, 2023; Penone et al, 2014), and fast computation times.…”
Section: Imputation (mentioning)
confidence: 99%