How Multiple Imputation Makes a Difference

Lall, Ranjit

doi:10.1093/pan/mpw020

Cited by 174 publications

(87 citation statements)

References 48 publications


“…Researchers may choose to drop observations with missing values, assign all missing values the same value (based on an assumption, for example, about why a response was not given), impute missing values using variable means, or use other imputation methods. Lall (2016) replicates a large number of empirical political science studies using multiple imputation instead of listwise deletion for missing values and finds that this changes the results for almost half of the studies. Unless there is a clear explanation for missingness that points to an assigned value or method, replication can test the robustness of the original results to alternative missing data techniques.…”

Section: Data Transformations (mentioning)

confidence: 85%

Which tests not witch hunts: a diagnostic approach for conducting replication research

Brown

Wood

2018

Economics


Replication research can be used to explore original study results that researchers consider questionable, but it should also be a tool for reinforcing the credibility of results that are important to policies and programs. The challenge is to design a replication plan open to both supporting the original findings and uncovering potential problems. The purpose of this paper is to provide replication researchers with an objective list of checks or tests to consider when planning a replication study. The authors present tips for diagnostic replication exercises in four groups: validity of assumptions, data transformations, estimation methods, and heterogeneous impacts. For each group, the authors present an introduction to the issues, a list of replication tests and checks, some examples of how these checks are employed in replication studies of development impact evaluations, and a set of resources that provide statistical and econometric details. The authors also provide a list of don'ts for how to conduct and report replication research.
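The citation statement above lists several ways of handling missing values (dropping observations, assigning one value, filling with the variable mean) and suggests checking whether results are robust to the choice. A minimal sketch of such a check follows. It is not Brown and Wood's or Lall's procedure: the data are simulated, the 25% missingness in the outcome is an assumption, and the names `ols_slope` and `strategies` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
x = rng.normal(size=n)                    # fully observed covariate
y = 1.0 + 2.0 * x + rng.normal(size=n)    # outcome, true slope is 2.0

# Assumed missingness: 25% of outcome values removed completely at random.
y_missing = y.copy()
y_missing[rng.choice(n, size=n // 4, replace=False)] = np.nan
observed = ~np.isnan(y_missing)


def ols_slope(x_values, y_values):
    """Slope from a simple least-squares fit of y on x."""
    slope, _intercept = np.polyfit(x_values, y_values, deg=1)
    return slope


# Each strategy yields the (x, y) pair that would enter the regression.
strategies = {
    "listwise deletion": (x[observed], y_missing[observed]),
    "constant fill (0)": (x, np.where(observed, y_missing, 0.0)),
    "mean fill": (x, np.where(observed, y_missing, np.nanmean(y_missing))),
}

slopes = {name: ols_slope(xv, yv) for name, (xv, yv) in strategies.items()}
for name, slope in slopes.items():
    print(f"{name}: slope = {slope:.3f} (n = {len(strategies[name][0])})")
```

In this setup both single-value fills pull the estimated slope toward zero, because a quarter of the outcomes are replaced by a value unrelated to x. Multiple imputation, the method Lall (2016) uses, is not implemented here.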


“…This method although common in ecology, produces downward-biased standard errors since the zeros are treated as knowns rather than probabilistic estimates (Lall, 2016). This means that essentially this does not solve any problem in terms of quality statistical results.…”

Section: Methods Of Handling Missing Data (mentioning)

confidence: 99%

Influence of Missing Value Imputations on the Performance of Canonical Correspondence Analysis: Ecological Applications

Kakaï

Lazaro

Gbeha

2018

AJAS


Abstract. This paper assessed the influence of four imputation methods for missing values on the performance of canonical correspondence analysis (CCA). Missingness was introduced into complete multivariate normal data sets under three missingness mechanisms: MCAR, MAR, and NMAR. Mean imputation performed best under MCAR and MAR, while median imputation performed best under NMAR.

Full abstract. The main objective of this study was to assess the influence of four imputation methods for missing values (mean, median, random forest, and zero) on the performance of canonical correspondence analysis (CCA). First, complete multivariate normal environmental data sets were simulated, taking into account sample size, number of variables, proportion of noise, and correlation between variables. Missingness was then artificially introduced into the complete data sets at proportions of 0.1, 0.3, and 0.5 under three missingness mechanisms: MCAR, MAR, and NMAR. For each combination of factors, CCA was applied and the constrained inertia of the complete data set was compared with that of the imputed data set. Mean imputation performed best when data were MCAR or MAR; under NMAR, median imputation was the preferred method. Beyond a missing-value proportion of 30%, the performance of the imputation methods declined significantly.
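The three missingness mechanisms named in the abstract, and the concern raised in the citation statement that filled-in values are treated as known, can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' simulation design: the data-generating model, the 30% missingness rate, and the helper names `missing_mask` and `standard_error` are all hypothetical, and it shows mean imputation only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)                    # fully observed covariate
y = 2.0 + 0.5 * x + rng.normal(size=n)    # variable that will lose values


def missing_mask(mechanism, rate=0.3):
    """Boolean mask of y values to drop; the top `rate` share of a score goes missing."""
    if mechanism == "MCAR":
        score = rng.normal(size=n)        # unrelated to the data
    elif mechanism == "MAR":
        score = x + rng.normal(size=n)    # depends on the observed covariate
    elif mechanism == "NMAR":
        score = y + rng.normal(size=n)    # depends on the missing value itself
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return score > np.quantile(score, 1.0 - rate)


def standard_error(values):
    """Naive standard error of the mean, treating every value as observed."""
    return values.std(ddof=1) / np.sqrt(len(values))


results = {}
for mechanism in ("MCAR", "MAR", "NMAR"):
    mask = missing_mask(mechanism)
    observed = y[~mask]
    filled = np.where(mask, observed.mean(), y)   # mean imputation
    results[mechanism] = {
        "n_missing": int(mask.sum()),
        "mean_observed": observed.mean(),
        "se_complete_cases": standard_error(observed),
        "se_mean_imputed": standard_error(filled),
    }

for mechanism, row in results.items():
    print(mechanism, row)
```

The mean-imputed standard error is smaller than the complete-case one under every mechanism, because the filled values add to the sample size without adding variability. Under the NMAR setup the mean of the observed values also falls below the full-sample mean, since high values are the ones most likely to be dropped.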


“…Goodness-of-fit indicators are not equivalent to the probability of a given model being true (Anscombe 1973; King 1986), and the weights constructed this way are not invariant to transformations in the dependent variable. Moreover, our data set has a number of missing observations, so model comparison measures could be misleading (Lall 2016). …”

Section: Extreme Bounds Analysis (mentioning)

confidence: 99%

“…Secondly, in DRF, non-observed cases are not assumed to be missing at random, but rather as values that contain information in themselves. The algorithm assumes that observations are missing for a reason, what is most likely the case with social science data (Lall 2016). This is a more conservative approach than assuming that missing cases fit into an underlying parametric distribution.…”

Section: Random Forests (mentioning)

confidence: 99%

What Drives State-Sponsored Violence?: Evidence from Extreme Bounds Analysis and Ensemble Learning Models

Freire

Uzonyi

2018

Preprint

