Missing values are a common problem in many real-world databases. Inadequate handling of missing data can lead to serious problems in data analysis. A common way to cope with this problem is to use imputation methods, which fill in missing values with plausible values. This paper proposes GPMI, a multiple imputation method that uses genetic programming as a regression method to estimate missing values. Experiments on eight datasets with six levels of missing values compare GPMI with seven other popular and advanced imputation methods on two measures: prediction accuracy and classification accuracy. The results show that, in most cases, GPMI achieves not only better prediction accuracy but also better classification accuracy than the other imputation methods.
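To make the idea of regression-based multiple imputation concrete, the following is a minimal sketch, not the GPMI implementation. It assumes a single incomplete column with all other columns complete, substitutes ordinary least squares for the genetic-programming regressor, draws bootstrap samples to obtain several different imputations, and averages them. The function name `multiple_impute` and its parameters are hypothetical, and the averaging step is an assumption, since the abstract does not state how the imputations are combined.

```python
import numpy as np


def multiple_impute(X, target_col, n_imputations=5, seed=0):
    """Regression-based multiple imputation for one incomplete column.

    Assumptions (not taken from the paper): every other column is
    complete, and ordinary least squares stands in for the
    genetic-programming regressor.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X[:, target_col])

    # Predictors are all columns except the target, plus an intercept.
    predictors = np.delete(X, target_col, axis=1)
    A = np.column_stack([np.ones(len(X)), predictors])
    A_obs, y_obs = A[~missing], X[~missing, target_col]

    estimates = []
    for _ in range(n_imputations):
        # Bootstrap the observed rows so each imputation uses a different fit.
        idx = rng.integers(0, len(y_obs), size=len(y_obs))
        coef, *_ = np.linalg.lstsq(A_obs[idx], y_obs[idx], rcond=None)
        estimates.append(A[missing] @ coef)

    # Combine the imputations by averaging the estimates (an assumed rule).
    imputed = X.copy()
    imputed[missing, target_col] = np.mean(estimates, axis=0)
    return imputed
```

Calling `multiple_impute(data, target_col=1)` on an array whose second column contains `NaN` entries returns a copy with those entries filled in; observed values are left unchanged. Replacing the least-squares fit with a symbolic regressor evolved by genetic programming would bring the sketch closer to the approach the abstract describes.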