Multiple imputation of missing values is a key step in data analytics and a standard process in data mining. Non-linear imputation methods ones comes into play whenever the linear relationship between a response and predictors cannot be linearized. One kind of popular non-linear methods are Generalized Additive Models (GAM) and an extension of GAM, namely GAMLSS, where each parameter of the distribution (e.g., mean, variance, skewness, kurtosis) can be modeled as a function of predictors.
However, non-robust methods such as standard GAM's and GAMLSS's can be swayed by outliers, leading to outlier-driven imputations. This can apply concerning both representative outliers - those true yet unusual values of your population - and non-representative outliers, which are mere measurement errors.
Robust (imputation) methods effectively manage outliers and exhibit remarkable resistance to their influence, providing a more reliable approach to dealing with missing data.
A new robust imputation algorithm is introduced. This innovative solution addresses three significant challenges with robustness. (1) It uses a robust bootstrap to manage model uncertainty when imputing a random sample, (2) it incorporates robust fitting to reinforce accuracy, and (3) it takes into account imputation uncertainty in a resilient manner. Furthermore, any complex model for any variable with missingness can be considered and run through the algorithm.
For the employed real-world datasets and the conducted simulation study, the novel algorithm imputeRobust demonstrates superior performance in comparison to other prevalent methods.