Abstract: Data were simulated to conform to covariance patterns taken from the personnel selection literature. Two, six, and ten percent of the values were deleted from one of three predictor variables in sample sizes of 50, 100, and 200. Incomplete data matrices were treated by four methods: (a) elimination of cases with incomplete data records; (b) substitution of missing values with the variable mean; (c) replacement of missing values with an estimate obtained from simple regression; and (d) replacement of missing val…
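The first three treatments listed in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the covariance matrix, random seed, function names, and the choice of predictor for the regression step are all hypothetical, and method (d) is left out because its description is truncated above.

```python
import numpy as np

def listwise_deletion(X):
    """(a) Drop every case (row) that has at least one missing value."""
    return X[~np.isnan(X).any(axis=1)]

def mean_substitution(X):
    """(b) Replace each missing value with the mean of the observed values in its column."""
    filled = X.copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    filled[rows, cols] = col_means[cols]
    return filled

def simple_regression_imputation(X, target, predictor):
    """(c) Fill missing values in column `target` from a one-predictor
    least-squares line fitted on cases where both columns are observed."""
    filled = X.copy()
    missing = np.isnan(X[:, target])
    has_predictor = ~np.isnan(X[:, predictor])
    fit_rows = ~missing & has_predictor
    # np.polyfit with degree 1 returns [slope, intercept]
    slope, intercept = np.polyfit(X[fit_rows, predictor], X[fit_rows, target], 1)
    fill_rows = missing & has_predictor
    filled[fill_rows, target] = intercept + slope * X[fill_rows, predictor]
    return filled

# Hypothetical setup: three correlated predictors, n = 100, and 10% of the
# values deleted completely at random from the first predictor.
rng = np.random.default_rng(0)
n = 100
cov = np.array([[1.0, 0.5, 0.3],
                [0.5, 1.0, 0.4],
                [0.3, 0.4, 1.0]])  # illustrative values, not taken from the paper
X = rng.multivariate_normal(mean=np.zeros(3), cov=cov, size=n)
X_missing = X.copy()
X_missing[rng.choice(n, size=n // 10, replace=False), 0] = np.nan

complete_cases = listwise_deletion(X_missing)
mean_filled = mean_substitution(X_missing)
regression_filled = simple_regression_imputation(X_missing, target=0, predictor=1)
```

Listwise deletion shrinks the sample, while the two substitution methods keep every case at the cost of replacing missing entries with estimates.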
“…Cohen and Cohen (1983) suggest that missing data up to 10% on a specific variable is considered small and the variable should be retained in the analysis. Furthermore, Raymond and Roberts (1987) estimated that a variable should be retained with 40% or less missing data. Removing cases with missing data from data analysis procedures can result in reduced sample size, compromised statistical power, and inaccurate parameter estimates (Barnard & Meng, 1999;Patrician, 2002;Tabachnick & Fidell, 2001).…”
“…Because the missing data appeared to be nonsystematic and was well below missing-at-random rates (e.g., 40%; see Raymond & Roberts, 1987), we imputed variable means for individual missing data points (Paul, Mason, McCaffrey, & Fox, 2008).…”
“…All MDTs deteriorate as the percentage of missingness grows and it is almost inappropriate to apply any of them when the missing percentage is greater than 50. Raymond et al [16] found that when data are missing at random, MI performed better than LD. In our results, we found in two instances that LD outperformed MI.…”
In this study, we compare the performance of four different imputation strategies, ranging from the commonly used Listwise Deletion to model-based approaches such as Maximum Likelihood, on enhancing completeness in incomplete software project data sets. We evaluate the impact of each of these methods by implementing them on six different real-time software project data sets which are classified into different categories based on their inherent properties. The reliability of the data sets constructed using these techniques is further tested by building prediction models using stepwise regression. The experimental results are noted and the findings are finally discussed.