2007
DOI: 10.1142/s0218194007003173

Performance Evaluation of Imputation Methods for Incomplete Datasets

Abstract: In this study, we compare the performance of four different imputation strategies, ranging from the commonly used Listwise Deletion to model-based approaches such as Maximum Likelihood, at enhancing the completeness of incomplete software project data sets. We evaluate the impact of each method by applying it to six different real-time software project data sets, which are classified into categories based on their inherent properties. The reliability of the constructed data sets u…
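Listwise Deletion, the baseline strategy named in the abstract, discards every record that has any missing value. The minimal sketch below assumes records are tuples in which `None` marks a missing cell. The function names and the sample project records are hypothetical illustrations, not taken from the study. It shows how quickly deletion shrinks a data set:

```python
from typing import List, Optional, Sequence

Row = Sequence[Optional[float]]  # None marks a missing value (assumed encoding)


def listwise_delete(rows: List[Row]) -> List[Row]:
    """Keep only complete cases: drop any row with at least one missing value."""
    return [row for row in rows if all(value is not None for value in row)]


def completeness(rows: List[Row]) -> float:
    """Fraction of cells that are observed (not None); an empty table counts as complete."""
    total = sum(len(row) for row in rows)
    observed = sum(value is not None for row in rows for value in row)
    return observed / total if total else 1.0


if __name__ == "__main__":
    # Hypothetical project records: (size_kloc, effort_pm, duration_months)
    projects = [
        (10.0, 24.0, 6.0),
        (25.0, None, 9.0),
        (4.0, 8.0, None),
        (40.0, 110.0, 14.0),
    ]
    complete = listwise_delete(projects)
    print(len(complete), "of", len(projects), "rows kept")  # 2 of 4 rows kept
    print(completeness(projects))  # 0.8333333333333334 (10 of 12 cells observed)
    print(completeness(complete))  # 1.0
```

Deletion reaches full completeness here only by discarding half of the records. Model-based approaches such as Maximum Likelihood are designed to avoid this loss by using the observed values of incomplete records as well.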

Cited by 19 publications (11 citation statements)
References 17 publications
“…Several studies showed that NN may be superior to other hot-deck methods, even though results may depend on the choice of the metric used to gauge the similarity or dissimilarity of recipients to donors [7]. …”
Section: Introduction; citation type: mentioning (confidence: 99%)
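Nearest-neighbour (NN) hot-deck imputation fills a recipient's missing value with the value held by the most similar complete record, the donor. The minimal sketch below assumes purely numeric attributes and complete donors. It takes the distance metric as a parameter so that the dependence on the metric noted in the statement above becomes visible: on the same data, Euclidean and Manhattan distance select different donors. `nn_hot_deck`, `euclidean` and `manhattan` are hypothetical names for illustration.

```python
import math
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]
Metric = Callable[[Vector, Vector], float]


def euclidean(a: Vector, b: Vector) -> float:
    """Straight-line (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Vector, b: Vector) -> float:
    """City-block (L1) distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nn_hot_deck(
    recipient: Sequence[Optional[float]],
    donors: List[Vector],
    target: int,
    metric: Metric,
) -> List[Optional[float]]:
    """Fill recipient[target] with the value of the single nearest donor.

    Distance is computed only over attributes the recipient has observed,
    excluding the target; donors are assumed to be complete records.
    """
    keys = [i for i, v in enumerate(recipient) if v is not None and i != target]
    observed = [recipient[i] for i in keys]
    donor = min(donors, key=lambda d: metric(observed, [d[i] for i in keys]))
    filled = list(recipient)
    filled[target] = donor[target]
    return filled


if __name__ == "__main__":
    recipient = [0.0, 0.0, None]  # third attribute is missing
    donors = [(3.0, 3.0, 100.0), (5.0, 0.0, 200.0)]
    # Euclidean: 4.24 vs 5.00 -> first donor; Manhattan: 6 vs 5 -> second donor
    print(nn_hot_deck(recipient, donors, 2, euclidean))  # [0.0, 0.0, 100.0]
    print(nn_hot_deck(recipient, donors, 2, manhattan))  # [0.0, 0.0, 200.0]
```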
“…(1) results may depend on the choice of the metric used to gauge the similarity (or the distance) between observations [45]; (2) because of the kNN search, the algorithm may be time-consuming when the dataset has many samples.…”
Section: Missing Data Imputation Methods; citation type: mentioning (confidence: 99%)
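Both drawbacks listed in this statement appear in a plain kNN imputation sketch. The sketch below rests on several assumptions: numeric data, Euclidean distance over the attributes the incomplete row has observed, donors drawn only from complete rows, and the column mean of the k nearest donors as the imputed value. `knn_impute` is a hypothetical name. The brute-force search compares each incomplete row with every complete row, so the running time grows with the number of samples.

```python
import heapq
import math
from typing import List, Optional


def knn_impute(data: List[List[Optional[float]]], k: int = 3) -> List[List[Optional[float]]]:
    """Replace each missing cell with the mean of that column over the k nearest complete rows.

    Brute-force search: every incomplete row is compared with every complete row,
    so the cost is roughly O(n_incomplete * n_complete * n_columns).
    """
    complete = [row for row in data if all(v is not None for v in row)]
    if not complete:
        raise ValueError("kNN imputation needs at least one complete row")
    result: List[List[Optional[float]]] = []
    for row in data:
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            result.append(list(row))
            continue
        observed = [j for j, v in enumerate(row) if v is not None]

        def dist(donor: List[Optional[float]]) -> float:
            # Euclidean distance restricted to the recipient's observed columns
            return math.sqrt(sum((row[j] - donor[j]) ** 2 for j in observed))

        neighbours = heapq.nsmallest(k, complete, key=dist)
        filled = list(row)
        for j in missing:
            filled[j] = sum(d[j] for d in neighbours) / len(neighbours)
        result.append(filled)
    return result


if __name__ == "__main__":
    data = [[1.0, 10.0], [2.0, 20.0], [10.0, 100.0], [1.5, None]]
    print(knn_impute(data, k=2)[3])  # [1.5, 15.0]: mean of the two closest donors
```

Changing the distance function inside `dist` can change which donors are chosen, which is the first drawback. The nested scan over complete rows is the second.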
“…In this calculation, $\hat{Y}$ and $\sigma^2$ denote, respectively, the pilot estimates of the fitted values and of the variance of the error terms for a chosen pilot smoothing parameter. In practice, since the variance $\sigma^2$ is generally unknown, its estimate $\hat{\sigma}^2$ is used, which can be easily calculated by equation (11). Note also that the variance of the error terms ($\sigma^2$) is called the variance of the regression model.…”
Section: Selection of the Smoothing Parameter; citation type: mentioning (confidence: 99%)
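The citing work's equation (11) is not reproduced on this page. As an assumption, the sketch below uses a common estimator for linear smoothers, $\hat{\sigma}^2 = \lVert y - \hat{Y} \rVert^2 / (n - \operatorname{tr}(S))$ with $\hat{Y} = S y$, and a Nadaraya-Watson smoother with a Gaussian kernel stands in for whatever smoother that work uses. The pilot bandwidth `h_pilot` and the function names are hypothetical. The sketch shows how pilot fitted values and a pilot variance estimate come out of a single choice of smoothing parameter.

```python
import numpy as np


def nw_smoother_matrix(x: np.ndarray, h: float) -> np.ndarray:
    """Nadaraya-Watson smoother matrix S (Gaussian kernel, bandwidth h); fitted values are S @ y."""
    u = (x[:, None] - x[None, :]) / h
    w = np.exp(-0.5 * u ** 2)
    return w / w.sum(axis=1, keepdims=True)  # each row sums to 1


def pilot_variance_estimate(x: np.ndarray, y: np.ndarray, h_pilot: float):
    """Return pilot fitted values Y_hat and sigma2_hat = RSS / (n - tr(S))."""
    S = nw_smoother_matrix(x, h_pilot)
    y_hat = S @ y
    rss = float(np.sum((y - y_hat) ** 2))
    df_resid = len(y) - float(np.trace(S))  # residual degrees of freedom of the smoother
    if df_resid <= 0:
        raise ValueError("bandwidth too small: smoother interpolates the data")
    return y_hat, rss / df_resid


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, 200)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)  # true sigma^2 = 0.09
    y_hat, sigma2_hat = pilot_variance_estimate(x, y, h_pilot=0.05)
    print(sigma2_hat)
```

Dividing by $n - \operatorname{tr}(S)$ rather than $n$ accounts for the degrees of freedom used by the smoother, in the same way that a linear regression divides by $n - p$.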
“…Examples of imputation work in the literature include Schafer, Batista and Monard, Rubin and van der Laan, Yenduri and Iyengar, and Andridge and Little [8][9][10][11][12]. There are also imputation methods suited to different types of data, such as fuzzy K-means, singular value decomposition, and multiple imputation by chained equations [13,14]. Most of these methods were developed to solve missing-data problems, but some are also suitable for handling right-censored data.…”
Citation type: mentioning (confidence: 99%)