Proceedings of the 17th International Conference on Evaluation and Assessment in Software Engineering 2013
DOI: 10.1145/2460999.2461024

Data quality in empirical software engineering

Abstract: Context


Cited by 22 publications (10 citation statements)
References 20 publications
“…compare the performance of the resulting models with other methods, the results are highly variable [Myrtveit and Stensrud 2012; Shepperd and Kadoda 2001], sometimes contradictory [Menzies and Shepperd 2012], and often difficult to interpret and validate [Kitchenham and Mendes 2009]. This is due in part to a number of factors: the use of datasets that are difficult to obtain or are proprietary and therefore not publicly available [Mair et al 2005]; bias in dataset selection [Kitchenham and Mendes 2009]; initial data preparation methods that remove outlier examples and/or explanatory variables, making comparison with previous work difficult [Bosu and MacDonell 2013]; the sampling methods used when assessing model quality; the presented error measurements [Myrtveit and Stensrud 2012; Myrtveit et al 2005]; bias in modeller expertise and parameter tuning [Song et al 2013]; and a lack of meaningful comparative models and statistics. Although these issues arise in many fields where inductive model building is performed, and have been recognised as relevant to software effort estimation, they appear both common and enduring in this domain.…”
Section: Introduction (mentioning, confidence: 99%)
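The point about error measurements is easy to see concretely. The following minimal Python sketch (all numbers invented here for illustration, not taken from the paper or any cited dataset) shows how two common accuracy measures can rank the same pair of effort-estimation models in opposite orders:

    # Invented actual efforts and two hypothetical models' predictions.
    # Model A has small absolute errors on the large projects;
    # model B is accurate on small projects but drifts on large ones.
    actual = [100, 200, 400, 800]
    pred_a = [150, 260, 430, 820]
    pred_b = [105, 210, 480, 1050]

    def mae(actual, pred):
        """Mean absolute error."""
        return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)

    def mmre(actual, pred):
        """Mean magnitude of relative error, a common (and criticised) measure."""
        return sum(abs(a - p) / a for a, p in zip(actual, pred)) / len(actual)

    for name, pred in [("A", pred_a), ("B", pred_b)]:
        print(f"model {name}: MAE={mae(actual, pred):6.1f}  MMRE={mmre(actual, pred):.3f}")
    # model A: MAE=  40.0  MMRE=0.225
    # model B: MAE=  86.2  MMRE=0.153

Under MAE model A wins; under MMRE model B wins. Which model "performs better" thus depends on the error measure reported, which is exactly the interpretation problem the quoted passage attributes to Myrtveit and Stensrud.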
“…While empirical software engineering researchers have paid attention to data quality, systematic reviews of data quality by Liebchen et al [14,15,20] and Bosu [1,2] indicated that few researchers have explored data quality, and more research needs to be done in this area.…”
Section: Data Quality (mentioning, confidence: 99%)
“…The importance of the quality of data used by empirical studies has been acknowledged and assessed in recent years [13,14,15,16,17,18], [30,31], largely due to the impact it may have on the decisions taken. Some papers explicitly emphasize the importance of DQ in empirical software engineering datasets, as data imperfections can have an unwanted impact on the data analysis and might lead to false conclusions [14], [16], [25].…”
Section: Related Work (mentioning, confidence: 99%)
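To illustrate the kind of imperfection these citing papers warn about, here is a short hedged Python sketch (toy records and column names invented here, not drawn from any cited dataset) that flags missing and implausible values before analysis, the sort of screening the reviews argue should precede model building:

    # Toy effort dataset with deliberately injected quality problems.
    rows = [
        {"loc": 12_000, "effort_hours": 950},   # plausible record
        {"loc": 3_400,  "effort_hours": None},  # missing effort value
        {"loc": -500,   "effort_hours": 120},   # implausible negative size
        {"loc": 48_000, "effort_hours": 40},    # suspicious: huge system, tiny effort
    ]

    def quality_report(rows):
        """Flag missing, implausible, and suspicious records before analysis."""
        issues = []
        for i, r in enumerate(rows):
            if r["effort_hours"] is None:
                issues.append((i, "missing effort_hours"))
            elif r["loc"] <= 0:
                issues.append((i, "non-positive loc"))
            elif r["loc"] / r["effort_hours"] > 1000:  # crude productivity ceiling
                issues.append((i, "implausible productivity"))
        return issues

    for idx, problem in quality_report(rows):
        print(f"row {idx}: {problem}")

Silently averaging or modelling over such rows is precisely how data imperfections propagate into false conclusions; flagging them first makes the subsequent cleaning decisions explicit and reportable.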
“…Three literature reviews were carried out on this particular topic, showing interest and concern about how researchers are dealing with DQ problems [13,14], [30]. They all conclude that the empirical software engineering community should pay more attention to this issue, which has long been neglected according to the results…”
Section: Related Work (mentioning, confidence: 99%)