2022
DOI: 10.1007/s10664-021-10092-4

Problems with SZZ and features: An empirical study of the state of practice of defect prediction data collection

Abstract: Context: The SZZ algorithm is the de facto standard for labeling bug fixing commits and finding inducing changes for defect prediction data. Recent research uncovered potential problems in different parts of the SZZ algorithm. Most defect prediction data sets provide only static code metrics as features, while research indicates that other features are also important. Objective: We provide an empirical analysis of the defect labels created with the S…

Cited by 45 publications (47 citation statements)
References: 85 publications
“…Thus, we can not only compare pre-trained models to each other, but also their benefit over simpler models that can be trained in minutes without any special hardware. Second, the task is hard and often not solved correctly by humans, unless careful manual validation is used [43], [44], [45].…”
Section: Fine-tuned Prediction Tasks (mentioning)
confidence: 99%
“…We suggest to use the five projects from Herzig et al [44] and the 38 projects from Herbold et al [45] together and conduct a leave-one-project-out cross validation experiment. This means that one project is used for testing and the remaining 42 projects for the training of the classifier, i.e., of a fastText model as baseline and for the fine-tuning of the BERT models.…”
Section: Fine-tuned Prediction Tasks (mentioning)
confidence: 99%
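
The leave-one-project-out protocol described in the statement above can be sketched as follows. This is a minimal illustration and not the citing authors' code: the function name `leave_one_project_out`, the toy project names, and the data layout (project name mapped to a list of items) are assumptions made for the example. With the 43 projects mentioned in the statement, each fold would train on 42 projects and test on the remaining one.

```python
from typing import Dict, Iterator, List, Tuple


def leave_one_project_out(
    data: Dict[str, List[str]],
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (held_out_project, train_items, test_items), one fold per project."""
    for held_out in sorted(data):
        # All items of the held-out project form the test set.
        test_items = list(data[held_out])
        # Items of every other project are pooled into the training set.
        train_items = [
            item
            for project, items in data.items()
            if project != held_out
            for item in items
        ]
        yield held_out, train_items, test_items


# Hypothetical toy data: project name -> issue texts (placeholders).
toy = {
    "project-a": ["a1", "a2"],
    "project-b": ["b1"],
    "project-c": ["c1", "c2", "c3"],
}

folds = list(leave_one_project_out(toy))
for held_out, train, test in folds:
    print(held_out, len(train), len(test))
```

Splitting by project rather than by individual item keeps all data from the test project out of training, which is the point of the protocol: the classifier is always evaluated on a project it has not seen.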
“…Our study subjects consist of 23 Java projects under the umbrella of the Apache Software Foundation previously collected by Herbold et al (2020). Table 1 contains the list of our study subjects.…”
Section: Study Subjects (mentioning)
confidence: 99%
“…The ITS usually has a kind of label or type to distinguish bugs from other issues, e.g., feature requests. However, research shows that this label is often incorrect, e.g., Antoniol et al (2008), Herzig et al (2013b) and Herbold et al (2020). Moreover, with this kind of software evolution research, we are interested in bugs existing in the software and not bugs which occur because of external factors, e.g., new environments or dependency upgrades.…”
Section: Study Subjects (mentioning)
confidence: 99%