2014 IEEE International Conference on Software Maintenance and Evolution
DOI: 10.1109/icsme.2014.53

Combining Text Mining and Data Mining for Bug Report Classification

Abstract: Misclassification of bug reports inevitably sacrifices the performance of bug prediction models. Manual examinations can help reduce the noise but bring a heavy burden for developers instead. In this paper, we propose a hybrid approach by combining both text mining and data mining techniques of bug report data to automate the prediction process. The first stage leverages text mining techniques to analyze the summary parts of bug reports and classifies them into three levels of probability. The extracted featur…
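
The first stage described in the abstract (classifying report summaries into three probability levels) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the excerpt does not say which classifier or thresholds the paper uses, so the sketch swaps in a plain multinomial naive Bayes model, hypothetical cut-offs of 0.3 and 0.7, and a made-up toy training set.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a summary and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (illustrative stand-in,
    not necessarily the classifier used in the paper)."""

    def fit(self, summaries, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(summaries, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        return self

    def predict_proba(self, text):
        """Return {class: posterior probability} for one summary."""
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / total_docs)
            for w in tokenize(text):
                if w in self.vocab:  # skip words never seen in training
                    count = self.word_counts[c][w] + 1
                    score += math.log(count / (total_words + len(self.vocab)))
            log_scores[c] = score
        # Normalize in log space to avoid underflow.
        top = max(log_scores.values())
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        norm = sum(exp.values())
        return {c: v / norm for c, v in exp.items()}


def probability_level(p_bug, low=0.3, high=0.7):
    """Map a bug probability to one of three levels.
    The thresholds are hypothetical, chosen only for this sketch."""
    if p_bug >= high:
        return "high"
    if p_bug <= low:
        return "low"
    return "medium"


# Hypothetical toy training data: (summary, label).
TRAIN = [
    ("NullPointerException when saving file", "bug"),
    ("Crash on startup after update", "bug"),
    ("Wrong result returned by parser", "bug"),
    ("Add support for dark theme", "non-bug"),
    ("Improve documentation for installer", "non-bug"),
    ("Request: export to PDF feature", "non-bug"),
]

model = NaiveBayes().fit([s for s, _ in TRAIN], [l for _, l in TRAIN])

if __name__ == "__main__":
    for summary in ["Crash when saving file", "Add dark theme support"]:
        p_bug = model.predict_proba(summary)["bug"]
        print(summary, "->", probability_level(p_bug))
```

In a two-stage design like the one the abstract outlines, the level produced here would presumably become one input feature for a later data mining stage; how the paper actually combines the stages is not recoverable from the truncated abstract.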

Cited by 78 publications (72 citation statements)
References 40 publications
“…Specifically, we experimented (relying on the Weka tool) different machine learning techniques, namely, the standard probabilistic naive Bayes classifier, Logistic Regression, Support Vector Machines, J48, and the alternating decision tree (ADTree). The choice of these techniques is not random since they were successfully used for bug reports classification [1], [38] and for defect prediction in many previous works [3], [5], [8], [29], [39], thus allowing to increase the generalisability of our findings. To answer RQ1 we experimented the ML techniques described above performing a training on the NLP, TA, and SA features.…”
Section: F Learning Classifiers (mentioning)
confidence: 99%
“…Previous works addressed the problem of bugs misclassification in issue trackers [1], [26] building ML classifiers which relying on textual features in bug reports try to classify (or reclassify) the issues. Several works focused on the API documentation trying to: (i) categorize source code and textual descriptions in API discussion forums [38], (ii) detect knowledge items in API reference documentation [11], or (iii) infer formal method specifications from API documents [32]. Sharama et al [35] proposed a new approach based on a language model that can help developers in identifying software related tweets.…”
Section: Related Work (mentioning)
confidence: 99%
“…Text mining has many important applications, including analysis of stock reports, product manuals, business and normative documents [32]. It is also proved to be useful for bug report analysis, including bug report classification [4,39], detection of duplicates [30], and prediction of certain properties of software flaws [13].…”
Section: Text Mining and The Vector Space Model (mentioning)
confidence: 99%
“…Zanetti et al [22] proposed a method of classifying valid bug reports based on nine measures quantifying the social embeddedness of bug reporters in the collaboration network. Zhou et al [7] proposed a hybrid approach of combining text mining and data mining techniques of bug report data to identify corrective bug reports. This way could reduce the noise of misclassification (i.e., filtering bug reports that are not corrective) and support better performance of bug prediction.…”
Section: Automatic Bug Classification In Software Engineering (mentioning)
confidence: 99%
“…Several previous studies have been conducted to investigate the classification of issue reports for open-source projects using supervised machine learning algorithms [4][5][6][7]. Feng et al [8,9] proposed test report prioritization methods for use in crowdsourced testing.…”
Section: Introduction (mentioning)
confidence: 99%