Combining Text Mining and Data Mining for Bug Report Classification

Zhou, Yu; Tong, Yanxiang; Gu, Ruihang; Gall, Harald C.

doi:10.1109/icsme.2014.53

Cited by 78 publications

(72 citation statements)

References 40 publications


“…Specifically, we experimented (relying on the Weka tool) different machine learning techniques, namely, the standard probabilistic naive Bayes classifier, Logistic Regression, Support Vector Machines, J48, and the alternating decision tree (ADTree). The choice of these techniques is not random since they were successfully used for bug reports classification [1], [38] and for defect prediction in many previous works [3], [5], [8], [29], [39], thus allowing to increase the generalisability of our findings. To answer RQ1 we experimented the ML techniques described above performing a training on the NLP, TA, and SA features.…”

Section: F Learning Classifiers (mentioning)

confidence: 99%

“…Previous works addressed the problem of bugs misclassification in issue trackers [1], [26] building ML classifiers which relying on textual features in bug reports try to classify (or reclassify) the issues. Several works focused on the API documentation trying to: (i) categorize source code and textual descriptions in API discussion forums [38], (ii) detect knowledge items in API reference documentation [11], or (iii) infer formal method specifications from API documents [32]. Sharama et al [35] proposed a new approach based on a language model that can help developers in identifying software related tweets.…”

Section: Related Work (mentioning)

confidence: 99%


How Can I Improve My App? Classifying User Reviews for Software Maintenance and Evolution

Panichella, Di Sorbo, Guzmán, et al. 2015

2015 IEEE International Conference on Software Maintenance and Evolution (ICSME)

App Stores, such as Google Play or the Apple Store, allow users to provide feedback on apps by posting review comments and giving star ratings. These platforms constitute a useful electronic mean in which application developers and users can productively exchange information about apps. Previous research showed that users feedback contains usage scenarios, bug reports and feature requests, that can help app developers to accomplish software maintenance and evolution tasks. However, in the case of the most popular apps, the large amount of received feedback, its unstructured nature and varying quality can make the identification of useful user feedback a very challenging task. In this paper we present a taxonomy to classify app reviews into categories relevant to software maintenance and evolution, as well as an approach that merges three techniques: (1) Natural Language Processing, (2) Text Analysis and (3) Sentiment Analysis to automatically classify app reviews into the proposed categories. We show that the combined use of these techniques allows to achieve better results (a precision of 75% and a recall of 74%) than results obtained using each technique individually (precision of 70% and a recall of 67%).

Accepted version posted at the Zurich Open Repository and Archive, University of Zurich. ZORA URL: https://doi.org/10.5167/uzh-113425

Originally published at: Panichella, Sebastiano; Di Sorbo, Andrea; Guzman, Emitza; Visaggio, Corrado Aaron; Canfora, Gerardo; Gall, Harald (2015). How can I improve my app? Classifying user reviews for software maintenance and evolution. In: ICSME 2015. IEEE International Conference on Software Maintenance and Evolution, Bremen, 29 September 2015 - 1 October 2015.

Affiliations: S. Panichella and H. C. Gall (University of Zurich, Switzerland); A. Di Sorbo, C. A. Visaggio and G. Canfora (University of Sannio, Benevento, Italy); E. Guzman (Technische Universität München, Garching, Germany).

“…Text mining has many important applications, including analysis of stock reports, product manuals, business and normative documents [32]. It is also proved to be useful for bug report analysis, including bug report classification [4,39], detection of duplicates [30], and prediction of certain properties of software flaws [13].…”

Section: Text Mining and The Vector Space Model (mentioning)

confidence: 99%

Automated Dataset Construction from Web Resources with Tool Kayur

Kohan, Yamamoto, Artho. 2017

IJNC

Many text mining tools cannot be applied directly to documents available on web pages. There are tools for fetching and preprocessing of textual data, but combining them with the data processing tool into one working tool chain can be time consuming. The preprocessing task is even more labor-intensive if documents are located on multiple remote sources with different storage formats. In this paper, we propose the simplification of data preparation process for cases when data come from wide range of web resources. We developed an open-source tool, called Kayur, that greatly minimizes time and effort required for routine data preprocessing steps, allowing to quickly proceed to the main task of data analysis. The datasets generated by the tool are ready to be loaded into a data mining workbench, such as WEKA or Carrot2, to perform classification, feature prediction, and other data mining tasks.


“…Zanetti et al [22] proposed a method of classifying valid bug reports based on nine measures quantifying the social embeddedness of bug reporters in the collaboration network. Zhou et al [7] proposed a hybrid approach of combining text mining and data mining techniques of bug report data to identify corrective bug reports. This way could reduce the noise of misclassification (i.e., filtering bug reports that are not corrective) and support better performance of bug prediction.…”

Section: Automatic Bug Classification In Software Engineering (mentioning)

confidence: 99%
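The hybrid idea mentioned in the statement above (a text-mining stage whose output is combined with structured bug report data) can be illustrated with a minimal sketch. This is not the implementation from Zhou et al.: the two-stage naive Bayes design, the `severity` field, the thresholds in `level`, and the toy reports are all assumptions made for illustration, and every function name is hypothetical.

```python
import math
from collections import Counter

# Toy, hand-made reports (hypothetical data, for illustration only).
SUMMARIES = [
    "crash when saving file",
    "null pointer exception on startup",
    "add dark mode option",
    "update documentation for installer",
]
SEVERITIES = ["critical", "major", "enhancement", "minor"]
LABELS = ["bug", "bug", "nonbug", "nonbug"]


def tokenize(text):
    return text.lower().split()


def train_text_nb(docs, labels):
    """Stage 1 (text mining): per-class word counts for multinomial naive Bayes."""
    word_counts = {c: Counter() for c in set(labels)}
    for doc, c in zip(docs, labels):
        word_counts[c].update(tokenize(doc))
    vocab = {w for counter in word_counts.values() for w in counter}
    return vocab, word_counts, Counter(labels)


def bug_probability(text_model, doc):
    """Return P(bug | summary) using Laplace-smoothed word likelihoods."""
    vocab, word_counts, priors = text_model
    total = sum(priors.values())
    log_scores = {}
    for c, n in priors.items():
        denom = sum(word_counts[c].values()) + len(vocab)
        score = math.log(n / total)
        for w in tokenize(doc):
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[c][w] + 1) / denom)
        log_scores[c] = score
    top = max(log_scores.values())
    weights = {c: math.exp(s - top) for c, s in log_scores.items()}
    return weights["bug"] / sum(weights.values())


def level(p):
    """Discretise the text score so it can sit beside categorical fields."""
    if p >= 0.67:
        return "high"
    if p <= 0.33:
        return "low"
    return "mid"


def train_field_nb(rows, labels):
    """Stage 2 (data mining): categorical naive Bayes over structured fields."""
    priors = Counter(labels)
    counts = Counter()  # (field index, value, class) -> frequency
    values = [set() for _ in rows[0]]
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            counts[(i, v, c)] += 1
            values[i].add(v)
    return priors, counts, values


def predict_field_nb(field_model, row):
    priors, counts, values = field_model
    total = sum(priors.values())
    scores = {}
    for c, n in priors.items():
        score = math.log(n / total)
        for i, v in enumerate(row):
            score += math.log((counts[(i, v, c)] + 1) / (n + len(values[i])))
        scores[c] = score
    return max(scores, key=scores.get)


TEXT_MODEL = train_text_nb(SUMMARIES, LABELS)
ROWS = [(level(bug_probability(TEXT_MODEL, s)), sev)
        for s, sev in zip(SUMMARIES, SEVERITIES)]
FIELD_MODEL = train_field_nb(ROWS, LABELS)


def classify(summary, severity):
    """Feed the stage-1 text level into stage 2 together with the severity field."""
    row = (level(bug_probability(TEXT_MODEL, summary)), severity)
    return predict_field_nb(FIELD_MODEL, row)
```

Calling `classify("crash on startup", "critical")` runs the summary through the text stage first and then lets the second stage weigh that result against the severity field. With only four training reports the output says nothing about real-world accuracy; the sketch only shows how the two stages connect.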

“…Several previous studies have been conducted to investigate the classification of issue reports for open-source projects using supervised machine learning algorithms [4][5][6][7]. Feng et al [8,9] proposed test report prioritization methods for use in crowdsourced testing.…”

Section: Introduction (mentioning)

confidence: 99%

Using Knowledge Transfer and Rough Set to Predict the Severity of Android Test Reports via Text Mining

Guo, Chen. 2017

Symmetry

Abstract: Crowdsourcing is an appealing and economic solution to software application testing because of its ability to reach a large international audience. Meanwhile, crowdsourced testing could have brought a lot of bug reports. Thus, in crowdsourced software testing, the inspection of a large number of test reports is an enormous but essential software maintenance task. Therefore, automatic prediction of the severity of crowdsourced test reports is important because of their high numbers and large proportion of noise. Most existing approaches to this problem utilize supervised machine learning techniques, which often require users to manually label a large number of training data. However, Android test reports are not labeled with their severity level, and manual labeling is time-consuming and labor-intensive. To address the above problems, we propose a Knowledge Transfer Classification (KTC) approach based on text mining and machine learning methods to predict the severity of test reports. Our approach obtains training data from bug repositories and uses knowledge transfer to predict the severity of Android test reports. In addition, our approach uses an Importance Degree Reduction (IDR) strategy based on rough set to extract characteristic keywords to obtain more accurate reduction results. The results of several experiments indicate that our approach is beneficial for predicting the severity of Android test reports.
