“Severity” is one of the essential attributes of a software bug report and a crucial factor in helping developers decide which bugs should be fixed immediately and which can be deferred to a later release. Severity assignment is a manual process whose accuracy depends on the experience of the assignee. Prior research has proposed several models to automate this process, based on textual preprocessing of historical bug reports combined with classification techniques. Although bug repositories suffer from severity class imbalance, none of the prior studies investigated how applying a class-rebalancing technique affects the accuracy of their models. In this paper, we propose a framework for predicting fine-grained severity levels that uses the over-sampling technique SMOTE to balance the severity classes, and a feature selection scheme to reduce the data scale and select the most informative features for training a k-nearest neighbor (KNN) classifier. The KNN classifier uses a distance-weighted voting scheme to predict the appropriate severity level of a newly reported bug. We evaluated the effectiveness of the proposed approach on two large bug repositories, Eclipse and Mozilla, and the experimental results show that our approach outperforms state-of-the-art studies in predicting the minority severity classes.
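The abstract gives no implementation details, so the following is only a minimal pure-Python sketch of the two generic techniques it names: SMOTE-style over-sampling of each smaller severity class up to the size of the largest one, and k-nearest-neighbor prediction with distance-weighted voting. It assumes each bug report has already been turned into a numeric feature vector; the feature selection step is omitted. The function names (`smote`, `balance_with_smote`, `weighted_knn_predict`) and the inverse-distance weight `1 / (d + eps)` are illustrative assumptions, not the authors' exact method.

```python
import math
import random
from collections import Counter, defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic vectors from a list of minority-class vectors.

    Each synthetic vector lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbors.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of `base`, excluding `base` itself
        neighbors = sorted(
            (m for m in minority if m is not base),
            key=lambda m: euclidean(base, m),
        )[:k]
        other = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance_with_smote(X, y, k=3, seed=0):
    """Over-sample every class up to the size of the largest class.

    Assumption: every class has at least two samples.
    """
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        if count < target:
            members = [x for x, lab in zip(X, y) if lab == label]
            k_eff = min(k, len(members) - 1)  # cannot use more neighbors than exist
            new = smote(members, target - count, k=k_eff, seed=seed)
            X_out.extend(new)
            y_out.extend([label] * len(new))
    return X_out, y_out


def weighted_knn_predict(X_train, y_train, query, k=5, eps=1e-9):
    """Predict a label by distance-weighted voting among the k nearest neighbors.

    Each neighbor votes with weight 1 / (distance + eps), so closer reports count more.
    """
    nearest = sorted(zip(X_train, y_train), key=lambda p: euclidean(query, p[0]))[:k]
    votes = defaultdict(float)
    for x, label in nearest:
        votes[label] += 1.0 / (euclidean(query, x) + eps)
    return max(votes, key=votes.get)


if __name__ == "__main__":
    # Toy data: four "normal" reports and two minority "blocker" reports.
    X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [1.0, 1.0], [0.9, 1.1]]
    y = ["normal"] * 4 + ["blocker"] * 2
    Xb, yb = balance_with_smote(X, y, k=1)
    print(dict(Counter(yb)))  # {'normal': 4, 'blocker': 4}
    print(weighted_knn_predict(Xb, yb, [0.95, 1.0], k=3))  # blocker
```

Interpolating between real minority samples, rather than duplicating them, is what distinguishes SMOTE from plain random over-sampling; the distance weighting then lets a few very close neighbors outvote more distant ones from a larger class.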
When bug reports are submitted through bug tracking systems, they are analysed manually to determine their severity levels. A severity level specifies the negative impact of a bug on a system. Given the huge number of submitted reports, assigning severity classes manually is tedious and time-consuming. Moreover, some bug types are reported far more often than others, which leads to imbalanced bug repositories. This paper proposes a multi-feature approach for automatic severity assignment that leverages lexical, semantic, and categorical properties of bug reports. The proposed approach uses word embeddings, a topic model, a vector space model, and an adapted K-Nearest Neighbour technique. The impact of two sampling techniques, SMOTE and cluster-based under-sampling (CBU), is also investigated. Experiments on two open-source repositories, Eclipse and Mozilla, demonstrate that the proposed approach outperforms two previous studies.
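The abstract names cluster-based under-sampling (CBU) without describing it. One common variant, assumed here rather than taken from the paper, clusters the majority-class reports with k-means using as many clusters as the desired class size, then keeps the real report closest to each cluster centroid. The vectors stand in for whatever representation (embeddings, topic proportions, vector space features) is actually used, and the names `kmeans` and `cluster_undersample` are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the final list of k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest centroid
            idx = min(range(k), key=lambda c: euclidean(p, centroids[c]))
            clusters[idx].append(p)
        # recompute centroids; keep the old one if a cluster ends up empty
        centroids = [mean(cl) if cl else centroids[i] for i, cl in enumerate(clusters)]
    return centroids


def cluster_undersample(majority, target_size, seed=0):
    """Shrink `majority` to `target_size` real samples, one per k-means cluster.

    For each centroid, the closest not-yet-chosen sample is kept.
    """
    if target_size >= len(majority):
        return list(majority)
    centroids = kmeans(majority, target_size, seed=seed)
    remaining = list(majority)
    chosen = []
    for centroid in centroids:
        best = min(remaining, key=lambda p: euclidean(p, centroid))
        chosen.append(best)
        remaining.remove(best)  # never pick the same sample twice
    return chosen


if __name__ == "__main__":
    majority = [[0.0, 0.0], [0.0, 0.1], [0.1, 0.0], [5.0, 5.0], [5.0, 5.1], [5.1, 5.0]]
    reduced = cluster_undersample(majority, 2)
    print(len(reduced))  # 2
```

Keeping the real report nearest each centroid, instead of the centroid itself, means the reduced majority class still consists of actual bug reports; using the centroids directly is an equally plausible design that the abstract does not rule out.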