An Issue Tracking System (ITS) plays a crucial role in software development and provides valuable information for understanding issue management. In an ITS, developers often discuss issues reported during development. Recent studies have analyzed such issue discussions and identified the information types of the issue comments that appear in them. Automatically classifying these information types can help developers more easily understand and locate the information they need, but existing techniques do not classify them accurately. In this study, we propose a more accurate technique for classifying the information types of issue comments. The key to improving classification performance is random oversampling, which addresses the imbalance among training instances of different information types. With random oversampling, we trained a logistic regression classifier with hyperparameter tuning and achieved an average F1-score of 0.95, much higher than the 0.53 of the existing technique we compared against. We also considered two other key aspects of the technique to fully investigate the potential for performance improvement. First, we expanded an existing issue comment dataset by adding 4,098 instances, nearly doubling its size. Second, we analyzed the influence of hyperparameters on classification performance and found that using values within an appropriate range is important for achieving high performance.
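Random oversampling, the technique credited above with the performance gain, can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation. The function name `random_oversample`, the choice to raise every class to the majority-class count, and the toy comments and labels are all hypothetical. Each minority class is topped up by duplicating its own instances, drawn at random with replacement, until it matches the largest class.

```python
import random
from collections import Counter, defaultdict
from typing import Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_oversample(
    X: Sequence[T], y: Sequence[Hashable], seed: int = 0
) -> Tuple[List[T], List[Hashable]]:
    """Hypothetical helper: duplicate randomly chosen instances of each
    minority class until every class has as many instances as the
    largest (majority) class. The original instances are kept first."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    rng = random.Random(seed)  # seeded for reproducible resampling

    # Group the original instances by their class label.
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)

    # Assumption: target size is the majority-class count.
    target = max((len(items) for items in by_class.values()), default=0)

    X_out: List[T] = list(X)
    y_out: List[Hashable] = list(y)
    for label, items in by_class.items():
        deficit = target - len(items)
        # Sample with replacement from this class's own instances only.
        extra = rng.choices(items, k=deficit)
        X_out.extend(extra)
        y_out.extend([label] * deficit)
    return X_out, y_out


if __name__ == "__main__":
    # Toy, hypothetical issue comments with an imbalanced label distribution.
    comments = [
        "Crashes on startup after upgrade",
        "Stack trace attached",
        "Also reproduced on Linux",
        "Fixed by reverting the config change",
    ]
    labels = ["bug_report", "bug_report", "bug_report", "solution"]
    X_res, y_res = random_oversample(comments, labels, seed=42)
    print(dict(sorted(Counter(y_res).items())))  # -> {'bug_report': 3, 'solution': 3}
```

In a setup like this, oversampling would be applied only to the training split, so that duplicated instances do not leak into the data used to compute the F1-score. The imbalanced-learn library provides a comparable `RandomOverSampler` class (used via `fit_resample`) if a maintained implementation is preferred over a hand-written one.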
INDEX TERMS Issue discussion analysis, issue management, issue tracking systems, open source software, random oversampling

I. INTRODUCTION

In software development, an Issue Tracking System (ITS) plays an important role in issue management: any issue arising during development can be reported, tracked, and discussed in the system. Because of these crucial roles, many software engineering studies and techniques have targeted ITSs. Some of the stud-