Bug assignment is the task of ranking candidate developers according to their estimated competence to fix a given bug report. Numerous methods have been developed to address this task, relying on different methodological assumptions and demonstrating their effectiveness through empirical studies that vary widely in their data sets and evaluation criteria. Despite the importance of the subject and the attention it has received from researchers, there is still no consensus on how to validate and comparatively evaluate bug-assignment methods, and the methods reported in the literature are often not reproducible.
In this paper, we first report on our systematic review of the broad bug-assignment research field. Next, we focus on a few key empirical studies and review their choices with respect to three important experimental-design parameters, namely the evaluation metric(s) they report, how they define the real assignee, and which community of developers they consider as candidate assignees.
The substantial variability across these criteria led us to design a systematic experiment that explores the impact of these choices. We conducted the experiment on a comprehensive data set of bugs collected from 13 long-term open-source projects, using a simple TF-IDF similarity metric. On the basis of our arguments and experiments, we provide guidelines for conducting further bug-assignment research. We conclude that mean average precision (MAP) is the most informative evaluation metric, that the developer community should be defined as “all the project members,” and that the real assignee should be defined as “any developer who worked toward fixing a bug.”
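For reference, MAP here follows the standard information-retrieval definition, where each query corresponds to a bug report and the relevant items are its correct assignees; this is the textbook formulation, not necessarily the exact notation used in the body of the paper:
\[
\mathrm{AP}(q) = \frac{1}{|R_q|}\sum_{k=1}^{n} P_q(k)\,\mathit{rel}_q(k),
\qquad
\mathrm{MAP} = \frac{1}{|Q|}\sum_{q \in Q} \mathrm{AP}(q),
\]
where $Q$ is the set of bug reports, $R_q$ is the set of correct assignees for bug $q$, $P_q(k)$ is the precision over the top-$k$ ranked developers, and $\mathit{rel}_q(k)$ is 1 if the developer at rank $k$ is a correct assignee and 0 otherwise.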