Labeling issues with the skills required to complete them can help contributors choose tasks in Open Source Software (OSS) projects. However, manually labeling issues is time-consuming and error-prone, and current automated approaches are mostly limited to classifying issues as bugs/non-bugs. We investigate the feasibility and relevance of automatically labeling issues with what we call "API-domains," which are high-level categories of APIs. We posit that the APIs used in the source code affected by an issue can serve as a proxy for the types of skills (e.g., DB, security, UI) needed to work on it. We ran a user study (n=74) to assess the relevance of API-domain labels to potential contributors, leveraged the issues' descriptions and the project history to build prediction models, and validated the predictions with contributors (n=20) of the projects. Our results show that (i) newcomers to a project consider API-domain labels useful when choosing tasks, (ii) labels can be predicted with an average precision of 84% and recall of 78.6%, and (iii) the predictions reached up to 71.3% precision and 52.5% recall when training on one project and testing on another (transfer learning).
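The core idea above, that the APIs touched by an issue's fix are a proxy for the skills it requires, can be sketched minimally. The module-to-domain mapping below is purely illustrative (it is not the papers' taxonomy), and the function names are hypothetical:

```python
# Illustrative sketch: derive API-domain labels for an issue from the
# modules imported by the source files its fix touches. The mapping is
# an assumption for demonstration, not the taxonomy used in the study.
API_DOMAINS = {
    "sqlite3": "DB",
    "psycopg2": "DB",
    "tkinter": "UI",
    "ssl": "Security",
    "hashlib": "Security",
    "unittest": "Test",
}

def api_domain_labels(imported_modules):
    """Return the sorted set of API-domain labels for the modules
    imported by the code changes linked to an issue."""
    return sorted({API_DOMAINS[m] for m in imported_modules if m in API_DOMAINS})

# A fix whose files import sqlite3 and hashlib gets DB and Security labels;
# modules with no mapped domain (here, os) are simply ignored.
print(api_domain_labels(["sqlite3", "hashlib", "os"]))
```

In the studies summarized here, the ground truth is built by mining such API usage from project history; a prediction model then infers these labels from the issue text alone, so contributors see the skill labels before any code is written.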
[Background] Selecting an appropriate task is challenging for contributors to Open Source Software (OSS), especially for those contributing for the first time. Researchers and OSS projects have therefore proposed various strategies to aid newcomers, including labeling tasks. [Aims] In this research, we investigate automatically labeling open issues to help contributors pick a task. We label issues with API-domains: categories of the APIs parsed from the source code used to solve the issues. We plan to add social network analysis metrics gathered from the issues' conversations as new predictors. By making the required skills explicit, we expect contributor candidates to pick tasks better suited to their skills. [Method] We employ mixed methods. We qualitatively analyzed interview transcripts and the survey's open-ended questions to understand the strategies communities use to assist in onboarding contributors and the strategies contributors use to pick an issue. We conducted quantitative studies to analyze the relevance of the API-domain labels in a user experiment and to compare the strategies' relative importance for diverse contributor roles. We also mined project and issue data from OSS repositories to build the ground truth and predictors able to infer the API-domain labels with precision, recall, and F-measure comparable to the state of the art. We further plan to use a skill ontology to assist the matching between contributors and tasks: by quantitatively analyzing the confidence level of matching instances in ontologies describing contributors' skills and tasks, we may recommend issues for contribution. In addition, we will measure the effectiveness of the API-domain labels by evaluating issue-solving time and the solving rate of labeled versus unlabeled issues. [Results] So far, the results show that organizing issues, which includes assigning labels, is seen as an essential strategy by diverse roles in OSS communities.
The API-domain labels are relevant, mainly for experienced practitioners. The predicted labels have an average precision of 75.5%. [Conclusions] Labeling issues with API-domain labels indicates the skills an issue involves. The labels represent the possible libraries (aggregated into domains) used in the source code related to an issue. By investigating this research topic, we expect to help new contributors find a task, and thereby help OSS communities attract and retain more contributors.
Developers often struggle to navigate an Open Source Software (OSS) project's issue-tracking system and find a suitable task. Proper issue labeling can aid task selection, but current tools are limited to classifying issues by type (e.g., bug, question, good first issue, feature). In contrast, this paper presents a tool, GiveMeLabeledIssues, that mines project repositories and labels issues based on the skills required to solve them. We leverage the domains of the APIs involved in the solution (e.g., User Interface (UI), Test, Databases (DB)) as a proxy for the required skills. GiveMeLabeledIssues facilitates matching developers' skills to tasks, reducing the burden on project maintainers. The tool obtained a precision of 83.9% when predicting the API domains involved in the issues. The replication package contains instructions for executing the tool and adding new projects. A demo video is available at https://www.youtube.com/watch?v=ic2quUue7i8
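The precision and recall figures reported across these abstracts are computed over multi-label predictions, since an issue can carry several API-domain labels at once. A minimal sketch of micro-averaged precision and recall for such predictions (the function and example data are illustrative, not taken from the tools' code):

```python
# Hedged sketch: micro-averaged precision/recall for multi-label
# API-domain predictions. Each issue has a set of true labels and a
# set of predicted labels; counts are pooled across all issues.
def micro_precision_recall(true_labels, predicted_labels):
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        truth, pred = set(truth), set(pred)
        tp += len(truth & pred)   # labels correctly predicted
        fp += len(pred - truth)   # spurious labels
        fn += len(truth - pred)   # missed labels
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Two hypothetical issues: the first predicted perfectly, the second
# with one correct label, one spurious label, and one missed label.
p, r = micro_precision_recall([["DB"], ["UI", "Test"]],
                              [["DB"], ["UI", "Security"]])
print(round(p, 2), round(r, 2))
```

Pooling the counts before dividing (micro-averaging) weights each label decision equally, so frequently occurring domains dominate the score; a macro-average over per-domain scores would instead weight each domain equally.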