2016
DOI: 10.7287/peerj.preprints.2617
Preprint

Curating GitHub for engineered software projects

Abstract: Software forges like GitHub host millions of repositories. Software engineering researchers have been able to take advantage of such large corpora of potential study subjects with the help of tools like GHTorrent and Boa. However, the simplicity in querying comes with a caveat: there are limited means of separating the signal (e.g. repositories containing engineered software projects) from the noise (e.g. repositories containing homework assignments). The proportion of noise in a random sample of reposito…

Cited by 27 publications
(36 citation statements)
References 3 publications
“…Secondly, the criteria used to filter out inactive, or noncollaborative projects may introduce bias; Work by Munaiah et al [55] has shown that the recall of such filtering techniques can be 30% or lower. Therefore, there could be a class of smaller yet active and collaborative open-source software projects which we are not represented in our study.…”
Section: B External Validity
Type: mentioning
confidence: 99%
“…Finally, we have manually inspected the selected projects and made sure that all of them are real projects (rather than student projects, assignments, etc.), as suggested by recent work (Munaiah et al 2017). The specific query employed for the selection of the subject projects was done on April 2017 and can be found in our on-line appendix ).…”
Section: Methods
Type: mentioning
confidence: 99%
“…WoC can be extended with other approaches to segment projects 17 . For example, identification of projects with sound software engineering practices [58] relies on a combination of factors easily obtainable in WoC, such as history, license, and unit tests.…”
Section: Repository Filtering Tool
Type: mentioning
confidence: 99%