2016
DOI: 10.7287/peerj.preprints.2617v1
Preprint

Curating GitHub for engineered software projects

Abstract: Software forges like GitHub host millions of repositories. Software engineering researchers have been able to take advantage of such a large corpus of potential study subjects with the help of tools like GHTorrent and Boa. However, the simplicity in querying comes with a caveat: there are limited means of separating the signal (e.g. repositories containing engineered software projects) from the noise (e.g. repositories containing homework assignments). The proportion of noise in a random sample of repositori…

Citations: cited by 25 publications (30 citation statements)
References: 18 publications
“…The research focus of Munaiah et al (2017) was different from the one of this paper, since they developed a set of metrics and curated software repositories from GitHub according to these metrics, which is a method of algorithmic curating [13]. In contrast, in this study, we investigate the curation behavior happening on GitHub that appropriates GitHub repositories for curation purposes, which is a different category of curation in the software practice.…”
Section: Curation In GitHub (citation type: mentioning)
confidence: 91%
“…A recent study by Munaiah et al (2017) developed a tool, called reaper, that evaluates a GitHub repository from eight dimensions to determine whether a repository is an engineered software project and to identify software projects that conform to the dimensions within a sample of 1,994,977 GitHub repositories [13]. The research focus of Munaiah et al (2017) was different from the one of this paper, since they developed a set of metrics and curated software repositories from GitHub according to these metrics, which is a method of algorithmic curating [13].…”
Section: Curation In GitHub (citation type: mentioning)
confidence: 99%
“…To collect PR data from GitHub, we first used the RepoReapers framework [21] to select engineered software projects. We obtained all 95,804 Java repositories that had been classified as containing engineered software projects by RepoReapers's Random Forest classification and retrieved the number of merged PRs for each repository.…”
Section: A Data Collection (citation type: mentioning)
confidence: 99%
“…In order to filter the large dataset provided by GHTorrent (about 37 million projects), we followed criteria laid out by Vasilescu et al [7], Tsay et al [16], and Munaiah et al [17]. The combination of the criteria from the previously mentioned literature resulted in the following filters:…”
Section: A Project Selection Criteria (citation type: mentioning)
confidence: 99%
“…We focused on projects where a contributor, particularly one who has no write privileges to the source repository, has access to the build results.
• Exclude projects that have less than three unique contributors: This is an indicator of the project having a tightly-knit community of developers that are actively collaborating but are less inclined to accept external contribution, as discussed by Munaiah et al [17].
• Exclude projects that do not have at least one recently merged pull request: According to Kalliamvakou et al [18], having a pull request does not indicate that it was merged.…”
Section: A Project Selection Criteria (citation type: mentioning)
confidence: 99%
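
The two exclusion filters quoted in the last citation statement can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the citing authors' implementation: the `Project` record, the `select_projects` function, and the 90-day `recency_days` window are all hypothetical, and the quoted text does not say what counts as "recently" merged.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Project:
    """Hypothetical per-repository record; field names are assumptions."""
    name: str
    unique_contributors: int
    last_merged_pr: Optional[date]  # None if no pull request was ever merged


def select_projects(projects: List[Project], today: date,
                    recency_days: int = 90) -> List[Project]:
    """Apply the two exclusion filters from the quoted criteria.

    recency_days is an assumed threshold; the source does not define
    how recent a merged pull request must be.
    """
    cutoff = today - timedelta(days=recency_days)
    selected = []
    for project in projects:
        # Exclude projects with fewer than three unique contributors.
        if project.unique_contributors < 3:
            continue
        # Exclude projects without at least one recently merged pull request.
        if project.last_merged_pr is None or project.last_merged_pr < cutoff:
            continue
        selected.append(project)
    return selected
```

In practice the contributor counts and merge dates would come from a dataset such as GHTorrent; here they are plain fields so the filter logic stays self-contained.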