2018
DOI: 10.18293/seke2018-085

Automatic Detection of Public Development Projects in Large Open Source Ecosystems: An Exploratory Study on GitHub

Abstract:  -Hosting over 10 million of software projects, GitHub is one of the most important data sources to study behavior of developers and software projects. However, with the increase of the size of open source datasets, the potential threats to mining these datasets have also grown. As the dataset grows, it becomes gradually unrealistic for human to confirm quality of all samples. Some studies have investigated this problem and provided solutions to avoid threats in sample selection, but some of these solutions (… Show more

Citations: cited by 6 publications (14 citation statements)
References: 23 publications
“…It uses greedy strategy to generate decision trees. We selected this method because it has been tested to be effective in selecting PDPs [4]. Logistic Regression (LR) .…”
Section: Methods; citation type: mentioning
confidence: 99%
“…Peril 4 still needs to be solved by researchers manually. In addition, peril 5 cannot be effectively solved by the corresponding strategy because in our previous work [4] we tested this strategy and found that this strategy cannot select PDPs with a high recall, which means that if researchers use the committer number to select project samples, they will miss many PDPs. Hence, if researchers do not want to select projects that are personal or projects that are not built for development, they have to spend considerable human effort to select samples manually.…”
Section: Related Work; citation type: mentioning
confidence: 99%
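
The recall concern raised in the statement above can be illustrated with a minimal sketch. It assumes a hypothetical rule that selects a project when its committer count reaches a threshold, and it uses a made-up toy sample; the function name, field names, threshold, and numbers are illustrative assumptions and are not taken from the cited papers.

```python
def recall_of_committer_filter(projects, min_committers):
    """Recall of the rule 'select a project if committers >= min_committers'.

    Recall = selected true PDPs / all true PDPs in the labeled sample.
    """
    true_pdps = [p for p in projects if p["is_pdp"]]
    if not true_pdps:
        raise ValueError("no PDPs in the sample; recall is undefined")
    selected = [p for p in true_pdps if p["committers"] >= min_committers]
    return len(selected) / len(true_pdps)


# Hypothetical, hand-made sample (NOT data from the paper).
# "is_pdp" stands for a manual label: public development project or not.
toy_projects = [
    {"name": "a", "committers": 1, "is_pdp": True},
    {"name": "b", "committers": 2, "is_pdp": True},
    {"name": "c", "committers": 5, "is_pdp": True},
    {"name": "d", "committers": 12, "is_pdp": True},
    {"name": "e", "committers": 1, "is_pdp": False},
    {"name": "f", "committers": 3, "is_pdp": False},
]

# The threshold of 3 is an arbitrary illustrative choice.
recall = recall_of_committer_filter(toy_projects, min_committers=3)
print(f"recall at threshold 3: {recall:.2f}")
```

In this toy sample, two of the four labeled PDPs have fewer than three committers, so the filter drops them. This is the kind of miss the statement describes, although the actual recall reported in the cited work is not reproduced here.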