2021
DOI: 10.1007/s10664-020-09905-9

World of code: enabling a research workflow for mining and analyzing the universe of open source VCS data

Citations: cited by 30 publications (12 citation statements)
References: 57 publications
“…World of Code [14] is a large dataset and analysis infrastructure, available to researchers to mine public code. It is larger than our initial data source and can be used in conjunction with this dataset to find additional origins/occurrences of licenses blobs of interest.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Other tools offering an infrastructure framework include Crossflow 4 [7], SmartSHARK 5 [15], and World of Code [9,10]. CrossFlow is a domain specific language and framework that offers scalability for MSR tasks; SmartSHARK aggregates data from different sources in a harmonized schema.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…One goal is to quantify and monitor the quality of the developed software employing software metrics [8], another is to explore new ways of making sense of the data. However, traversing and gathering the vast amount of data poses a challenge [4,[9][10][11].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…The challenge with using this dataset for training purposes is that it contains only 1471 vulnerable functions and 1320 vulnerable files, which is rather small for training a deep learning model. Software Repositories and Repository Mining Frameworks Ma et al [30] present World of Code (WoC), a large and frequently updated collection of git-based version control data. The data is indexed three storage abstractions and arranged in four layers.…”
Section: Related Work
Citation type: mentioning
confidence: 99%