2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR) 2017
DOI: 10.1109/msr.2017.48

An Extensive Dataset of UML Models in GitHub

citations
Cited by 45 publications
(40 citation statements)
references
References 6 publications
“…A recent and extensive study of models stored in various formats in GitHub, combining automatic processing and a lot of manual work, identified 93,596 UML models from 24,717 different repositories, of which 57,822 (61.8%) are images, the rest being files with extensions .xmi or .uml [17]. This confirms the existence of potentially useful and interesting information in repositories that is still difficult to access and reuse; and it also confirms that a large proportion of models is stored as images.…”
Section: Related Work
mentioning
confidence: 99%
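The percentage quoted in this statement can be checked with a few lines of arithmetic. This is a minimal sketch in Python: the variable names are illustrative, and the count of .xmi/.uml files is derived here by subtraction, assuming those files make up the whole remainder as the statement implies. It is not a figure quoted from the source.

```python
# Figures quoted in the citation statement.
total_models = 93_596   # UML models identified
image_models = 57_822   # models stored as images

# Derived, not quoted: the statement says the rest are .xmi or .uml files.
xmi_uml_models = total_models - image_models

# Shares of the total, rounded to one decimal place as in the statement.
image_share = round(100 * image_models / total_models, 1)
xmi_uml_share = round(100 * xmi_uml_models / total_models, 1)

print(f"images: {image_models} ({image_share}%)")
print(f".xmi/.uml: {xmi_uml_models} ({xmi_uml_share}%)")
```

The computed image share agrees with the 61.8% given in the statement.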
“…First of all, UML design artifacts are prevalent in open source software repositories but have received relatively little attention from our community. Secondly, researchers have currently made available a large collection of labeled UML diagrams [8], thus facilitating other research groups to reproduce and extend the work presented here. Finally, we believe that classifying sequence and class diagrams is a natural binary classification task for low-shot learning and each type diagram has tell-tale features that should be learnable with a relatively few number of instances and also generalizable to unseen data.…”
Section: An Application Of Low-shot Learning
mentioning
confidence: 99%
“…This paper provides a proof-of-concept for the application of low-shot learning to mining software artifacts. In particular, we focus on the task of classifying unified modeling language (UML) diagrams from a recently-published, publicly-available dataset [8].…”
mentioning
confidence: 99%
“…In search of evidence [8] to substantiate this belief, we start from a publicly available data set of open-source software projects on GITHUB that use UML models [9], and: 1) assemble a control group of GITHUB projects not known to use UML models; 2) mine data from the GITHUB issue trackers of both sets of projects (using and not using UML models), estimating their defect rates ("bug" issue reports) as a proxy for software quality; and 3) use multivariate statistical modeling to estimate the impact of having UML models on defect proneness, while controling for confounding factors. Our results reveal a small statistically significant effect of using UML models on defect proneness, i.e., projects with UML models tend to have fewer defects.…”
Section: Does Uml Modeling Associate With Lower Defect Proneness?: a
mentioning
confidence: 99%