2013
DOI: 10.1007/978-3-642-35843-2_6

A Model of the Commit Size Distribution of Open Source

Abstract: A fundamental unit of work in programming is the code contribution ("commit") that a developer makes to the code base of the project in work. We use statistical methods to derive a model of the probabilistic distribution of commit sizes in open source projects and we show that the model is applicable to different project sizes. We use both graphical as well as statistical methods to validate the goodness of fit of our model. By measuring and modeling a fundamental dimension of programming we help improve softw…

Cited by 19 publications (15 citation statements)
References 14 publications

“…Inspection by hand shows that many times, code is being committed in large chunks. This is uncommon in traditional open source software development, where the most frequent commit size is one line of code [16]. Thus, it is safe to assume that these small but still active projects are being developed in-house and are being provided in a snapshot-style to the public at appropriate times.…”
Section: Project Classification (citation type: mentioning)
confidence: 99%
“…The success of these approaches depends on the accuracy of feature location techniques which are often still low [32]. Also, the quality of the code comments and commits information can be poor due to outdated comments [19], unavailability of authorship information for authors without commit rights in CVS and SVN repositories [23], etc. In this work, we focus on analyzing textual information available in bug reports to recommend appropriate fixers.…”
Section: Automated Bug Triaging (citation type: mentioning)
confidence: 99%
“…Third, file-or class-level granularity could be too large to learn patterns of transformation. Finally, considering arbitrarily long snippets of code, such as hunks in diffs, could make the learning more difficult given the variability in size and context [45], [46]. Note that we consider each TP as an independent fix, meaning that multiple methods modified in the same bug fixing activity are considered independently from one other.…”
Section: B. Analysis of Transformation Pairs (citation type: mentioning)
confidence: 99%