Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering 2014
DOI: 10.1145/2635868.2635922

A Large Scale Study of Programming Languages and Code Quality in GitHub

Abstract: What is the effect of programming languages on software quality? This question has been a topic of much debate for a very long time. In this study, we gather a very large data set from GitHub (729 projects, 80 Million SLOC, 29,000 authors, 1.5 million commits, in 17 languages) in an attempt to shed some empirical light on this question. This reasonably large sample size allows us to use a mixed-methods approach, combining multiple regression modeling with visualization and text analytics, to study the effect o…

Cited by 312 publications (216 citation statements)
References 36 publications
“…As in other studies (e.g. [44]) we identified bug related commits by filtering those that contain error related keywords, such as 'error', 'bug', 'fix' and 'issue' in the corresponding commit message.…”
Section: Law VII: Declining Quality (mentioning)
confidence: 99%
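The keyword filter described in this citation statement can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: the four keywords come from the quoted text, while the function names, the case-insensitive match, and the word-start matching rule are assumptions made here for the example.

```python
import re

# Keywords taken from the quoted citation statement.
BUG_KEYWORDS = ("error", "bug", "fix", "issue")

# Assumption: a keyword counts only when it starts a word, so "fixed" and
# "bugs" match while "prefix" and "debug" do not. The quoted statement does
# not specify the exact matching rule.
_PATTERN = re.compile(r"\b(?:" + "|".join(BUG_KEYWORDS) + r")", re.IGNORECASE)


def is_bug_related(message: str) -> bool:
    """Return True if the commit message contains an error-related keyword."""
    return _PATTERN.search(message) is not None


def filter_bug_commits(messages):
    """Keep only the commit messages flagged as bug related."""
    return [m for m in messages if is_bug_related(m)]


if __name__ == "__main__":
    # Hypothetical commit messages, for illustration only.
    sample = [
        "Fixed null pointer error in parser",
        "Add prefix option to CLI",
        "Resolve issue with cache eviction",
        "Update README",
    ]
    for msg in filter_bug_commits(sample):
        print(msg)
```

A word-start match is a deliberate trade-off: it reduces false positives from substrings, but it would still flag unrelated words such as "fixture", so a real study would likely need manual validation of a sample.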
“…We have recently categorized GitHub projects into six general and disjoint domains, including Databases, Libraries, etc. [40] As code in these different categories may be substantially different [40], it is reasonable to expect that the code development process, including debugging, may be different across these domains. While there we did not find a relationship between code quality and application domain in our prior work, [40], assert use might be related to the domain.…”
Section: Research Goals (mentioning)
confidence: 99%
“…In our recent work on code defects in the GitHub corpus [40], we categorized projects based on their application domain into six general groups: Applications, Code Analyzer, Database, Framework, Library, and Middleware. To investigate whether the use of asserts depended in anyway on the application domain, we used negative binomial regression due to the smaller sample size (see Methodology), with the domain as a factor, while controlling for the total number of lines, developers, and age of the project.…”
Section: Experience of Developers (mentioning)
confidence: 99%
“…Third, we aim at covering the range of metrics that are used to measure or approximate size in other research works, e.g. by Malaiya [4], Nugroho et al [7], or Ray et al [10]. Therefore, we consider metrics that base on different concepts, e.g.…”
Section: A Considered Size Metrics (mentioning)
confidence: 99%
“…KLOC as by Nugroho et al [7], but also sometimes approximated, e.g. as number of commits by Ray et al [10].…”
Section: Introduction (mentioning)
confidence: 99%