A large scale study of programming languages and code quality in GitHub

Ray, Baishakhi; Posnett, Daryl; Filkov, Vladimir; Devanbu, Premkumar

doi:10.1145/2635868.2635922

Cited by 312 publications (216 citation statements; 207 mentioning)

References 36 publications

“…As in other studies (e.g. [44]) we identified bug related commits by filtering those that contain error related keywords, such as 'error', 'bug', 'fix' and 'issue' in the corresponding commit message.…”

Section: Law VII: Declining Quality (mentioning)

confidence: 99%

Studying the evolution of PHP web applications

Amanatidis

Chatzigeorgiou

2016

Information and Software Technology
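The keyword filter described in the quote above can be sketched as follows. This is a minimal illustration under stated assumptions: only the four keywords named in the quote are used, matching is case-insensitive, and whole-word matching with a few inflections ("fixes", "bugs", "errors") is an assumed refinement so that words such as "prefix" or "debugger" are not counted. The function names are hypothetical.

```python
import re

# Keywords named in the quoted description; other studies may use longer lists.
BUG_KEYWORDS = ("error", "bug", "fix", "issue")

# Assumed refinement: match whole words plus simple inflections
# ("fixes", "fixed", "bugs", "errors", "issues"), case-insensitively,
# so substrings inside "prefix" or "debugger" do not count.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(BUG_KEYWORDS) + r")(?:e?s|ed)?\b",
    re.IGNORECASE,
)


def is_bug_fix_commit(message: str) -> bool:
    """Return True if the commit message mentions a bug-related keyword."""
    return _PATTERN.search(message) is not None


def filter_bug_commits(messages):
    """Keep only the messages that look like bug-fixing commits."""
    return [m for m in messages if is_bug_fix_commit(m)]


if __name__ == "__main__":
    commits = [
        "Fix null pointer dereference in parser",
        "Add README section on installation",
        "Handle network errors gracefully",
        "Update prefix handling for config keys",
    ]
    for msg in filter_bug_commits(commits):
        print(msg)
```

The whole-word pattern trades recall for precision: compound tokens such as "bugfix" are missed, while a plain substring test would catch them at the cost of false positives like "prefix".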


“…We have recently categorized GitHub projects into six general and disjoint domains, including Databases, Libraries, etc. [40] As code in these different categories may be substantially different [40], it is reasonable to expect that the code development process, including debugging, may be different across these domains. While we did not find a relationship between code quality and application domain in our prior work, [40], assert use might be related to the domain.…”

Section: Research Goals (mentioning)

confidence: 99%

“…In our recent work on code defects in the GitHub corpus [40], we categorized projects based on their application domain into six general groups: Applications, Code Analyzer, Database, Framework, Library, and Middleware. To investigate whether the use of asserts depended in any way on the application domain, we used negative binomial regression due to the smaller sample size (see Methodology), with the domain as a factor, while controlling for the total number of lines, developers, and age of the project.…”

Section: Experience of Developers (mentioning)

confidence: 99%
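The regression described in the quote above can be written out as a model. This is a sketch under assumptions not stated in the excerpt: the response $Y_i$ is taken to be the number of asserts in project $i$, the three controls enter on a log scale, $d_0$ is an arbitrary reference domain, and the common NB2 variance form is used.

```latex
\log \mathbb{E}[Y_i] = \beta_0
  + \sum_{d \neq d_0} \gamma_d \, \mathbf{1}[\mathrm{domain}_i = d]
  + \beta_1 \log \mathrm{lines}_i
  + \beta_2 \log \mathrm{devs}_i
  + \beta_3 \log \mathrm{age}_i

Y_i \sim \mathrm{NB}(\mu_i, \alpha), \qquad
\mu_i = \mathbb{E}[Y_i], \qquad
\mathrm{Var}(Y_i) = \mu_i + \alpha \mu_i^2
```

With six domains there are five $\gamma_d$ terms. $e^{\gamma_d}$ is then the multiplicative change in the expected assert count for domain $d$ relative to $d_0$, holding lines, developers, and age fixed. The $\alpha \mu_i^2$ term lets the variance exceed the mean, which a Poisson model would not allow.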

Assert Use in GitHub Projects

Casalnuovo

Devanbu

Oliveira

et al. 2015

2015 IEEE/ACM 37th IEEE International Conference on Software Engineering

Self Cite


Asserts have long been a strongly recommended (if non-functional) adjunct to programs. They certainly don't add any user-evident feature value, and it can take quite some skill and effort to devise and add useful asserts. However, they are believed to add considerable value to the developer. Certainly, they can help with automated verification; but even in the absence of that, claimed advantages include improved understandability, maintainability, easier fault localization and diagnosis, all eventually leading to better software quality. We focus on this latter claim, and use a large dataset of asserts in C and C++ programs to explore the connection between asserts and defect occurrence. Our data suggests a connection: functions with asserts do have significantly fewer defects. This indicates that asserts do play an important role in software quality; we therefore explored further the factors that play a role in assertion placement: specifically, process factors (such as developer experience and ownership) and product factors, particularly interprocedural factors, exploring how the placement of assertions in functions is influenced by local and global network properties of the callgraph. Finally, we also conduct a differential analysis of assertion use across different application domains.
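The abstract refers to local and global network properties of the call graph without naming them. As a hedged illustration only, the sketch below computes two common choices, fan-in/fan-out (local) and PageRank (global), from a list of (caller, callee) pairs. The choice of these metrics, the damping factor, and the function names are assumptions, not the paper's method.

```python
from collections import defaultdict


def degree_metrics(call_edges):
    """Return {function: (fan_in, fan_out)} from (caller, callee) pairs."""
    fan_in = defaultdict(int)
    fan_out = defaultdict(int)
    nodes = set()
    for caller, callee in set(call_edges):  # count each distinct call edge once
        fan_out[caller] += 1
        fan_in[callee] += 1
        nodes.update((caller, callee))
    return {n: (fan_in[n], fan_out[n]) for n in sorted(nodes)}


def pagerank(call_edges, damping=0.85, iterations=100):
    """Power-iteration PageRank over a non-empty call graph."""
    edges = set(call_edges)
    nodes = sorted({name for edge in edges for name in edge})
    n = len(nodes)
    out_links = {v: [] for v in nodes}
    for caller, callee in edges:
        out_links[caller].append(callee)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Functions that call nothing spread their rank uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        base = (1 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        for v in nodes:
            for w in out_links[v]:
                new_rank[w] += damping * rank[v] / len(out_links[v])
        rank = new_rank
    return rank


if __name__ == "__main__":
    edges = [("main", "parse"), ("main", "eval"), ("parse", "lex"), ("eval", "lex")]
    print(degree_metrics(edges))
    print(pagerank(edges))
```

In this toy graph `lex` is called by two functions and calls none, so it gets the highest fan-in and the highest PageRank; per-function values like these could then be used as predictors of whether a function contains asserts.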


“…Third, we aim at covering the range of metrics that are used to measure or approximate size in other research works, e.g. by Malaiya [4], Nugroho et al [7], or Ray et al [10]. Therefore, we consider metrics that are based on different concepts, e.g.…”

Section: A. Considered Size Metrics (mentioning)

confidence: 99%

“…KLOC as by Nugroho et al [7], but also sometimes approximated, e.g. as number of commits by Ray et al [10].…”

Section: Introduction (mentioning)

confidence: 99%
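The quotes above contrast measured size (KLOC) with approximations such as commit counts. Even line-based measures disagree on the same file, which the sketch below shows for C/C++ text by counting physical lines, non-blank lines, and lines containing code outside comments. It is a simplified, hypothetical counter: it handles `//` and `/* */` comments but ignores comment markers inside string literals, and it computes KLOC from the code-line count by assumption.

```python
def size_metrics(source: str) -> dict:
    """Count physical, non-blank, and code (non-comment) lines of C/C++ text."""
    physical = non_blank = code = 0
    in_block = False  # inside a /* ... */ comment carried across lines
    for line in source.splitlines():
        physical += 1
        stripped = line.strip()
        if not stripped:
            continue
        non_blank += 1
        has_code = False
        i = 0
        while i < len(stripped):
            if in_block:
                end = stripped.find("*/", i)
                if end == -1:
                    i = len(stripped)  # comment continues on the next line
                else:
                    in_block = False
                    i = end + 2
            elif stripped.startswith("//", i):
                break  # rest of the line is a comment
            elif stripped.startswith("/*", i):
                in_block = True
                i += 2
            else:
                if not stripped[i].isspace():
                    has_code = True
                i += 1
        if has_code:
            code += 1
    return {"physical": physical, "non_blank": non_blank,
            "code": code, "kloc": code / 1000}


SAMPLE = """#include <stdio.h>

/* entry point
   of the program */
int main(void) {
    // say hello
    printf("hello\\n");  /* trailing comment */
    return 0;
}
"""

if __name__ == "__main__":
    print(size_metrics(SAMPLE))
```

For this sample the three counts are 9, 8, and 5, so the choice of line metric alone changes the measured size by almost a factor of two.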

Identifying Metrics' Biases When Measuring or Approximating Size in Heterogeneous Languages

Hebig

Derehag

Chaudron

2015

2015 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)


Context: To compare the effectiveness of development techniques, the size of compared software systems needs to be taken into account. However, in industry new development techniques often come with changes in the applied programming languages. Goal: Our goal is to investigate how different size metrics and approximations are biased towards the languages C and C++. Further, we investigate whether triangulation of metrics has the potential to compensate for biases. Method: We identify crucial preconditions for a triangulation and investigate, on 34 open-source projects, whether a set of 16 size metrics fulfills these preconditions for the languages C and C++. Results: We identify how metrics differ in their biases and find that the preconditions for triangulation are fulfilled. Conclusion: Triangulation has the potential to address language biases, but high variance among metrics and tools needs to be taken into account, too.

