Sourcerer: mining and searching internet-scale software repositories

Linstead, Erik; Bajracharya, Sushil; Ngo, Trung Chi; Rigor, Paul; Lopes, Cristina Videira; Baldi, Pierre

doi:10.1007/s10618-008-0118-x

Cited by 216 publications

(118 citation statements)

References 48 publications

Mentioning: 114

“…However, this work does not quantify how identifier properties vary, since it ignores variable and type names. Search of code at "internet-scale" was introduced by Linstead et al [8]. Another GitHub dataset, GHTorrent [9] has a different goal compared to our corpus, excluding source code and focusing on users, pull requests and all the issues surrounding social coding.…”

Section: Related Work (mentioning)

confidence: 99%

Mining source code repositories at massive scale using language modeling

Allamanis and Sutton, 2013

2013 10th Working Conference on Mining Software Repositories (MSR)


Abstract: The tens of thousands of high-quality open source software projects on the Internet raise the exciting possibility of studying software development by finding patterns across truly large source code repositories. This could enable new tools for developing code, encouraging reuse, and navigating large projects. In this paper, we build the first giga-token probabilistic language model of source code, based on 352 million lines of Java. This is 100 times the scale of the pioneering work by Hindle et al. The giga-token model is significantly better at the code suggestion task than previous models. More broadly, our approach provides a new "lens" for analyzing software projects, enabling new complexity metrics based on statistical analysis of large corpora. We call these metrics data-driven complexity metrics. We propose new metrics that measure the complexity of a code module and the topical centrality of a module to a software project. In particular, it is possible to distinguish reusable utility classes from classes that are part of a program's core logic based solely on general information-theoretic criteria.


“…In order to improve the Classifier's performance, more intelligent source code classification techniques will be implemented in the future (e.g. [12]). …”

Section: Harvesting and Classifying The Learning Materials (mentioning)

confidence: 99%

HAPA: Harvester and Pedagogical Agents in E-learning Environments

Ivanović, Mitrović, Budimac et al., 2015

INT J COMPUT COMMUN


In the field of e-learning and tutoring systems, two categories of software agents are of special interest: harvester and pedagogical agents. This paper proposes a novel e-learning system that successfully combines both of these agent categories and introduces two distinct sub-types of pedagogical agents: helpful and misleading. Whereas helpful agents provide the correct guidance for the given problem, misleading agents try to guide the learning process in the wrong direction by offering false hints and inadequate solutions. The rationale behind this approach is to motivate students not to trust the agent's instructions blindly, but to employ critical thinking. Consequently, students will be put in a "softly stressed" environment in order to prepare them for real working environments in their future work in companies. Nevertheless, students themselves will decide on the correct solution to the problem in question.


“…Topic modeling has recently been used in several research areas of software engineering, such as mining software repositories (MSR) [108,109,188], requirements traceability [7], and software evolution [111]. Linstead et al [109] applied LDA topic modeling technique on the source code of different versions in order to analyze software evolution.…”

Section: Topic Modeling In Software Engineering (mentioning)

confidence: 99%

“…Linstead et al [109] applied LDA topic modeling technique on the source code of different versions in order to analyze software evolution. Linstead and colleagues [108] further used topic modeling on Internet-scale software repositories, and summarized program function and developer activities by extracting topic-word and author-topic distributions. The use of topic modeling over source code has been validated and it has been found that the evolution of source code topics is indeed caused by actual change activities in the code [188].…”

Section: Topic Modeling In Software Engineering (mentioning)

confidence: 99%

Stakeholders' social interaction in requirements engineering of open source software

Bhowmik, 2014

2014 IEEE 22nd International Requirements Engineering Conference (RE)


In this work, we use the phrase "stakeholders' social interaction" to indicate interaction among stakeholders regarding the software system that takes place through some communication means, such as posting comments and artifacts over the issue tracking system. We investigate the influence of stakeholders' social interaction in different RE activities, in particular, requirements identification, creativity in RE, and requirements implementation of OSS systems. This research enables us to gain valuable insights to generate guidelines for enhancing software engineering practice in relevant areas.
