With over 10 million git repositories, GitHub is becoming one of the most important source of software artifacts on the Internet. Researchers are starting to mine the information stored in GitHub's event logs, trying to understand how its users employ the site to collaborate on software. However, so far there have been no studies describing the quality and properties of the data available from GitHub. We document the results of an empirical study aimed at understanding the characteristics of the repositories in GitHub and how users take advantage of GitHub's main featuresnamely commits, pull requests, and issues. Our results indicate that, while GitHub is a rich source of data on software development, mining GitHub for research purposes should take various potential perils into consideration. We show, for example, that the majority of the projects are personal and inactive; that GitHub is also being used for free storage and as a Web hosting service; and that almost 40% of all pull requests do not appear as merged, even though they were. We provide a set of recommendations for software engineering researchers on how to approach the data in GitHub.
A critical factor in work group coordination, communication has been studied extensively. Yet, we are missing objective evidence of the relationship between successful coordination outcome and communication structures. Using data from IBM's Jazz TM project, we study communication structures of development teams with high coordination needs. We conceptualize coordination outcome by the result of their code integration build processes (successful or failed) and study team communication structures with social network measures.Our results indicate that developer communication plays an important role in the quality of software integrations. Although we found that no individual measure could indicate whether a build will fail or succeed, we leveraged the combination of communication structure measures into a predictive model that indicates whether an integration will fail. When used for five project teams, our predictive model yielded recall values between 55% and 75%, and precision values between 50% to 76%.
We conducted an industrial case study of a distributed team in the USA and the Czech Republic that used Extreme Programming. Our goal was to understand how this globally-distributed team created a successful project in a new problem domain using a methodology that is dependent on informal, face-to-face communication. We collected quantitative and qualitative data and used grounded theory to identify four key factors for communication in globally-distributed XP teams working within a new problem domain. Our study suggests that, if these critical enabling factors are addressed, methodologies dependent on informal communication can be used on global software development projects.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.