Transparent environments and social-coding platforms such as GitHub help developers stay abreast of changes during the development and maintenance phases of a project. In particular, notification feeds can help developers learn about relevant changes in other projects. Unfortunately, transparent environments can quickly overwhelm developers with too many notifications, such that they lose the important ones in a sea of noise. Complementing existing prioritization and filtering strategies based on binary compatibility and code ownership, we develop an anomaly detection mechanism to identify unusual commits in a repository, that is, commits that stand out with respect to other changes in the same repository or by the same developer. Among other characteristics, we detect exceptionally large commits, commits at unusual times, and commits touching rarely changed file types, given the characteristics of a particular repository or developer. We automatically flag unusual commits on GitHub through a browser plug-in. In an interactive survey with 173 active GitHub users, who rated commits in a project of their interest, we found that, although our unusual score is only a weak predictor of whether developers want to be notified about a commit, information about unusual characteristics of a commit changes how developers regard commits. Our anomaly detection mechanism is a building block for scaling transparent environments.
KEYWORDS
anomaly detection, information overload, notification feeds, software ecosystems, transparent environments
INTRODUCTION
Collaborative development in open source, software ecosystems, and also industrial software systems relies increasingly on decentralized decision making [1-4]. Interdependent components evolve independently and often with little explicit collaboration. Backward-incompatible changes that break modularity and produce rippling effects on downstream components are often necessary to avoid opportunity costs (not fixing mistakes, stifling change in the face of evolving requirements) and are common in practice [5-15]. In addition, components may change to add new functionality that developers might want to adopt. Identifying relevant changes and reacting to them if needed can create a significant burden on developers during maintenance [10, 15-20]. Seeds of a solution can be found in today's transparent environments or social-coding platforms such as GitHub, LaunchPad, and Bitbucket. These environments provide mechanisms for notification and exploration that help developers stay abreast of activities across collections of projects without central planning [21, 22]. For example, on GitHub, developers can watch projects and receive a notification feed of activities in watched projects, such as push events or bug reports. These tools work well at small scales but break down for large projects, where imprecise and insufficiently rich notification mechanisms lead to information overload from notification clutter. By inspecting publicly available events on GitHub, we found t...