Abstract. Determining which factors influence the successful outcome of a software project has been described by many scholars and software engineers as a difficult problem. In this paper we use machine learning to build a model that predicts, with some accuracy, the stage a software project has reached. Our model uses eight Open Source project metrics to determine the stage a project is in. We validate the model with two performance measures: the exact success rate of classifying an Open Source Software (OSS) project, and the success rate within an interval of one stage of its actual stage, using different scales of our dependent variable. In all cases we obtain an accuracy above 70% with one-away classification (a classification that is off by at most one stage) and about 40% with exact classification. We also identify the factors that, according to one classifier using only eight of the variables available in SourceForge, determine the health of an OSS project.
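The two performance measures named above can be illustrated with a minimal sketch. It assumes that project stages are encoded as consecutive integers and that a one-away classification also counts exact matches; the function names and the stage labels below are hypothetical and are not taken from the paper's data.

```python
def exact_accuracy(actual, predicted):
    """Fraction of projects whose predicted stage equals the actual stage."""
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    return hits / len(actual)


def one_away_accuracy(actual, predicted):
    """Fraction of projects whose predicted stage is within one stage of
    the actual stage (assumption: exact matches are counted as well)."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= 1)
    return hits / len(actual)


# Hypothetical stage labels for ten projects (illustration only).
actual = [1, 2, 3, 4, 5, 3, 2, 6, 4, 1]
predicted = [1, 3, 3, 2, 5, 4, 2, 4, 5, 1]

print(exact_accuracy(actual, predicted))     # 0.5
print(one_away_accuracy(actual, predicted))  # 0.8
```

Under these assumptions the one-away rate can never fall below the exact rate, which matches the ordering of the two figures reported above.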
Introduction

Determining what makes a software project successful has been a research topic for well over 20 years. The first model that defined the factors influencing software success, the Information Systems Success Model, was published in 1992 by DeLone and McLean [1]. Since then, considerable research effort has gone into determining what can be done to minimize project failure. However, the factors that influence commercial projects differ from those that influence FLOSS (Free/Libre Open Source Software) projects. Attempts to close this gap have relied on statistical models that address only certain aspects of the software development lifecycle. Only recently has historical data been used to determine how the factors for success change during a project's lifecycle [2]. In this paper we use machine learning, in the form of decision trees, to predict the development stage of an Open Source project based on project metrics (we use the terms metric and attribute to mean the same concept in this paper), project constraints, and circumstance. This model will serve as an indicator of OSS project health that will enable developers to determine accurately in what stage their