Software companies aim to develop high-quality software projects using the best global resources at the lowest cost. To achieve this, they adopt global software development (GSD), an approach in which project work is carried out across multiple distributed locations; it is also known as distributed development. Companies that attempt to implement GSD face numerous challenges owing to its nature and its differences from traditional methods. The objectives of this study were to identify the top software development factors that affect the overall success or failure of a software project, to use exploratory data analysis to find relationships between these factors, and to develop and compare risk prediction models built with machine learning classification techniques such as logistic regression, decision tree, random forest, support vector machine, K-nearest neighbors, and Naive Bayes. The findings are as follows: the top 18 factors influencing software projects in GSD are identified, and experiments show that the logistic regression and random forest models provide the best results, with accuracies of 89% and 85% and areas under the curve (AUC) of 73% and 71%, respectively.
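Accuracy and AUC measure different things, so one model can score noticeably higher on the first than on the second. Accuracy counts correct labels at a fixed decision threshold, whereas AUC measures how well the predicted scores rank risky projects above non-risky ones. The following is a minimal pure-Python sketch of both metrics. The labels and scores are made-up toy values, not data from the study, and the function names `accuracy` and `roc_auc` are illustrative assumptions rather than the authors' code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive example scores higher than a randomly chosen negative one
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = project at risk, 0 = project not at risk.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.15, 0.3, 0.05, 0.25, 0.6, 0.35, 0.4, 0.2]

# Turn scores into hard labels with an assumed threshold of 0.5.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(accuracy(y_true, y_pred))  # 0.7
print(roc_auc(y_true, scores))   # 0.65625
```

With imbalanced classes, as in this toy set, a classifier can reach a fairly high accuracy mostly by labelling the majority class correctly, while its AUC stays lower because the scores rank the minority class poorly.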