Purpose: To evaluate the efficacy of prominent machine learning algorithms in predicting normal tissue complication probability using clinical data from two distinct disease sites, and to create a software tool that automatically determines the optimal algorithm for modeling any given labeled dataset. Methods and Materials: We obtained three sets of radiation toxicity data (478 patients) from our clinic: gastrointestinal toxicity (GIT), radiation pneumonitis (RP), and radiation esophagitis (RE). These data comprised clinicopathological and dosimetric information for patients diagnosed with non-small cell lung cancer or anal squamous cell carcinoma. Each dataset was modeled with ten commonly used machine learning algorithms (elastic net, LASSO, random forest, regression forest, support vector machine, XGBoost, k-nearest-neighbors, neural network, Bayesian-LASSO, and Bayesian neural network). For each model, the dataset was randomly divided into a training set and a test set; the training set was used to build and tune the model, and the test set was used to assess it by calculating performance metrics. This process was repeated 100 times for each algorithm and each dataset. Figures were generated to compare the performance of the algorithms visually. A graphical user interface was developed to automate the entire process. Results: The highest area under the precision-recall curve (AUPRC) was achieved by LASSO for RE (0.807±0.067), random forest for GIT (0.726±0.096), and the neural network for RP (0.878±0.060). The corresponding area-under-the-curve (AUC) values were 0.754±0.069, 0.889±0.043, and 0.905±0.045, respectively. The graphical user interface was used to compare all algorithms for each dataset automatically. When AUPRC was averaged across all toxicities, Bayesian-LASSO performed best. Conclusion: Our results show that no single algorithm performs best for all datasets. Therefore, it is important to compare multiple algorithms when training an outcome prediction model on a new dataset. The graphical user interface created for this study automatically compares the performance of these ten algorithms for any dataset.
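
The evaluation loop described in Methods and Materials (repeated random training/test splits, with a performance metric computed on each test set and then summarized per algorithm) can be illustrated with a minimal Python sketch. This is not the software tool developed in the study. The two toy "algorithms", the synthetic data, the 20% test fraction, the stratified split, and all function names are assumptions made purely for illustration, and AUPRC is approximated here by step-wise average precision. Model tuning on the training set is omitted.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence, Tuple

Row = Sequence[float]
# Hypothetical interface: (X_train, y_train, X_test) -> one score per test row.
FitPredict = Callable[[List[Row], List[int], List[Row]], List[float]]


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise estimate of the area under the precision-recall curve.
    Tied scores are ranked in input order (a simplification)."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("test set contains no positive cases")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision at each newly recalled positive
    return total / n_pos


def stratified_split(y: Sequence[int], test_fraction: float,
                     rng: random.Random) -> Tuple[List[int], List[int]]:
    """Random split that keeps both classes in the test set (an assumption;
    the abstract only states that the split was random)."""
    train_idx: List[int] = []
    test_idx: List[int] = []
    for label in (0, 1):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return train_idx, test_idx


def compare_algorithms(algorithms: Dict[str, FitPredict], X: List[Row],
                       y: List[int], n_repeats: int = 100,
                       test_fraction: float = 0.2,
                       seed: int = 0) -> Dict[str, Tuple[float, float]]:
    """Return {algorithm name: (mean AUPRC, standard deviation)} over repeats."""
    rng = random.Random(seed)
    results: Dict[str, List[float]] = {name: [] for name in algorithms}
    for _ in range(n_repeats):
        train_idx, test_idx = stratified_split(y, test_fraction, rng)
        X_tr, y_tr = [X[i] for i in train_idx], [y[i] for i in train_idx]
        X_te, y_te = [X[i] for i in test_idx], [y[i] for i in test_idx]
        for name, fit_predict in algorithms.items():
            scores = fit_predict(X_tr, y_tr, X_te)
            results[name].append(average_precision(y_te, scores))
    return {name: (statistics.mean(v), statistics.stdev(v))
            for name, v in results.items()}


def nearest_centroid_score(X_train, y_train, X_test):
    """Toy model: how much closer a row is to the positive-class centroid."""
    def centroid(label):
        rows = [x for x, v in zip(X_train, y_train) if v == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    def sqdist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    c0, c1 = centroid(0), centroid(1)
    return [sqdist(x, c0) - sqdist(x, c1) for x in X_test]


def first_feature_score(X_train, y_train, X_test):
    """Toy model: rank test rows by their first feature only (no fitting)."""
    return [x[0] for x in X_test]


def make_toy_data(n: int = 200, seed: int = 1):
    """Synthetic two-feature data; NOT patient data."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.3 else 0
        X.append([rng.gauss(label, 1.0), rng.gauss(0.5 * label, 1.0)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    models = {"nearest centroid": nearest_centroid_score,
              "first feature": first_feature_score}
    for name, (mean, sd) in compare_algorithms(models, X, y).items():
        print(f"{name}: AUPRC {mean:.3f} ± {sd:.3f}")
```

Because every algorithm is scored on the same sequence of splits, the reported mean ± standard deviation values are directly comparable across algorithms. In practice, the toy models would be replaced by tuned implementations of the ten algorithms listed above.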