Recorded flight data are the main tool for evaluating cockpit crew performance and for improving flight operations safety, because all certified airlines are required to implement a safety and quality management system. Flight safety performance has long been a challenging issue in the aviation industry, and it plays an important role in gaining a competitive advantage. In this study, an integrated approach combining multi-class classification machine learning models with a Markov chain was developed to evaluate cockpit crew performance during flights. First, the main features related to a flight were identified from a literature review, statements by flight operations experts, and a case-study dataset that serves as a numerical example. Next, flight performance, treated as the target variable, was evaluated with four multi-class classification models: Decision Tree, Support Vector Machine, Neural Network, and Random Forest. The results showed that the Random Forest classifier achieved the highest value on every evaluation metric (accuracy = 0.90, precision = 0.91, recall = 0.97, and F1-score = 0.93). Airlines could therefore use this model to predict flight crew performance before a flight in order to prevent or reduce flight safety risks.
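To illustrate how competing classifiers can be ranked on these metrics, the following is a minimal sketch, not the study's implementation. It assumes macro averaging (each class weighted equally), which is one common convention; the study does not state which averaging it used. The function `multiclass_metrics`, the performance labels `good`/`fair`/`poor`, and the models `model_a`/`model_b` are hypothetical. In practice the predictions would come from the trained Decision Tree, SVM, Neural Network, and Random Forest models, which are not reproduced here.

```python
# Hypothetical sketch: rank candidate classifiers on a held-out set using
# accuracy and macro-averaged precision, recall and F1-score.
# Macro averaging is an ASSUMPTION; the study does not specify its averaging.

def multiclass_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall and F1-score."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for c in labels:
        # One-vs-rest counts for class c.
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    k = len(labels)
    # Macro average: unweighted mean of the per-class scores.
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }

# Hypothetical toy labels for six flights; these are NOT the case-study data.
y_true = ["good", "good", "fair", "poor", "good", "fair"]
predictions = {
    "model_a": ["good", "good", "fair", "poor", "fair", "fair"],
    "model_b": ["good", "fair", "poor", "poor", "good", "good"],
}

results = {name: multiclass_metrics(y_true, y_pred) for name, y_pred in predictions.items()}
# Selecting by F1-score is one possible rule (an assumption for illustration).
best = max(results, key=lambda name: results[name]["f1"])

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
print("selected:", best)
```

On these toy labels, `model_a` scores higher than `model_b` on every metric and is selected. Note that a macro-averaged F1-score is the mean of per-class F1-scores, so it need not equal the harmonic mean of the macro-averaged precision and recall.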