We repeated the random test/training splits five times to avoid sampling bias, and the average of the evaluation metrics was reported. After the database was established, a broad set of nine machine learning models, including Logistic Regression (LR),30 Gaussian Naïve Bayes (GNB),31 k-Nearest Neighbors (KNN),32 Support Vector Machine (SVM),33 Decision Tree (DT),34 Random Forest (RF),35 Adaptive Boosting (ADA),36 eXtreme Gradient Boosting (XGB)37 and Multilayer Perceptron (MLP),38 were trained by a grid-search cross-validation (5-fold GridSearchCV) method, and the hyperparameters of a single-shot trial are summarized in Table S2. The evaluation metrics, including accuracy, precision, recall, F1 score and the receiver operating characteristic (ROC) curve, were obtained by comparing the predicted results with the ground truths (Table S3). Although these machine learning models offer individual advantages, such as high classification accuracy, ease of use or good interpretability, they must be weighed carefully for a new application.
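The workflow described above can be illustrated with a minimal sketch, assuming scikit-learn with a single representative classifier (Random Forest here); the feature matrix X, labels y and the hyperparameter grid are placeholders, and the actual grids used are those listed in Table S2.

```python
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, roc_auc_score)

# Placeholder data: substitute the descriptors and labels from the established database.
X = np.random.rand(200, 10)
y = np.random.randint(0, 2, size=200)

metrics = []
for seed in range(5):  # five random test/training splits to reduce sampling bias
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y)

    # 5-fold grid-search cross-validation; this grid is illustrative only (see Table S2)
    grid = GridSearchCV(
        RandomForestClassifier(random_state=seed),
        param_grid={"n_estimators": [100, 300], "max_depth": [None, 5, 10]},
        cv=5, scoring="f1")
    grid.fit(X_train, y_train)

    # Compare predictions with the ground truths on the held-out test split
    y_pred = grid.best_estimator_.predict(X_test)
    y_prob = grid.best_estimator_.predict_proba(X_test)[:, 1]
    metrics.append([accuracy_score(y_test, y_pred),
                    precision_score(y_test, y_pred),
                    recall_score(y_test, y_pred),
                    f1_score(y_test, y_pred),
                    roc_auc_score(y_test, y_prob)])

# Report the average of the evaluation metrics over the five splits
print(np.mean(metrics, axis=0))
```

The same loop applies to the other eight models by swapping the estimator and its parameter grid; the ROC area under the curve is computed from predicted class probabilities rather than hard labels.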