Background: Bluetongue virus (BTV) is an arbovirus that causes substantial economic losses worldwide. It is transmitted mainly by Culicoides biting midges. Because BTV infection depends closely on these vectors, many climate-related risk factors influence the occurrence of the disease. The ability of Logistic Regression (LR), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), XGBoost and Artificial Neural Network (ANN) algorithms to predict the occurrence of BTV infection was assessed. The predictive risk factors evaluated included the 19 standard bioclimatic variables, meteorological variables, ruminant population density, elevation and land cover data.
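The following is a minimal sketch, not the study's pipeline, of how such a comparison of algorithms can be set up with scikit-learn. It assumes a binary outcome (infection present or absent), uses synthetic data from `make_classification` in place of the study's predictors, scores each model by 5-fold stratified cross-validated ROC AUC (an assumed metric), and omits XGBoost to avoid a dependency outside scikit-learn. All hyperparameters are illustrative guesses.

```python
# Minimal sketch (assumptions): binary outcome, synthetic predictors,
# ROC AUC as the comparison metric, XGBoost omitted to stay within scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_models(X, y, seed=0):
    """Return the mean 5-fold cross-validated ROC AUC for each candidate model."""
    models = {
        # Scale-sensitive models get a StandardScaler fitted inside each fold.
        "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "SVM": make_pipeline(StandardScaler(), SVC()),
        "DT": DecisionTreeClassifier(random_state=seed),
        "RF": RandomForestClassifier(n_estimators=200, random_state=seed),
        "ANN": make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=seed),
        ),
    }
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
        for name, model in models.items()
    }


if __name__ == "__main__":
    # Synthetic stand-in for the real presence/absence data.
    X, y = make_classification(n_samples=500, n_features=25, n_informative=8, random_state=0)
    for name, auc in sorted(compare_models(X, y).items(), key=lambda kv: -kv[1]):
        print(f"{name}: mean AUC = {auc:.3f}")
```

Placing the scaler inside each pipeline means it is fitted on the training folds only; the tree-based models do not need scaling.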
Results: The ExtraTreesClassifier algorithm identified 19 variables as important predictive features, most of which were temperature-related bioclimatic variables. Different combinations of predictive risk factors were evaluated in separate models. The ANN and RF algorithms showed the best performance in predicting the occurrence of BTV infection, especially when all predictor variables were included together.
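A minimal sketch of ranking predictors with scikit-learn's `ExtraTreesClassifier` follows. It assumes the selection keeps the top k features by impurity-based importance (`feature_importances_`); the abstract does not state the selection rule, so the cutoff `k=19` only mirrors the reported count. The predictor names are hypothetical labels attached to synthetic data, so the printed ranking carries no biological meaning.

```python
# Minimal sketch (assumptions): top-k selection by impurity-based importance;
# predictor names are hypothetical labels on synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier


def top_features(X, y, names, k=19, seed=0):
    """Return (name, importance) pairs for the k most important features."""
    model = ExtraTreesClassifier(n_estimators=500, random_state=seed)
    model.fit(X, y)
    importances = model.feature_importances_  # mean decrease in impurity, sums to 1
    order = np.argsort(importances)[::-1]  # indices sorted from most to least important
    return [(names[i], float(importances[i])) for i in order[:k]]


if __name__ == "__main__":
    # Hypothetical names: bio1..bio19 plus a few non-bioclimatic layers.
    names = [f"bio{i}" for i in range(1, 20)] + [
        "rainfall", "humidity", "wind", "ruminant_density", "elevation", "land_cover",
    ]
    X, y = make_classification(
        n_samples=600, n_features=len(names), n_informative=10, random_state=0
    )
    for name, score in top_features(X, y, names):
        print(f"{name:18s} {score:.3f}")
```

Impurity-based importances can favour high-cardinality or correlated predictors, so permutation importance is a common cross-check.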
Conclusions: The RF and ANN algorithms outperformed the other machine learning methods in predicting the occurrence of BTV infection, especially when all predictive risk factors were included. Moreover, compared with meteorological, ruminant population density, elevation and land cover features, bioclimatic variables, especially those related to temperature, played a more important role in predicting the occurrence of BTV infection with machine learning algorithms. These results could help in planning BTV surveillance and in adopting control and preventive strategies.