Background: Current malaria diagnosis methods, which rely on microscopy and histidine-rich protein 2 (HRP2)-based rapid diagnostic tests (RDTs), have drawbacks that necessitate the development of improved and complementary diagnostic methods to overcome some or all of these limitations. Automated detection and classification of malaria can provide patients with a faster and more accurate diagnosis. This study therefore used machine learning models to predict the occurrence of malaria based on socio-demographic, behavioural, environmental, and clinical features.
Methods: Data from 200 Nigerian patients were used to develop predictive models using nested cross-validation and sequential backward feature selection (SBFS), with 80% of the dataset randomly selected for training and optimisation and the remaining 20% for testing the models.
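As a rough illustration of the workflow described in Methods, the sketch below combines an 80/20 split, backward feature selection, and nested cross-validation around a penalised logistic regression. It is not the study's code. The data are synthetic, and the fold counts, hyperparameter grid, and number of retained features are assumptions. scikit-learn's `SequentialFeatureSelector` performs plain backward selection, which may differ from the SBFS variant used in the study.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import (GridSearchCV, StratifiedKFold,
                                     cross_val_score, train_test_split)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a 200-patient dataset (NOT the study data).
X, y = make_classification(n_samples=200, n_features=8, n_informative=4,
                           random_state=0)

# 80% for training and optimisation, 20% held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# LogisticRegression applies an L2 penalty by default; C controls its strength.
pipe = Pipeline([
    ("scale", StandardScaler()),
    # Backward selection: start from all features and drop one at a time.
    # Keeping 4 features is an assumption made for this sketch.
    ("select", SequentialFeatureSelector(
        LogisticRegression(max_iter=1000), n_features_to_select=4,
        direction="backward", scoring="roc_auc", cv=3)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Inner loop tunes the penalty strength; outer loop estimates performance.
inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)
search = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]},
                      scoring="roc_auc", cv=inner)

# Nested cross-validated AUC on the training portion only.
nested_auc = cross_val_score(search, X_train, y_train,
                             scoring="roc_auc", cv=outer)

# Refit on the full training portion, then evaluate once on the test set.
search.fit(X_train, y_train)
test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
n_selected = int(search.best_estimator_.named_steps["select"].get_support().sum())

print(f"nested CV AUC: {nested_auc.mean():.2f}, test AUC: {test_auc:.2f}")
```

Placing the feature selector inside the pipeline means selection is repeated within every training fold, so the held-out folds and the test set never influence which features are kept.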
Results: Among the three machine learning models examined, the penalised logistic regression model had the best area under the receiver operating characteristic (ROC) curve for the training set (84%; 95% confidence interval (CI) = 75–93%) and the test set (83%; 95% CI = 63–100%). The model included age, BMI, body temperature, bushes in the surroundings, body weight, dizziness, fever, headache, mosquito repellent, muscle pain, sex, sore throat, stagnant water in the home, and vomiting. Increased odds of malaria were associated with high body weight (adjusted odds ratio (AOR) = 4.50, 95% CI = 2.27–8.01, p-value < 0.0001). Patients with elevated body temperature had higher odds of malaria than those without (AOR = 1.40, 95% CI = 0.99–1.91, p-value = 0.068), although this association was not statistically significant. Patients with bushes in their surroundings had significantly higher odds of malaria than those without (AOR = 2.60, 95% CI = 1.30–4.66, p-value = 0.006). Fever (AOR = 2.10, 95% CI = 0.88–4.24, p-value = 0.099), headache (AOR = 2.07, 95% CI = 0.95–3.95, p-value = 0.068), muscle pain (AOR = 1.49, 95% CI = 0.66–3.39, p-value = 0.333), and vomiting (AOR = 2.32, 95% CI = 0.85–6.82, p-value = 0.097) were also associated with higher odds of malaria, but these associations were not statistically significant. In contrast, decreased odds of malaria were associated with age (AOR = 0.62, 95% CI = 0.41–0.90, p-value = 0.012) and BMI (AOR = 0.47, 95% CI = 0.26–0.80, p-value = 0.006).
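For readers less familiar with how odds ratios relate to a logistic regression, the sketch below shows the standard conversion of a log-odds coefficient and its standard error into an odds ratio with a Wald-type 95% CI. The function name and the input values are hypothetical and chosen only for illustration. The abstract does not state how its intervals were computed, so this is one common approach rather than necessarily the one used in the study.

```python
import math

def odds_ratio_with_ci(beta, se, z=1.96):
    """Return (odds ratio, lower bound, upper bound) for a log-odds coefficient.

    beta: fitted logistic regression coefficient (log-odds scale)
    se:   standard error of that coefficient
    z:    normal quantile; 1.96 gives an approximate 95% interval
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Hypothetical coefficient and standard error, NOT taken from the study.
or_, lo, hi = odds_ratio_with_ci(beta=0.5, se=0.2)
print(f"OR = {or_:.2f}, 95% CI = {lo:.2f} to {hi:.2f}")
```

An interval that contains 1, as for fever, headache, muscle pain, and vomiting above, is consistent with no association at the corresponding significance level.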
Conclusion: The newly developed model, built on routinely collected baseline socio-demographic, environmental, and clinical features, can predict malaria and may serve as a valuable tool for clinical decision-making.