2022
DOI: 10.1371/journal.pntd.0010517

Machine learning-based risk factor analysis and prevalence prediction of intestinal parasitic infections using epidemiological survey data

Abstract: Background: Previous epidemiological studies have examined the prevalence and risk factors for a variety of parasitic illnesses, including protozoan and soil-transmitted helminth (STH, e.g., hookworms and roundworms) infections. Despite advancements in machine learning for data analysis, the majority of these studies use traditional logistic regression to identify significant risk factors. Methods: In this study, we used data from a survey of 54 risk factors for intestinal parasitosis in 954 Ethiopian school c…

Citations: cited by 14 publications (9 citation statements)
References: 55 publications
“…It is a fundamental data mining technique that thoroughly searches for frequent patterns, correlations, and associations among the sets of variables making them ideal for discovering predictive rules from medical data repositories [60, 61]. It has been used in prior healthcare research to identify risk factors for various health outcomes such as early childhood caries [62], parasite infection [63], motorcycle crash casualty [64], stroke [65], and to discover symptom patterns of coronavirus disease of 2019 (COVID-19) [66].…”
Section: Methods (citation type: mentioning)
confidence: 99%
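
The statement above describes association rule mining only in general terms. As a rough illustration, the sketch below scores rules of the form "risk factors → infected" by support, confidence, and lift. The transactions, item names, and thresholds are hypothetical and are not taken from the cited survey, and the exhaustive enumeration stands in for a proper frequent-itemset algorithm such as Apriori.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def rules_for(consequent, transactions, min_support, min_confidence, max_len=2):
    """Enumerate rules `antecedent -> consequent` that pass both thresholds.

    Brute-force enumeration over small antecedents; fine for a toy example,
    not a substitute for Apriori or FP-growth on real data.
    """
    items = sorted(set().union(*transactions) - {consequent})
    base_rate = support(frozenset({consequent}), transactions)
    rules = []
    for size in range(1, max_len + 1):
        for antecedent in combinations(items, size):
            a = frozenset(antecedent)
            s_a = support(a, transactions)
            s_both = support(a | {consequent}, transactions)
            if s_a == 0 or s_both < min_support:
                continue
            confidence = s_both / s_a
            if confidence < min_confidence:
                continue
            lift = confidence / base_rate  # > 1 means positive association
            rules.append((antecedent, s_both, confidence, lift))
    return rules

# Hypothetical survey records: each set lists the factors present for one child.
transactions = [
    {"no_handwashing", "untrimmed_nails", "infected"},
    {"no_handwashing", "infected"},
    {"untrimmed_nails"},
    {"no_handwashing", "untrimmed_nails", "infected"},
    {"shoes_worn"},
    {"shoes_worn", "infected"},
]

rules = rules_for("infected", transactions, min_support=0.3, min_confidence=0.8)
for antecedent, s, c, l in rules:
    print(antecedent, "-> infected", f"support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

Lift is reported alongside confidence because a rule can have high confidence simply because the outcome is common; dividing by the outcome's base rate corrects for that.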
“…The training split was used to generate the learned models, while the testing dataset was used for the validation phase to assess the performance of each model to predict the class labels (answering yes or no to TM try). One-hot encoding was applied to all categorical variables with more than two categories, with missing data considered as a category and the elimination of one category for each factor to avoid multicollinearity [7]. Collinear covariates, with a variance inflation factor > 2.5, were excluded from the analysis [8].…”
Section: Methods (citation type: mentioning)
confidence: 99%
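
The preprocessing described in this statement can be sketched roughly as follows, assuming NumPy is available. The helper names `one_hot` and `vif`, the "missing" label, and the synthetic data are illustrative assumptions; only the ideas (missing treated as its own category, one reference level dropped, a variance inflation factor cut-off of 2.5) come from the quoted text.

```python
import numpy as np

def one_hot(values, missing_label="missing"):
    """One-hot encode a categorical column.

    Missing entries (None) become their own category, and the first level
    (alphabetically) is dropped as the reference to avoid perfect collinearity.
    """
    filled = [missing_label if v is None else v for v in values]
    levels = sorted(set(filled))
    kept = levels[1:]  # drop one category per factor
    matrix = np.array([[1.0 if v == lvl else 0.0 for lvl in kept] for v in filled])
    return matrix, kept

def vif(X):
    """Variance inflation factor of each column: 1 / (1 - R^2), where R^2
    comes from regressing that column on all the others plus an intercept."""
    n_rows, n_cols = X.shape
    factors = []
    for j in range(n_cols):
        y = X[:, j]
        design = np.column_stack([np.ones(n_rows), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        residual = y - design @ coef
        r_squared = 1.0 - (residual @ residual) / ((y - y.mean()) ** 2).sum()
        factors.append(np.inf if r_squared >= 1.0 else 1.0 / (1.0 - r_squared))
    return np.array(factors)

# Hypothetical categorical column with one missing value.
encoded, kept_levels = one_hot(["a", "b", None, "c", "a"])

# Synthetic covariates: column c is nearly a copy of column a.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.1 * rng.normal(size=200)
X = np.column_stack([a, b, c])

vifs = vif(X)
flagged = vifs > 2.5  # cut-off quoted in the statement above
print(dict(zip("abc", np.round(vifs, 2))), flagged)
```

The sketch only flags collinear columns. Whether the citing authors removed them all at once or one at a time is not stated in the excerpt.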
“…More specifically, each of the models underwent a tenfold cross-validation and the classifier hyperparameters were tuned. A random-search approach for model parameter tuning was used to determine the optimal combination of hyperparameters for maximizing accuracy to generate the best model parameters [7]. To ensure robust results, the cross-validation was performed ten times using a different random number generator seed each time [7].…”
Section: Methods (citation type: mentioning)
confidence: 99%
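
A minimal, standard-library-only sketch of the tuning procedure described above: random search over a hyperparameter, scored by tenfold cross-validated accuracy, repeated with ten different seeds. The k-nearest-neighbour classifier, the candidate values, and the toy data are stand-ins chosen for brevity; the excerpt does not say which classifiers or search spaces the citing study used.

```python
import random
import statistics

def make_toy_data(n_per_class, seed):
    """Two Gaussian blobs in 2-D; purely synthetic."""
    rng = random.Random(seed)
    rows = []
    for label, centre in ((0, 0.0), (1, 3.0)):
        for _ in range(n_per_class):
            rows.append(((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label))
    return rows

def make_folds(n_rows, n_folds, rng):
    """Shuffle row indices and deal them into n_folds roughly equal folds."""
    indices = list(range(n_rows))
    rng.shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]

def knn_predict(train_rows, n_neighbors, point):
    """Majority vote among the n_neighbors closest training rows."""
    nearest = sorted(
        train_rows,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], point)),
    )[:n_neighbors]
    positive_votes = sum(label for _, label in nearest)
    return 1 if 2 * positive_votes > len(nearest) else 0

def cv_accuracy(data, n_neighbors, n_folds, rng):
    """Mean held-out accuracy over n_folds folds."""
    scores = []
    for held_out in make_folds(len(data), n_folds, rng):
        held = set(held_out)
        train = [row for i, row in enumerate(data) if i not in held]
        correct = sum(
            knn_predict(train, n_neighbors, data[i][0]) == data[i][1]
            for i in held_out
        )
        scores.append(correct / len(held_out))
    return statistics.mean(scores)

def random_search(data, candidates, n_iter, n_folds, seed):
    """Sample n_iter hyperparameter values at random; keep the most accurate."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_iter):
        n_neighbors = rng.choice(candidates)
        accuracy = cv_accuracy(data, n_neighbors, n_folds, rng)
        if best is None or accuracy > best[1]:
            best = (n_neighbors, accuracy)
    return best

data = make_toy_data(n_per_class=50, seed=0)
candidates = [1, 3, 5, 7, 9]  # hypothetical search space

# Repeat the whole search with ten different seeds, as in the quoted text.
runs = [random_search(data, candidates, n_iter=5, n_folds=10, seed=s) for s in range(10)]
for seed, (n_neighbors, accuracy) in enumerate(runs):
    print(f"seed={seed} best n_neighbors={n_neighbors} cv accuracy={accuracy:.3f}")
```

Repeating the search across seeds shows how much the selected hyperparameter and its score depend on the particular fold assignment, which is the robustness concern the statement raises.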
“…These approaches appear to be a promising alternative to logistic regression since they avoid overfitting (28, 29). ML techniques have previously been used to identify risk factors for parasitic infections, congestive heart failure, diabetes, overweight/obesity, and dementia (30–34). Moreover, several recent studies have used ML approaches to effectively predict undernutrition outcomes in Bangladesh, India, and Ethiopia (35–39).…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…Despite these advances, most studies failed to implement association rule learning, another promising ML method that may facilitate the identification of risk factors. Association rule learning has previously shown promise in predicting disease co-occurrences and risk factors for parasitic infection (34, 40). Altogether, these ML methods could identify important risk factors for undernutrition, and potentially provide crucial insights into how the co-occurrence of variables may lead to undernutrition.…”
Section: Introduction (citation type: mentioning)
confidence: 99%