Most prior studies on predicting illness in populations have not addressed the problem of identifying which illnesses are actually prevalent in a given population, whether that of a nation, state, neighborhood, organization, or school. They often select a disease on the basis of assumption. Those that did establish the prevalence of the condition before developing their framework relied on survey data or data from online repositories, which strips away the idiosyncrasies of the population concerned. To improve predictive performance, this research proposes an enhanced data analytics framework for the predictive diagnosis of common illnesses affecting university students. To this end, exploratory data analysis (EDA) with a multivariate analytic technique was conducted within a high-level methodology based on the stages of CRISP-DM (Cross-Industry Standard Process for Data Mining). When the proposed approach was evaluated with support vector machine, ensemble gradient boosting, random forest, decision tree, k-nearest neighbors, and linear regression models, the experimental results showed that it outperformed existing methods.
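The model-comparison step can be illustrated with a minimal pure-Python sketch. It assumes a hypothetical two-feature, two-class toy dataset, not the study's data, and the names `make_toy_data`, `majority_baseline`, `knn`, and `cross_val_accuracy` are illustrative only. A from-scratch k-nearest neighbors classifier and a majority-class baseline are scored by k-fold cross-validated accuracy. The other models named above (SVM, gradient boosting, random forest, decision tree, linear regression) could presumably be wrapped in the same `model(train, x)` interface, for example around a library implementation.

```python
import random
from collections import Counter
from math import dist


def make_toy_data(n=60, seed=0):
    # Hypothetical records: (feature vector, label). Features stand in for
    # encoded symptoms; these are synthetic values, NOT data from the study.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.choice(["condition_a", "condition_b"])
        centre = (1.0, 0.0) if label == "condition_a" else (0.0, 1.0)
        x = tuple(c + rng.gauss(0, 0.3) for c in centre)
        data.append((x, label))
    return data


def majority_baseline(train, x):
    # Ignores x and always predicts the most common training label.
    return Counter(y for _, y in train).most_common(1)[0][0]


def knn(train, x, k=3):
    # Majority vote among the k training points closest to x (Euclidean).
    nearest = sorted(train, key=lambda row: dist(row[0], x))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]


def cross_val_accuracy(model, data, folds=5):
    # Splits data into `folds` contiguous chunks; each chunk serves once as
    # the test set. Leftover rows (if len(data) % folds != 0) stay in training.
    size = len(data) // folds
    scores = []
    for i in range(folds):
        test = data[i * size:(i + 1) * size]
        train = data[:i * size] + data[(i + 1) * size:]
        correct = sum(model(train, x) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    data = make_toy_data()
    for name, model in [("majority baseline", majority_baseline), ("3-NN", knn)]:
        print(name, round(cross_val_accuracy(model, data), 3))
```

Comparing every candidate against a trivial baseline under the same folds makes it easier to tell whether a model has learned anything from the features at all.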
Compared with other reviewed frameworks that used survey datasets or datasets from standardized or online repositories, the proposed framework, with emphasis on the ensemble gradient boosting classifier and regressor, achieved an accuracy of 100% and a mean absolute error of 0.18, respectively. It is also stable, handling both small and large datasets without degrading model performance. The improved results obtained with a localized dataset demonstrate the benefit of incorporating local data sources when developing models for the diagnosis and prognosis of prevalent illnesses in any populated area.
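For reference, the two reported metrics can be computed as follows. This is a minimal pure-Python sketch: the function names and sample values are hypothetical and serve only to show the definitions. scikit-learn provides equivalent `accuracy_score` and `mean_absolute_error` functions in `sklearn.metrics`.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted| over all samples."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical values for illustration only (not the study's data).
labels_true = ["a", "b", "a", "a"]
labels_pred = ["a", "b", "b", "a"]
print(accuracy(labels_true, labels_pred))  # 0.75

targets_true = [3.0, 5.0, 2.0, 4.0]
targets_pred = [2.5, 5.0, 2.5, 4.0]
print(mean_absolute_error(targets_true, targets_pred))  # 0.25
```

Accuracy applies to the classifier's discrete labels, while MAE is expressed in the same units as the regression target, so the two figures are not directly comparable.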