Mengmeng Zhai scite author profile

Background Brucellosis is a major public health problem that seriously affects developing countries; it can cause significant economic losses to the livestock industry and great harm to human health. Reliable prediction of its incidence is of great significance for controlling brucellosis and taking preventive measures. Methods Our human brucellosis incidence data were extracted from the Shanxi Provincial Center for Disease Control and Prevention. We used seasonal-trend decomposition using Loess (STL) and monthplot to analyse the seasonal characteristics of human brucellosis in Shanxi Province from 2007 to 2017. The autoregressive integrated moving average (ARIMA) model, a combined model of ARIMA and the back propagation neural network (ARIMA-BPNN), and a combined model of ARIMA and the Elman recurrent neural network (ARIMA-ERNN) were each established to make predictions and identify the best model. The mean absolute error (MAE), mean squared error (MSE) and mean absolute percentage error (MAPE) were used to evaluate model performance. Results The time series of human brucellosis in Shanxi Province increased from 2007 to 2014 and decreased from 2015 to 2017. It showed obvious seasonal characteristics, with the peak lasting from March to July every year. The ARIMA-ERNN model had the best fitting and prediction performance. Compared with the ARIMA model, the MAE, MSE and MAPE of the ARIMA-ERNN model decreased by 18.65, 31.48 and 64.35%, respectively, in fitting performance, and by 60.19, 75.30 and 64.35%, respectively, in prediction performance. In addition, compared with the ARIMA-BPNN model, the MAE, MSE and MAPE of the ARIMA-ERNN model decreased by 9.60, 15.73 and 11.58%, respectively, in fitting performance, and by 31.63, 45.79 and 29.59%, respectively, in prediction performance.
Conclusions The time series of human brucellosis in Shanxi Province from 2007 to 2017 showed obvious seasonal characteristics. The fitting and prediction performances of the ARIMA-ERNN model were better than those of the ARIMA-BPNN and ARIMA models. These findings may provide theoretical support for predicting infectious diseases and inform public health decision making.
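The three error metrics named in the brucellosis abstract have standard definitions. The Python sketch that follows is a minimal illustration, assuming the usual formulas (with MAPE expressed in percent) and assuming the reported "decreased by" figures are relative reductions with respect to the baseline model; the abstract does not spell out either convention, and the function names and toy numbers are hypothetical.

```python
from typing import Sequence


def _check(actual: Sequence[float], predicted: Sequence[float]) -> None:
    # Both series must be non-empty and aligned point by point.
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal in length")


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error: average of |y - y_hat|."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error: average of (y - y_hat)^2."""
    _check(actual, predicted)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent; undefined if any y == 0."""
    _check(actual, predicted)
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an observed value is zero")
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def relative_decrease_pct(baseline: float, improved: float) -> float:
    """Percentage by which `improved` is lower than `baseline` (assumed convention)."""
    return 100 * (baseline - improved) / baseline


# Toy numbers for illustration only, not data from the study.
observed = [10.0, 20.0, 40.0]
forecast = [12.0, 18.0, 36.0]
print(mae(observed, forecast))          # 2.6666666666666665
print(mse(observed, forecast))          # 8.0
print(relative_decrease_pct(8.0, 6.0))  # 25.0
```

MAPE divides by each observed value, so it cannot be computed for periods with zero reported cases; MAE and MSE have no such restriction.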


Exploratory study on classification of diabetes mellitus through a combined Random Forest Classifier

Wang, Zhai, Ren, et al. 2021. BMC Med Inform Decis Mak


Background Diabetes mellitus (DM) ranks third among chronic non-communicable diseases affecting patients, after tumors and cardiovascular and cerebrovascular diseases, and has become one of the major public health problems in the world. It is therefore important to identify individuals at high risk of DM in order to establish prevention strategies. Methods Medical data often have a high-dimensional feature space, high feature redundancy and imbalanced classes. To address these problems, this study explored different supervised classifiers combined with SVM-SMOTE and two feature dimensionality reduction methods (logistic stepwise regression and LASSO) to classify diabetes survey sample data with imbalanced categories and complex related factors. The classification results of 4 supervised classifiers under 4 data processing methods were analysed and discussed. Five indicators, Accuracy, Precision, Recall, F1-Score and AUC, were selected as the key indicators for evaluating classification performance. Results The Random Forest Classifier combined with SVM-SMOTE resampling and LASSO feature screening (Accuracy = 0.890, Precision = 0.869, Recall = 0.919, F1-Score = 0.893, AUC = 0.948) was the best approach for identifying individuals at high risk of DM. The combined algorithm also enhanced classification performance in predicting people at high risk of DM. Age, region, heart rate, hypertension, hyperlipidemia and BMI were the six most critical characteristic variables affecting diabetes. Conclusions The Random Forest Classifier combined with SVM-SMOTE and LASSO feature reduction performed best in identifying people at high risk of DM, and the combined method proposed in this study could be a good tool for early screening of DM.
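Precision, Recall and F1-Score are linked: F1 is the harmonic mean of precision and recall. The minimal Python sketch that follows uses the standard binary-classification definitions (the abstract does not state an averaging scheme, so binary metrics with "high risk" as the positive class are an assumption) and checks that the reported F1-Score is consistent with the reported Precision and Recall. The confusion-matrix counts in the second example are invented toy values.

```python
from typing import Dict


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard binary metrics from confusion-matrix counts (positive = high risk)."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


# Consistency check against the values reported for the best model.
print(round(f1_score(0.869, 0.919), 3))  # 0.893

# Toy confusion matrix, not from the study.
print(classification_metrics(tp=8, fp=2, fn=2, tn=8))
```

Recall (sensitivity) is the metric most directly tied to missing high-risk individuals in a screening setting, which is why it is reported alongside Accuracy on resampled, class-imbalanced data.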


Application of a novel hybrid algorithm of Bayesian network in the study of hyperlipidemia related factors: a cross-sectional study

Wang, Pan, Ren, et al. 2021. BMC Public Health


Background This article aims to understand the prevalence of hyperlipidemia and its related factors in Shanxi Province. Multivariate logistic regression analysis was used to identify factors closely related to hyperlipidemia, and the complex network of connections between the variables was then presented through Bayesian networks (BNs). Methods Logistic regression was used to screen for hyperlipidemia-related variables, and the complex network of connections between the variables was then presented through BNs. Because the Max-Min Hill-Climbing (MMHC) hybrid algorithm has notable drawbacks, three additional hybrid algorithms were proposed to construct the BN structure: MMPC-Tabu, Fast.iamb-Tabu and Inter.iamb-Tabu. To assess their performance, we compared these three hybrid algorithms with the widely used MMHC hybrid algorithm on randomly generated datasets. The best-performing algorithm was then used to build the BN for studying factors related to hyperlipidemia, and the BN model was compared with the logistic regression model. Results The BN constructed by the Inter.iamb-Tabu hybrid algorithm had the best fit to the benchmark networks and was used to construct the BN model of hyperlipidemia. Multivariate logistic regression analysis suggested that gender, smoking, central obesity, daily average salt intake, daily average oil intake, diabetes mellitus, hypertension and physical activity were associated with hyperlipidemia. The BN model of hyperlipidemia further showed that gender, BMI and physical activity were directly related to the occurrence of hyperlipidemia; hyperlipidemia was directly related to the occurrence of diabetes mellitus and hypertension; and daily average salt intake, daily average oil intake, smoking and central obesity were indirectly related to hyperlipidemia.
Conclusions The BN of hyperlipidemia constructed by the Inter.iamb-Tabu hybrid algorithm is more reasonable. It captures the overall linkage between factors and diseases, revealing the direct and indirect factors associated with hyperlipidemia and the correlations between related variables, and can provide a new approach to studying chronic diseases and their associated factors.
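A Bayesian network factorizes the joint distribution as the product of each variable's conditional distribution given its parents, so the learned graph itself encodes which factors touch the outcome directly. One simple reading of the abstract's "direct" versus "indirect" distinction: nodes adjacent to the target (its parents and children) are direct, while the target's other ancestors act only through intermediate nodes. The Python sketch that follows applies that reading to a hypothetical toy DAG with placeholder node names; it does not reproduce the study's network, and it does not perform structure learning (MMHC, tabu search and the other algorithms named above are out of scope).

```python
from typing import Dict, List, Set, Tuple


def direct_and_indirect(parents: Dict[str, List[str]], target: str) -> Tuple[Set[str], Set[str]]:
    """Split the nodes related to `target` into direct and indirect factors.

    `parents` maps each node to the list of its parents in the DAG.
    Direct = parents and children of `target` (adjacent nodes).
    Indirect = remaining ancestors of `target`, which reach it only
    through intermediate nodes.
    """
    children = {node for node, pa in parents.items() if target in pa}
    direct = set(parents.get(target, [])) | children

    # Depth-first walk up the parent links to collect every ancestor.
    ancestors: Set[str] = set()
    stack = list(parents.get(target, []))
    while stack:
        node = stack.pop()
        if node in ancestors:
            continue
        ancestors.add(node)
        stack.extend(parents.get(node, []))

    return direct, ancestors - direct


# Hypothetical toy DAG: C -> A -> Y, D -> B -> Y, Y -> E, Y -> F.
toy_parents = {
    "Y": ["A", "B"],
    "A": ["C"],
    "B": ["D"],
    "C": [],
    "D": [],
    "E": ["Y"],
    "F": ["Y"],
}
direct, indirect = direct_and_indirect(toy_parents, "Y")
print(sorted(direct))    # ['A', 'B', 'E', 'F']
print(sorted(indirect))  # ['C', 'D']
```

This purely graphical reading ignores edge strength; in practice the fitted conditional probability tables are also needed to judge how much each factor matters.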


Study on the prediction effect of a combined model of SARIMA and LSTM based on SSA for influenza in Shanxi Province, China

Zhao, Zhai, Li, et al. 2023. BMC Infect Dis


Background Influenza is a highly infectious acute respiratory disease that seriously damages human health. Reliable prediction is of great significance for controlling influenza epidemics. Methods Our influenza data were extracted from the Shanxi Provincial Center for Disease Control and Prevention. Seasonal-trend decomposition using Loess (STL) was adopted to analyze the seasonal characteristics of influenza in Shanxi Province, China, from the 1st week of 2010 to the 52nd week of 2019. To address the weakness of the seasonal autoregressive integrated moving average (SARIMA) model in predicting the nonlinear part of the series and the poor accuracy of predicting the original series directly, this study established the SARIMA model, a combined model of SARIMA and the Long Short-Term Memory neural network (SARIMA-LSTM), and a combined SARIMA-LSTM model based on singular spectrum analysis (SSA-SARIMA-LSTM) to make predictions and identify the best model. The Mean Squared Error (MSE), Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) were used to evaluate model performance. Results The influenza time series in Shanxi Province from the 1st week of 2010 to the 52nd week of 2019 decreased year by year and showed obvious seasonal characteristics. The peak period was mainly concentrated from the end of each year to the beginning of the next. The SSA-SARIMA-LSTM model had the best fitting and prediction performance. Compared with the SARIMA model, the MSE, MAE and RMSE of the SSA-SARIMA-LSTM model decreased by 38.12, 17.39 and 21.34%, respectively, in fitting performance, and by 42.41, 18.69 and 24.11%, respectively, in prediction performance.
Furthermore, compared with the SARIMA-LSTM model, the MSE, MAE and RMSE of the SSA-SARIMA-LSTM model decreased by 28.26, 14.61 and 15.30%, respectively, in fitting performance, and by 36.99, 7.22 and 20.62%, respectively, in prediction performance. Conclusions The fitting and prediction performances of the SSA-SARIMA-LSTM model were better than those of the SARIMA and SARIMA-LSTM models. The SSA-SARIMA-LSTM model can therefore be applied to influenza prediction and can support public policy making.
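Because RMSE is the square root of MSE, a relative reduction r in MSE implies a relative reduction of 1 - sqrt(1 - r) in RMSE, provided both are computed on the same data. The Python sketch that follows checks the reported RMSE reductions against this identity using only the figures stated in the abstract; the function name is hypothetical.

```python
import math


def rmse_reduction_pct(mse_reduction_pct: float) -> float:
    """RMSE reduction (percent) implied by an MSE reduction (percent).

    If MSE_new = (1 - r) * MSE_old, then RMSE_new = sqrt(1 - r) * RMSE_old.
    """
    r = mse_reduction_pct / 100
    return 100 * (1 - math.sqrt(1 - r))


# (MSE reduction, reported RMSE reduction) pairs taken from the abstract.
reported = {
    "vs SARIMA, fitting": (38.12, 21.34),
    "vs SARIMA, prediction": (42.41, 24.11),
    "vs SARIMA-LSTM, fitting": (28.26, 15.30),
    "vs SARIMA-LSTM, prediction": (36.99, 20.62),
}

for label, (mse_drop, rmse_drop) in reported.items():
    implied = rmse_reduction_pct(mse_drop)
    print(f"{label}: implied {implied:.2f}%, reported {rmse_drop:.2f}%")
```

All four implied values agree with the reported ones to two decimal places. This is only an arithmetic consistency check between the two metrics; it says nothing about how well the models themselves were specified.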


Study on the prediction effect of a combined model of SARIMA and LSTM based on SSA for influenza in Shanxi Province, China

Zhao, Zhai, Li, et al. 2022. Preprint


Background: Influenza is a highly infectious acute respiratory disease that seriously damages human health. Reliable prediction is of great significance for controlling influenza epidemics. Methods: Our influenza data were extracted from the Shanxi Provincial Center for Disease Control and Prevention. Seasonal-trend decomposition using Loess (STL) was adopted to analyze the seasonal characteristics of influenza in Shanxi Province, China, from the 1st week of 2010 to the 52nd week of 2019. To address the weakness of the seasonal autoregressive integrated moving average (SARIMA) model in predicting the nonlinear part of the series and the poor accuracy of predicting the original series directly, this study established the SARIMA model, a combined model of SARIMA and the Long Short-Term Memory neural network (SARIMA-LSTM), and a combined SARIMA-LSTM model based on singular spectrum analysis (SSA-SARIMA-LSTM) to make predictions and identify the best model. The Mean Squared Error (MSE), Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) were used to evaluate model performance. Results: The influenza time series in Shanxi Province from the 1st week of 2010 to the 52nd week of 2019 decreased year by year and showed obvious seasonal characteristics. The peak period was mainly concentrated from the end of each year to the beginning of the next. The SSA-SARIMA-LSTM model had the best fitting and prediction performance. Compared with the SARIMA model, the MSE, MAE and RMSE of the SSA-SARIMA-LSTM model decreased by 38.12, 17.39 and 21.34%, respectively, in fitting performance, and by 42.41, 18.69 and 24.11%, respectively, in prediction performance.
Furthermore, compared with the SARIMA-LSTM model, the MSE, MAE and RMSE of the SSA-SARIMA-LSTM model decreased by 28.26, 14.61 and 15.30%, respectively, in fitting performance, and by 36.99, 7.22 and 20.62%, respectively, in prediction performance. Conclusions: The fitting and prediction performances of the SSA-SARIMA-LSTM model were better than those of the SARIMA and SARIMA-LSTM models. The SSA-SARIMA-LSTM model can therefore be applied to influenza prediction and can support public policy making.
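Combined models such as SARIMA-LSTM are commonly built as a residual-correction scheme: the statistical model captures the linear and seasonal structure, a neural network models what is left over, and the two forecasts are added. Assuming that scheme (the abstract does not describe the combination step), the Python sketch that follows shows only the plumbing. The two toy models, a mean forecaster and a last-value forecaster, are hypothetical stand-ins for SARIMA and LSTM rather than implementations of them, and the SSA decomposition step is omitted.

```python
from typing import Callable, List, Sequence, Tuple

# A model takes a history and a horizon and returns
# (in-sample fitted values, out-of-sample forecasts).
Model = Callable[[Sequence[float], int], Tuple[List[float], List[float]]]


def hybrid_forecast(series: Sequence[float], linear_model: Model,
                    residual_model: Model, horizon: int) -> List[float]:
    """Linear forecast plus a forecast of the linear model's residuals."""
    fitted, linear_future = linear_model(series, horizon)
    residuals = [y - f for y, f in zip(series, fitted)]
    _, residual_future = residual_model(residuals, horizon)
    return [lin + res for lin, res in zip(linear_future, residual_future)]


def mean_model(history: Sequence[float], horizon: int) -> Tuple[List[float], List[float]]:
    """Toy stand-in for SARIMA: predicts the historical mean everywhere."""
    mu = sum(history) / len(history)
    return [mu] * len(history), [mu] * horizon


def last_value_model(history: Sequence[float], horizon: int) -> Tuple[List[float], List[float]]:
    """Toy stand-in for LSTM: each fitted value is the previous observation."""
    fitted = [history[0]] + list(history[:-1])
    return fitted, [history[-1]] * horizon


print(hybrid_forecast([1.0, 2.0, 3.0, 4.0], mean_model, last_value_model, horizon=2))
# [4.0, 4.0]
```

Keeping each component behind the same callable interface means a real SARIMA or LSTM implementation could be swapped in without changing the combination logic.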



Mengmeng Zhai

Research on the predictive effect of a combined model of ARIMA and neural networks on human brucellosis in Shanxi Province, China: a time series predictive analysis

Exploratory study on classification of diabetes mellitus through a combined Random Forest Classifier

Application of a novel hybrid algorithm of Bayesian network in the study of hyperlipidemia related factors: a cross-sectional study

Study on the prediction effect of a combined model of SARIMA and LSTM based on SSA for influenza in Shanxi Province, China

Study on the prediction effect of a combined model of SARIMA and LSTM based on SSA for influenza in Shanxi Province, China
