Background
Infectious diseases are major medical and social challenges of the 21st century. Accurately predicting their incidence is of great significance for public health organizations seeking to prevent the spread of disease. Internet search engine data, such as the Baidu search index, may be useful for analyzing epidemics and improving prediction.
Methods
We extracted data on hepatitis E incidence and cases in Shandong province from January 2009 to December 2022, together with Baidu index data for the same period. Using Pearson correlation analysis, we validated the relationship between the Baidu index and hepatitis E incidence. We employed several LSTM architectures (LSTM, stacked LSTM, attention-based LSTM, and attention-based stacked LSTM) to forecast hepatitis E incidence both with and without the Baidu index as an input. In addition, we incorporated Kolmogorov-Arnold networks (KAN) into the LSTM models to improve their nonlinear learning capability. Model performance was evaluated using three standard metrics: root mean square error (RMSE), mean absolute percentage error (MAPE), and mean absolute error (MAE).
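The Pearson coefficient and the three evaluation metrics named above have standard definitions. As an illustration only, and not the authors' code, the sketch below writes them out in plain Python. The function names are hypothetical, MAPE is expressed in percent and assumes there are no zero observations, and the toy values in the usage note are invented rather than taken from the Shandong data.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error: sqrt of the mean squared residual."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: mean of the absolute residuals."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error in percent (assumes no zero targets)."""
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

For toy observations `[10, 20, 40]` and predictions `[12, 18, 40]`, `mape` gives about 10.0 (percent), `mae` gives 4/3, and `rmse` gives sqrt(8/3), roughly 1.633. Unlike RMSE and MAE, MAPE is scale-free, which makes it convenient for comparing models across periods with different incidence levels.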
Results
Adjusting the Baidu index changed its correlation with hepatitis E incidence from -0.1654 to 0.1733. Without the Baidu index, LSTM and attention-based stacked LSTM achieved MAPEs of 17.04±0.13% and 17.19±0.57%, respectively. With the Baidu index, the same models achieved MAPEs of 15.36±0.16% and 15.15±0.07%, respectively, improving prediction accuracy by about 2 percentage points. Adding KAN further improved performance by about 0.3%. Detailed results are given in the Results section of this paper.
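The size of the improvement from the Baidu index can be checked directly from the reported MAPE values. The short sketch below assumes that the "2%" gain refers to an absolute drop in MAPE (percentage points) and also reports the relative drop for comparison. The helper name `mape_reduction` is hypothetical.

```python
# Mean MAPE values (in %) reported above, as (without, with) the Baidu index.
reported_mape = {
    "LSTM": (17.04, 15.36),
    "attention-based stacked LSTM": (17.19, 15.15),
}


def mape_reduction(without: float, with_index: float) -> tuple:
    """Return (absolute drop in percentage points, relative drop in %)."""
    absolute = without - with_index
    relative = 100.0 * absolute / without
    return absolute, relative


for model, (without, with_index) in reported_mape.items():
    absolute, relative = mape_reduction(without, with_index)
    print(f"{model}: {absolute:.2f} percentage points ({relative:.1f}% relative)")
```

This yields drops of 1.68 and 2.04 percentage points (about 9.9% and 11.9% in relative terms), consistent with the rounded figure of roughly 2 percentage points.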
Conclusions
Our experiments reveal a weak correlation and similar trends between the Baidu index and hepatitis E incidence. The Baidu index proves valuable for predicting hepatitis E incidence. Furthermore, stacked layers and KAN can improve the representational ability of LSTM models.