<abstract>
<p>Web search query data are recognized as a valuable source for detecting new influenza epidemics. However, selecting search keywords and choosing a prediction method remain key challenges to improving the effectiveness of influenza prediction. In this study, web search data were analyzed and mined using big data and machine learning methods. Four influenza prediction models (models 1–4) for southern China were built from various keywords and historical influenza-like illness percentage (ILI%) data, taking into account the impact of influenza transmission across regions, to verify the factors affecting the spread of influenza. To improve the accuracy of influenza trend prediction, a support vector regression method based on an improved particle swarm optimization algorithm (IPSO-SVR) was proposed and applied to the prediction models to forecast ILI% in southern China. A comparison of the models' prediction results showed that model 4, using the IPSO-SVR algorithm, achieved higher prediction precision and more effective results, with a mean square error (MSE), root mean square error (RMSE) and mean absolute error (MAE) of 0.0596, 0.2441 and 0.1884, respectively. The experimental results show that prediction precision increased significantly when the IPSO-SVR method was applied to the constructed ILI% model. These findings provide a new theoretical basis and implementation strategy for more accurate influenza prevention and control in southern China.</p>
</abstract>