To address the slow convergence and low prediction accuracy of XGBoost in predicting COVID-19 transmission, a novel algorithm based on bootstrap aggregation (Bagging) is presented to optimize the XGBoost prediction model. In this study, we collect early COVID-19 propagation data using web crawling techniques and use the Lasso algorithm to select the important attributes and thereby simplify the attribute set. Moreover, to improve the global exploration and local exploitation capabilities of the grey wolf optimization (GWO) algorithm, we introduce an opposition-based (backward) learning strategy and design a chaotic search operator. Finally, the resulting improved algorithm, COLGWO, is used to iteratively optimize the hyperparameters of XGBoost, and Bagging is employed to ensemble the optimized COLGWO-XGBoost models and improve their predictions. The experiments first compare the means and standard deviations of the results obtained by four search algorithms on eight standard test functions, and then compare and analyze the prediction performance of fourteen models on COVID-19 web search data collected in China. The results show that the improved grey wolf algorithm has excellent performance and that the combined ensemble model has good predictive ability. This demonstrates that web search data from the early spread of COVID-19 can complement historical information, and that the combined model can be extended to other early-warning tasks for the prevention and control of public emergencies.
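The abstract does not spell out COLGWO's operators, so the following is only a minimal sketch of how the two described improvements can be grafted onto the standard GWO position update. It assumes that the "backward learning" strategy means opposition-based learning (each random wolf x is compared with its opposite lo + hi - x at initialization) and that the chaotic search operator is a logistic-map perturbation of the best wolf within a shrinking radius. Both forms, the function name `colgwo_sketch`, and the use of the sphere function as the benchmark are hypothetical. Pure Python is used so the sketch has no dependencies.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]


def sphere(x: Vector) -> float:
    """Sphere benchmark: sum of squares, global minimum 0 at the origin."""
    return sum(v * v for v in x)


def clip(x: Vector, lo: float, hi: float) -> Vector:
    """Keep every coordinate inside the search box [lo, hi]."""
    return [min(max(v, lo), hi) for v in x]


def colgwo_sketch(f: Callable[[Vector], float], dim: int, lo: float, hi: float,
                  n_wolves: int = 20, iters: int = 200,
                  seed: int = 0) -> Tuple[Vector, float]:
    """Hypothetical GWO variant with opposition-based init and chaotic search."""
    rng = random.Random(seed)
    # Opposition-based initialization (assumed form): evaluate each random
    # wolf x and its opposite lo + hi - x, then keep the n_wolves fittest.
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    pop += [[lo + hi - v for v in x] for x in pop]
    pop = sorted(pop, key=f)[:n_wolves]
    best_x, best_f = list(pop[0]), f(pop[0])
    z = 0.7  # logistic-map state, z <- 4z(1 - z)
    for t in range(iters):
        pop.sort(key=f)
        alpha, beta, delta = pop[0], pop[1], pop[2]
        a = 2.0 - 2.0 * t / iters  # decreases linearly from 2 towards 0
        new_pop = []
        for x in pop:
            new_x = []
            for j in range(dim):
                # Standard GWO update: average the moves guided by the
                # alpha, beta and delta wolves.
                total = 0.0
                for leader in (alpha, beta, delta):
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    total += leader[j] - A * abs(C * leader[j] - x[j])
                new_x.append(total / 3.0)
            new_pop.append(clip(new_x, lo, hi))
        pop = new_pop
        # Chaotic local search (assumed form): perturb the current best wolf
        # with logistic-map values inside a shrinking radius; keep if better.
        i_best = min(range(n_wolves), key=lambda i: f(pop[i]))
        radius = 0.1 * (hi - lo) * (1.0 - t / iters)
        candidate = []
        for v in pop[i_best]:
            z = 4.0 * z * (1.0 - z)
            if not 0.0 < z < 1.0:  # floating point can collapse the map to 0
                z = rng.uniform(0.01, 0.99)
            candidate.append(v + radius * (2.0 * z - 1.0))
        candidate = clip(candidate, lo, hi)
        if f(candidate) < f(pop[i_best]):
            pop[i_best] = candidate
        # Report the best solution seen so far.
        for x in pop:
            fx = f(x)
            if fx < best_f:
                best_x, best_f = list(x), fx
    return best_x, best_f


best, value = colgwo_sketch(sphere, dim=5, lo=-10.0, hi=10.0)
print(value)
```

To tune XGBoost as the abstract describes, `f` would instead map a vector of hyperparameters to a validation error; that wiring is not shown here.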
The carbon peaking and carbon neutrality targets pose a great challenge to carbon emission reduction in the coal industry, which will face a comprehensive and deep adjustment. Forecasting coal prices is crucial for reducing carbon emissions in the coal industry in an orderly manner while ensuring national energy security. Coal prices are volatile and unstable because they are driven by multiple factors, which makes accurate prediction of coal price changes very difficult. In this paper, we propose an innovative hybrid forecasting method (CEEMDAN-GWO-CatBoost) for coal price indexes that combines machine learning models, feature selection, data decomposition, and model interpretation. By combining high forecasting accuracy with good interpretability, this method fills a gap in the field of coal price forecasting. First, we examine the factors that influence coal prices from five angles: supply, demand, macroeconomic factors, freight costs, and substitutes; we then use Spearman correlation analysis to reduce the complexity of the attribute set and construct a coal price forecasting index system. Second, the CEEMDAN method is used to decompose the raw coal price index data into seven intrinsic mode functions (IMFs) and one residual term, weakening the volatility introduced by complex factors. Next, the hyperparameters of the CatBoost model are optimized using the grey wolf optimizer (GWO), and the coal price data are fed into the combined forecasting model. Lastly, the SHAP interpretation method is introduced to study the important indicators affecting coal prices.
The experimental results show that the proposed CEEMDAN-GWO-CatBoost model forecasts significantly better than the other comparative models, and the SHAP analysis identifies the macroeconomic environment, freight costs, and coal import volume as significant factors affecting coal prices. Based on these findings, the paper also makes specific recommendations to the government on formulating regulatory policy for the coal industry in the context of carbon neutrality.
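The Spearman screening step described above ranks each candidate indicator and the target and measures how monotonically they move together (Spearman's rho is the Pearson correlation of the ranks). A minimal sketch follows, assuming a simple rule that keeps indicators whose absolute coefficient reaches a threshold. The threshold of 0.5, the helper names, and the toy series are hypothetical, and the paper may apply additional criteria.

```python
from typing import Dict, List


def average_ranks(values: List[float]) -> List[float]:
    """Rank values 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0  # 1-based mean position
        i = j + 1
    return ranks


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0  # 0 for constant input


def screen_features(features: Dict[str, List[float]], target: List[float],
                    threshold: float = 0.5) -> Dict[str, float]:
    """Keep indicators whose |rho| with the target is at least `threshold`."""
    scores = {name: spearman(col, target) for name, col in features.items()}
    return {name: rho for name, rho in scores.items() if abs(rho) >= threshold}


price = [100.0, 104.0, 110.0, 108.0, 120.0]  # toy coal price index
candidates = {
    "freight_cost": [5.0, 6.0, 7.0, 6.5, 8.0],  # moves with the price
    "unrelated": [5.0, 1.0, 4.0, 2.0, 3.0],      # rho = -0.1 with the price
}
print(screen_features(candidates, price))  # {'freight_cost': 1.0}
```

Ranking makes the screen insensitive to monotonic rescaling of an indicator, which suits price drivers measured in very different units.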
To address the low prediction accuracy, insufficient stability, and lag of single machine learning models in house price prediction, this paper proposes a multimodel fusion house price prediction model based on stacking ensemble learning. First, web search data affecting house prices were collected with web crawler technology, and Spearman correlation analysis was performed on the attribute set to reduce its complexity and establish a prediction index system for four first-tier cities in China. Second, with the goal of improving accuracy, diversity, and generalization ability, the types of base learners and the meta-learner are determined, and the parameters of the base learners are optimized using the grey wolf optimization (GWO) algorithm to produce the GWO-stacking model. Experimental results on four datasets demonstrate that the model has high prediction accuracy. Finally, to address the black-box nature of machine learning models, we use the state-of-the-art interpretation method SHAP combined with the LightGBM algorithm to interpret the model. The results can serve as a basis for planning and adjusting real estate policy and can even guide home buyers' demand, thereby improving the efficiency and effectiveness of government policy making.
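The core of stacking is that the meta-learner is trained on out-of-fold predictions of the base learners, so it learns how much to trust each one on data they did not see during fitting. A minimal pure-Python sketch of that idea follows, with two deliberately simple base learners (a least-squares line and k-nearest neighbors) and, as a simplifying assumption, a meta-learner that only fits a blending weight w in [0, 1]. These learners, the names `fit_stacking`, `out_of_fold` and `fit_meta_weight`, and the toy data are hypothetical; the abstract does not name the paper's base learners or meta-learner, and the GWO tuning of base-learner parameters is omitted.

```python
from typing import Callable, List, Tuple

Predictor = Callable[[float], float]
Learner = Callable[[List[float], List[float]], Predictor]


def fit_linear(xs: List[float], ys: List[float]) -> Predictor:
    """Base learner 1: ordinary least-squares line y = b0 + b1 * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    b0 = my - b1 * mx
    return lambda x: b0 + b1 * x


def fit_knn(xs: List[float], ys: List[float], k: int = 3) -> Predictor:
    """Base learner 2: k-nearest-neighbor regression (mean of k closest y)."""
    pairs = list(zip(xs, ys))

    def predict(x: float) -> float:
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


def out_of_fold(fit: Learner, xs: List[float], ys: List[float],
                n_folds: int) -> List[float]:
    """Predict every sample with a model that never saw it during training."""
    oof = [0.0] * len(xs)
    for fold in range(n_folds):
        train = [i for i in range(len(xs)) if i % n_folds != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, len(xs), n_folds):
            oof[i] = model(xs[i])
    return oof


def fit_meta_weight(p1: List[float], p2: List[float], ys: List[float],
                    steps: int = 100) -> float:
    """Meta-learner: grid-search w in [0, 1] for w * p1 + (1 - w) * p2."""
    best_w, best_err = 0.0, float("inf")
    for s in range(steps + 1):
        w = s / steps
        err = sum((w * a + (1 - w) * b - y) ** 2 for a, b, y in zip(p1, p2, ys))
        if err < best_err:
            best_w, best_err = w, err
    return best_w


def fit_stacking(xs: List[float], ys: List[float],
                 n_folds: int = 5) -> Tuple[float, Predictor]:
    # Level 1: the meta-learner sees only out-of-fold base predictions.
    w = fit_meta_weight(out_of_fold(fit_linear, xs, ys, n_folds),
                        out_of_fold(fit_knn, xs, ys, n_folds), ys)
    # Refit base learners on all data; the meta-learner combines their outputs.
    lin, knn = fit_linear(xs, ys), fit_knn(xs, ys)
    return w, lambda x: w * lin(x) + (1 - w) * knn(x)


xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
w, model = fit_stacking(xs, ys)
print(w, model(7.5))  # 1.0 16.0
```

On this exactly linear toy series the line has zero out-of-fold error, so the meta-learner puts all of the weight on it; on real house-price data the weight would instead reflect how the base learners' errors trade off.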