Ozone is one of the most important air pollutants, with significant impacts on human health, regional air quality, and ecosystems. In this study, we use geographic and environmental information from monitoring sites in 5577 regions worldwide, covering 2010 to 2014, as input features to predict the long-term average ozone concentration at each site. We propose BO-XGBoost-RFE, an XGBoost-based recursive feature elimination (RFE) model tuned by Bayesian optimization, and use a variety of machine learning algorithms to predict ozone concentration from the optimal feature subset. Because recursive feature selection depends on the hyperparameters of the underlying model, different hyperparameter combinations lead to different selected feature subsets, so the subset obtained may not be optimal. We therefore use Bayesian optimization to tune the parameters of XGBoost-based recursive feature elimination, obtaining the optimal parameter combination and the optimal feature subset under that combination. Experiments on global long-term ozone concentration prediction show that models using Bayesian-optimized XGBoost-RFE feature selection achieve higher prediction accuracy than models using all features or features selected by Pearson correlation. Among the four prediction models, random forest obtained the highest prediction accuracy, while XGBoost showed the greatest improvement in accuracy.
PM2.5 is one of the main pollutants that cause air pollution, and high concentrations of PM2.5 seriously threaten human health. Accurate prediction of PM2.5 concentration therefore has great practical significance for air quality detection, air pollution restoration, and human health. This paper takes the historical air quality concentration data and meteorological data of the Beijing Olympic Sports Center as its research object. It establishes a long short-term memory (LSTM) model with a time window size of 12, a T-shape light gradient boosting machine (TSLightGBM) model that uses all information in the time window as input for predicting the next period, and a combined LSTM-TSLightGBM model based on an optimal weighted combination method, and uses them to predict the hourly PM2.5 concentration. The prediction results on the test set show that the mean absolute error (MAE), root mean squared error (RMSE), and symmetric mean absolute percentage error (SMAPE) of the LSTM-TSLightGBM model are 11.873, 22.516, and 19.540%, respectively. Compared with LSTM, TSLightGBM, the recurrent neural network (RNN), and other models, the LSTM-TSLightGBM model has a lower MAE, RMSE, and SMAPE, higher prediction accuracy for PM2.5, and better goodness-of-fit.
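The time-window input and the weighted combination of two forecasts described above can be illustrated with a minimal sketch. The abstract does not specify how the optimal weights are computed, so this example assumes a single weight `w` chosen by least squares on held-out data and clipped to [0, 1]; the function names `sliding_windows`, `optimal_weight`, and `combine` are hypothetical, and `pred_a` and `pred_b` merely stand in for the outputs of the LSTM and TSLightGBM models.

```python
def sliding_windows(series, window=12):
    """Split a series into (inputs, targets).

    Each input holds `window` consecutive values and the target is the
    value that immediately follows them (one-step-ahead prediction).
    """
    n = len(series) - window
    inputs = [series[i:i + window] for i in range(n)]
    targets = [series[i + window] for i in range(n)]
    return inputs, targets


def optimal_weight(y_true, pred_a, pred_b):
    """Weight w in [0, 1] minimising sum((y - (w*a + (1 - w)*b)) ** 2).

    Assumption: least-squares weighting. Setting the derivative with
    respect to w to zero gives w = sum((y - b)(a - b)) / sum((a - b)^2).
    """
    num = sum((y - b) * (a - b) for y, a, b in zip(y_true, pred_a, pred_b))
    den = sum((a - b) ** 2 for a, b in zip(pred_a, pred_b))
    if den == 0:
        # Both models agree everywhere, so any weight gives the same result.
        return 0.5
    return min(1.0, max(0.0, num / den))


def combine(pred_a, pred_b, w):
    """Blend two forecasts with weight w on the first one."""
    return [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]


# Toy usage: model A overshoots by 2, model B undershoots by 1.
y = [10.0, 20.0, 30.0]
pred_a = [12.0, 22.0, 32.0]
pred_b = [9.0, 19.0, 29.0]
w = optimal_weight(y, pred_a, pred_b)
blended = combine(pred_a, pred_b, w)
```

In this toy case the weight leans toward the model with the smaller error, and the opposite biases cancel in the blend. In practice the weight would be fitted on a validation split rather than on the test set.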
Sentiment analysis of netizens’ comments can help capture the psychology of netizens and reduce the risks brought by online public opinion. However, there is currently no effective method for handling the short text, open vocabulary, and occasionally reversed word order found in comments. To address these problems, this article proposes a hybrid sentiment classification model based on bidirectional encoder representations from transformers (BERT), bidirectional long short-term memory (BiLSTM), and a text convolutional neural network (TextCNN), denoted BERT-BiLSTM-TextCNN. The experimental results show that (1) the proposed hybrid model combines the advantages of BiLSTM and TextCNN: it captures local correlation while retaining context information and has high accuracy and stability; and (2) the BERT-BiLSTM-TextCNN model can extract important emotional information from text more flexibly and perform multiclass emotion classification more accurately. The innovations of this study are as follows: (1) using BERT to generate word vectors provides more prior information and fully incorporates contextual semantics; (2) the BiLSTM model, as a bidirectional context mechanism, captures contextual information well; and (3) the TextCNN model extracts important features well in text classification, and the combined effect of the three modules can significantly improve the accuracy of emotional multilabel classification.
The concentration series of PM2.5 (particulate matter ≤ 2.5 μm) is nonlinear, nonstationary, and noisy, making it difficult to predict accurately. This paper presents a new PM2.5 concentration prediction method based on a hybrid model of complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) and bi-directional long short-term memory (BiLSTM). First, CEEMDAN was used to decompose the PM2.5 concentration series into components at different frequencies. Then, the fuzzy entropy (FE) value of each decomposed component was calculated, and components with close FE values were combined by K-means clustering to generate the input sequences. Finally, the combined sequences were fed into a BiLSTM model with multiple hidden layers for training. We predicted the hourly PM2.5 concentrations of Seoul Station 116, with values of the root mean square error (RMSE), the mean absolute error (MAE), and the symmetric mean absolute percentage error (SMAPE) as low as 2.74, 1.90, and 13.59%, respectively, and an R² value as high as 96.34%. The method was also applied to predict PM10, a particulate pollutant of the same kind, and O3, a heterogeneous gas pollutant, showing that it has strong generalization ability. The “CEEMDAN-FE” decomposition-merging technique proposed in this paper can effectively reduce the instability and high volatility of the original data, overcome data noise, and significantly improve the model’s performance in predicting the real-time concentrations of PM2.5.
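The four evaluation metrics reported above (RMSE, MAE, SMAPE, and R²) can be computed as in the following sketch. SMAPE has several definitions in the literature and the abstract does not say which one was used; this example assumes the common form with the mean of |actual| and |predicted| in the denominator, reported as a percentage. The function names are illustrative only.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean square error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, in percent.

    Assumed definition: mean of |t - p| / ((|t| + |p|) / 2) * 100.
    Pairs where both values are zero contribute zero error.
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        denom = (abs(t) + abs(p)) / 2
        if denom > 0:
            total += abs(t - p) / denom
    return 100.0 * total / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Undefined (division by zero) when all true values are identical.
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values, not taken from the paper.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 39.0]
scores = {
    "RMSE": rmse(y_true, y_pred),
    "MAE": mae(y_true, y_pred),
    "SMAPE": smape(y_true, y_pred),
    "R2": r2(y_true, y_pred),
}
```

RMSE and MAE are in the units of the target, while SMAPE and R² are unitless, which is why the abstract reports the latter two as percentages.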