A stacked ensemble model is developed for forecasting and analyzing the daily average concentrations of fine particulate matter (PM2.5) in Beijing, China. Based on an exploratory data analysis, dedicated feature extraction procedures, including simplification, polynomial features, transformation and combination, are conducted before modeling to identify potentially significant features. Stability selection and tree-based feature selection methods are applied to select important variables and evaluate the degree of feature importance. Single models, namely LASSO, AdaBoost, XGBoost and a multi-layer perceptron optimized by a genetic algorithm (GA-MLP), are established in the level 0 space and then integrated by support vector regression (SVR) in the level 1 space via stacked generalization. A feature importance analysis reveals that nitrogen dioxide (NO2) and carbon monoxide (CO) concentrations measured in the city of Zhangjiakou are the most important pollution factors for forecasting PM2.5 concentrations. Among meteorological factors, local extreme wind speeds and maximal wind speeds have the greatest effect on the cross-regional transport of contaminants. Pollutants from the cities of Zhangjiakou and Chengde have a stronger impact on air quality in Beijing than other surrounding factors. Our model evaluation shows that, when applied to new data, the ensemble model generally performs better than single nonlinear forecasting models, with a coefficient of determination (R²) of 0.90 and a root mean squared error (RMSE) of 23.69 μg/m³. For pollutant grade recognition, the proposed model performs better on days with good air quality than on days with high levels of pollution. The overall classification accuracy is 73.93%, with most misclassifications occurring between adjacent categories.
The results demonstrate the interpretability and generalizability of the stacked ensemble model.
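The stacked generalization scheme and the two regression metrics described in this abstract can be sketched without any libraries. This is a minimal illustration under stated assumptions, not the authors' implementation: two simple stand-in level-0 learners (a one-feature least-squares line and k-nearest-neighbour regression) replace LASSO, AdaBoost, XGBoost and GA-MLP, and a linear least-squares meta-learner replaces the SVR used in the level 1 space. The level-1 model is trained on out-of-fold level-0 predictions so that it never learns from predictions a base model made on its own training data. The synthetic data, the round-robin folds and every function name (`fit_linear`, `fit_knn`, `stack_fit`, and so on) are hypothetical.

```python
import math
import random


def fit_linear(xs, ys):
    """Level-0 model A (stand-in): one-feature least-squares line y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def fit_knn(xs, ys, k=3):
    """Level-0 model B (stand-in): k-nearest-neighbour regression on one feature."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


def solve(a, b):
    """Solve the small linear system a @ w = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivoting
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_meta(features, ys, ridge=1e-6):
    """Level-1 model (stand-in for SVR): linear least squares with a tiny ridge term."""
    rows = [[1.0] + list(f) for f in features]  # prepend an intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    w = solve(xtx, xty)
    return lambda f: sum(wi * fi for wi, fi in zip(w, [1.0] + list(f)))


LEVEL0 = [fit_linear, fit_knn]


def stack_fit(xs, ys, n_folds=5):
    """Fit level 1 on out-of-fold level-0 predictions, then refit level 0 on all data."""
    n = len(xs)
    oof = [[0.0] * len(LEVEL0) for _ in range(n)]
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        for j, fit in enumerate(LEVEL0):
            model = fit(tx, ty)
            for i in range(fold, n, n_folds):  # held-out indices of this fold
                oof[i][j] = model(xs[i])
    meta = fit_meta(oof, ys)
    base = [fit(xs, ys) for fit in LEVEL0]
    return lambda x: meta([m(x) for m in base])


def rmse(y, yhat):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))


def r2(y, yhat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    random.seed(0)
    xs = [random.uniform(0, 10) for _ in range(200)]
    ys = [3.0 * x + 5.0 * math.sin(x) + random.gauss(0, 1) for x in xs]
    model = stack_fit(xs[:150], ys[:150])
    pred = [model(x) for x in xs[150:]]
    print(f"R2 = {r2(ys[150:], pred):.3f}, RMSE = {rmse(ys[150:], pred):.3f}")
```

Fitting the meta-learner on out-of-fold predictions is the key design choice: if it were fitted on in-sample level-0 predictions, it would over-trust the most flexible base model (here the k-NN learner), which memorizes its training data.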
Air pollution has become one of the most serious environmental problems in the world. Taking Beijing and six surrounding cities as the main research area, this study uses daily average pollutant concentrations and meteorological factors from 2 December 2013 to 13 October 2017 to examine the spatial and temporal distribution characteristics of concentrations of particulate matter smaller than 2.5 μm (PM2.5) in Beijing and their relationships with other factors. Based on correlation analysis and geostatistical techniques, the inter-annual, seasonal and diurnal variation trends and the spatio-temporal distribution of PM2.5 concentrations in Beijing are studied. The results show that pollutant concentrations in Beijing exhibit clear seasonal and cyclical fluctuations. Air pollution is more serious in winter and spring and slightly less severe in summer and autumn, and the spatial distribution of pollutants varies considerably between seasons. In general, pollution is more serious in southern Beijing and air quality is better in the northern areas. The diurnal variation of air quality differs markedly between seasons, and the daily variation of PM2.5 concentrations follows a “W”-shaped pattern with twin peaks. Besides the emission and accumulation of local pollutants, air quality is easily affected by pollutant transport from the southwest. PM2.5 and PM10 concentrations measured in the city of Langfang are the most important surrounding pollution factors for PM2.5 in Beijing, while PM10 and carbon monoxide (CO) concentrations in Beijing are the most significant local influencing factors. Extreme wind speeds and maximal wind speeds are the most significant meteorological factors affecting the transport of pollutants across the region.
Air pollution is more likely under weak southwesterly winds, whereas air quality is generally better under northerly winds.
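One way to rank surrounding-city factors, in the spirit of the correlation analysis described in this abstract, is to compute the Pearson correlation between each candidate series and Beijing PM2.5, optionally with a time lag so that an upwind city whose pollution arrives a day later can still show a strong association. The sketch below assumes equal-length daily series with no missing values; the lag search, the function names and the toy numbers are illustrative assumptions, not the study's method or data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def lagged_pearson(source, target, lag):
    """Correlate source on day t - lag with target on day t (lag >= 0)."""
    if lag == 0:
        return pearson(source, target)
    return pearson(source[:-lag], target[lag:])


def rank_factors(target, factors, max_lag=2):
    """Rank candidate factors by their strongest |r| over lags 0..max_lag."""
    best = {}
    for name, series in factors.items():
        scores = [(lagged_pearson(series, target, lag), lag) for lag in range(max_lag + 1)]
        best[name] = max(scores, key=lambda s: abs(s[0]))
    return sorted(best.items(), key=lambda kv: abs(kv[1][0]), reverse=True)


if __name__ == "__main__":
    # Toy daily PM2.5 values in ug/m3; NOT data from the study.
    beijing = [60, 85, 140, 210, 120, 75, 95, 160, 230, 110]
    factors = {
        # Hypothetical upwind city whose values lead Beijing by one day.
        "upwind_city": beijing[1:] + [90],
        "unrelated_city": [100, 40, 130, 60, 110, 50, 120, 70, 90, 80],
    }
    for name, (r, lag) in rank_factors(beijing, factors):
        print(f"{name}: r = {r:.2f} at lag {lag} day(s)")
```

Correlation only ranks associations; attributing them to transport, as the study does for the southwest, also requires the meteorological context such as wind direction and speed.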
Nowadays, people often learn about local events from massive collections of documents. Many texts contain location information, such as city or road names, which is often incomplete or latent. Extracting the administrative area a text refers to and organizing it into the area hierarchy, a task called location normalization, is therefore important. Existing location detection systems either omit hierarchical normalization or cover only a few specific regions. We propose a system named ROIBase that normalizes text according to the Chinese hierarchical administrative divisions. ROIBase adopts a co-occurrence constraint as its basic framework to score hits on administrative areas, performs inference using special embeddings, and expands recall using ROIs (regions of interest). It is efficient and interpretable because it relies mainly on definite knowledge and has simpler logic than supervised models. We demonstrate that ROIBase achieves better performance than feasible alternative solutions and can serve as a strong support system for location normalization.
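The co-occurrence idea can be illustrated with a toy scorer: each candidate full administrative path is scored by which of its levels appear in the text, so that a city name disambiguates a district name shared by several cities, and the missing province is filled in from the hierarchy. This is a minimal sketch under loud assumptions: the gazetteer, the level weights, plain case-sensitive substring matching and the tie rule are all hypothetical, and ROIBase's embedding-based inference and ROI-based recall expansion are not modeled.

```python
# Hypothetical mini gazetteer of (province, city, district) paths.
# The names are real divisions; Gulou District exists in both Nanjing and Fuzhou.
GAZETTEER = [
    ("Jiangsu", "Nanjing", "Gulou"),
    ("Fujian", "Fuzhou", "Gulou"),
    ("Zhejiang", "Hangzhou", "Xihu"),
]
# Hypothetical weights: a hit on a finer level counts as more specific evidence.
LEVEL_WEIGHTS = (1.0, 2.0, 3.0)


def score_path(text, path):
    """Sum the weights of hierarchy levels whose names occur (as substrings) in the text."""
    return sum(w for w, name in zip(LEVEL_WEIGHTS, path) if name in text)


def normalize(text):
    """Return the best-scoring full path, or None if nothing matches or the top score is tied."""
    scored = sorted(((score_path(text, p), p) for p in GAZETTEER), reverse=True)
    best_score, best_path = scored[0]
    if best_score == 0 or (len(scored) > 1 and scored[1][0] == best_score):
        return None
    return best_path


if __name__ == "__main__":
    for text in ["Road closure in Gulou, Nanjing", "Road closure in Gulou", "A walk near Xihu"]:
        print(text, "->", normalize(text))
```

For "Road closure in Gulou, Nanjing", the co-occurring city name lifts the Jiangsu path above the Fujian one and the province is recovered from the hierarchy; "Gulou" alone is left unresolved rather than guessed.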