Featured Application: This work can be used in intelligent systems for smart cities, smart agriculture, etc.

Abstract: It is crucial to predict PM2.5 concentration for the early warning and control of air pollution. However, accurate PM2.5 prediction has been challenging, especially for long-term prediction. PM2.5 monitoring data form a complex time series that contains multiple components with different characteristics; therefore, it is difficult to obtain an accurate prediction with a single model. In this study, an integrated predictor is proposed in which the original data are decomposed into three components (trend, period, and residual), and different sub-predictors, including an autoregressive integrated moving average (ARIMA) model and two gated recurrent units, are then used to predict the components separately. Finally, the predictions from all sub-predictors are combined in a fusion node to obtain the final prediction for the original data. The results of predicting the PM2.5 time series for Beijing, China, showed that the proposed predictor can effectively improve accuracy for long-term prediction.

Appl. Sci. 2019, 9, 4533 2 of 14

models [14] and vector auto-regression (VAR) [15]. ARIMA is a linear modeling method that provides accurate predictions for approximately linear relationships. However, its prediction performance is not good enough for nonlinear prediction problems.

To capture the complex nonlinearity of PM2.5, back propagation (BP) neural networks have been widely applied in time-series data prediction [16][17][18]. Other shallow nonlinear networks based on machine-learning mechanisms, such as the improved gray neural network model [19] and the radial basis function (RBF) neural network [20], have also been used to predict time-series data. Xu et al.
[3] proposed a supplementary leaky integrator echo state network (SLI-ESN), which added a historical state term to the calculation of the leaky integrator reservoir. Compared with an echo state network (ESN), a leaky integrator ESN (LI-ESN), an extreme learning machine (ELM), a hierarchical ELM (H-ELM), a stacked auto-encoder (SAE), and a traditional SLI-ESN, the proposed SLI-ESN of Xu et al. [3] achieved good prediction results, but its long-term predictions were not satisfactory.

Because a shallow network still cannot effectively extract the complex nonlinearity of the data, decomposition must be used; that is, the data must be decomposed into multiple components to reduce their complexity, and multiple sub-models are then used to improve prediction performance. For example, García et al. [21] decomposed a long-term data series into smaller seasonal component patterns. Jesús et al. [22] divided a pollen-concentration data series into seasonal and random parts, used partial least-squares regression (PLSR) to fit the residuals, and established an airborne pollen time-series model to predict the daily pollen concentration. Ming et al. [23] extracted accurate seasonal signals, a...