Abstract-A new global nonlinear predictor with a particle swarm optimized interval support vector regression (PSO-ISVR) is proposed to address three issues (viz. kernel selection, model optimization, and kernel method speed) encountered when applying support vector regression (SVR) to large datasets. The prediction model reduces the SVR computing overhead by dividing the input space, adaptively selecting kernel functions, and obtaining the optimal SVR parameters by PSO. To quantify the quality of the predictor, its generalization performance and execution speed are investigated based on statistical learning theory. In addition, experiments using synthetic data as well as the stock volume weighted average price (VWAP) are reported to demonstrate the effectiveness of the developed models. The experimental results show that the proposed PSO-ISVR predictor improves the computational efficiency and the overall prediction accuracy compared with standard SVR and other regression methods. The proposed PSO-ISVR provides an important tool for nonlinear regression analysis of big data.

Index Terms-global nonlinear predictor, interval support vector regression, particle swarm optimization, kernel function, sliding adaptive model, large data
I. INTRODUCTION

Support vector regression (SVR) is constructed based on statistical learning theory [1]; it uses a kernel function to map the data from the input space to a high-dimensional feature space, where the problem becomes amenable to linear regression [2]. Owing to its robustness to noise and its generalization ability, SVR has been widely employed in areas such as adaptive flight identification [3], ore grade estimation [4], and stock market price forecasting [5][6][7].

Many researchers have pointed out that three crucial problems in SVR urgently need to be addressed: (1) how to choose or construct an appropriate kernel for a given forecasting problem [8,9]; (2) how to optimize the parameters of SVR to improve the quality of prediction [10,11]; and (3) how to construct a fast algorithm that operates in the presence of large datasets [12,13]. With unsuitable kernel functions or hyperparameter settings, SVR may produce poor prediction results. In fact, a kernel function defines a particular nonlinear transformation, and because of data uncertainty in practical regression problems it is difficult to determine which kernel is best for a specific problem without any prior knowledge [14]. If the adjustable kernel parameters in SVR are not properly selected, undesirable over-fitting or under-fitting results [1]. Furthermore, SVR is typically confronted with a heavy computing overhead because it must process the large Gram matrices associated with the kernels [13]. This computing burden becomes an essential barrier when dealing with massive data, such as those encountered in protein structure prediction and time series prediction [15].

During the past several years, various methods h...
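To make issue (2) concrete, the sketch below shows a plain particle swarm search over the hyperparameters of an RBF-kernel SVR (C, gamma, epsilon). It is not the authors' PSO-ISVR, which additionally partitions the input space and selects kernels adaptively; the toy data, search bounds, cross-validated fitness, and PSO constants are illustrative assumptions made here for demonstration only.

```python
# Minimal sketch (assumptions, not the paper's algorithm): PSO over the
# hyperparameters (C, gamma, epsilon) of an RBF-kernel SVR.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Toy regression data standing in for a real task (assumption).
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(200)

def fitness(p):
    """Negative cross-validated MSE of an RBF-SVR with p = (C, gamma, epsilon)."""
    C, gamma, eps = p
    model = SVR(kernel="rbf", C=C, gamma=gamma, epsilon=eps)
    return cross_val_score(model, X, y, cv=3,
                           scoring="neg_mean_squared_error").mean()

# Search bounds for (C, gamma, epsilon) -- illustrative choices.
lo, hi = np.array([0.1, 1e-3, 1e-3]), np.array([100.0, 10.0, 1.0])

n_particles, n_iter = 20, 30
w, c1, c2 = 0.7, 1.5, 1.5                      # inertia and acceleration weights
pos = rng.uniform(lo, hi, size=(n_particles, 3))
vel = np.zeros_like(pos)
pbest = pos.copy()                             # personal best positions
pbest_val = np.array([fitness(p) for p in pos])
gbest = pbest[pbest_val.argmax()]              # global best position

for _ in range(n_iter):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, lo, hi)
    vals = np.array([fitness(p) for p in pos])
    improved = vals > pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmax()]

print("best (C, gamma, epsilon):", gbest)
```

Note that every fitness evaluation retrains the SVR, whose Gram matrix grows quadratically with the number of samples; this is precisely the computing burden described above and the motivation for reducing the per-model training cost when the dataset is large.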