“…List of Symbols and Abbreviations:
$a^{l}$, the output of the $l$-th layer;
$W^{l}$, the weight matrix connecting the $(l-1)$-th and $l$-th layers;
$b^{l}$, the bias of the $l$-th layer;
$s(x)$, the activation function;
$m$, the number of samples;
$d\theta^{(k)}$, the gradient of $\theta$ at the $k$-th update;
$\theta^{(k-1)}$, the parameter before the $k$-th update;
$\nabla_{\theta} O(\theta^{(k-1)})$, the partial derivative of the objective function $O$ with respect to $\theta$ at the $k$-th update;
$r^{(k)}$, the moving average of the squared gradient of the parameters at the $k$-th update;
$\rho$, the decay rate of the moving average $r^{(k)}$;
$\Delta\theta^{(k)}$, the change in the parameter at the $k$-th update;
$\alpha^{(k)}$, the learning rate at the $k$-th update;
$\theta^{(k)}$, the parameter after the $k$-th update;
$\varepsilon$, the decay rate of the learning rate;
$\omega_{s}$, the weight coefficient of the $s$-th sample;
$\Delta\theta^{(k)}_{s}$, the sample-weighted parameter change of the $s$-th sample;
$\Delta\theta^{(k)}_{w}$, the sample-weighted parameter change at the $k$-th update.
…average, [9,10] wavelet analysis, [11,12] and so on; intelligent methods, such as support vector machines (SVM), [13,14] random forests, [15–17] neural networks, [18–20] and other intelligent forecasting methods. Other methods, such as hybrid techniques, [21–24] combine more than one technique: either a traditional method with an intelligent method, or several different intelligent methods.…”
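The layer symbols in the list ($a^{l}$, $W^{l}$, $b^{l}$, $s(x)$, $m$) fit the usual fully connected forward pass $a^{l} = s(W^{l} a^{l-1} + b^{l})$. That relation is an assumption here, because this excerpt gives only the symbol definitions. The minimal sketch below further assumes a sigmoid for $s(x)$ and stores the $m$ samples as matrix columns; the helper names `sigmoid` and `forward` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    # Assumed choice for the activation s(x); the excerpt does not specify one.
    return 1.0 / (1.0 + np.exp(-x))


def forward(x, weights, biases, s=sigmoid):
    """Hypothetical fully connected forward pass, a^l = s(W^l a^(l-1) + b^l).

    x:       input a^0, shape (n_0, m), one column per sample
    weights: [W^1, ..., W^L], where W^l has shape (n_l, n_(l-1))
    biases:  [b^1, ..., b^L], where b^l has shape (n_l, 1)
    Returns the list [a^0, a^1, ..., a^L].
    """
    activations = [x]
    a = x
    for W, b in zip(weights, biases):
        a = s(W @ a + b)  # b^l broadcasts across the m sample columns
        activations.append(a)
    return activations


# Toy usage with random weights (illustrative data only).
rng = np.random.default_rng(0)
W = [rng.standard_normal((4, 3)), rng.standard_normal((1, 4))]
b = [np.zeros((4, 1)), np.zeros((1, 1))]
x = rng.standard_normal((3, 5))  # m = 5 samples with 3 features each
outs = forward(x, W, b)
print([o.shape for o in outs])   # [(3, 5), (4, 5), (1, 5)]
```

Keeping samples as columns lets one matrix product $W^{l} a^{l-1}$ process all $m$ samples at once, with $b^{l}$ broadcast across them.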
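The update symbols in the list ($d\theta^{(k)}$, $r^{(k)}$, $\rho$, $\alpha^{(k)}$, $\varepsilon$, $\omega_{s}$, $\Delta\theta^{(k)}_{s}$, $\Delta\theta^{(k)}_{w}$) resemble an RMSProp-style optimizer with a decaying learning rate and per-sample weights. The excerpt does not state the update equations, so the following is only a sketch under assumptions, not the authors' method: $r^{(k)} = \rho\, r^{(k-1)} + (1-\rho)\,(d\theta^{(k)})^{2}$; $\alpha^{(k)} = \alpha^{(0)}/(1 + \varepsilon k)$; $\Delta\theta^{(k)}_{s} = -\alpha^{(k)}\, \nabla_{\theta} O_{s} / (\sqrt{r^{(k)}} + \delta)$ with a small hypothetical stability constant $\delta$ (named `tiny` in the code); $\Delta\theta^{(k)}_{w} = \sum_{s} \omega_{s} \Delta\theta^{(k)}_{s}$ with weights normalized to sum to one; and $\theta^{(k)} = \theta^{(k-1)} + \Delta\theta^{(k)}_{w}$. The function name `weighted_rmsprop_step`, its default hyperparameters, and the toy quadratic objective are hypothetical.

```python
import numpy as np


def weighted_rmsprop_step(theta, r, per_sample_grads, omega, k,
                          alpha0=0.01, rho=0.9, eps=1e-3, tiny=1e-8):
    """Hypothetical sample-weighted RMSProp-style update (k = 1, 2, ...).

    theta:            theta^(k-1), shape (p,)
    r:                r^(k-1), moving average of squared gradients, shape (p,)
    per_sample_grads: shape (m, p); row s is the gradient of sample s's objective
    omega:            sample weights omega_s, shape (m,)
    eps:              learning-rate decay rate (epsilon in the symbol list)
    Returns (theta^(k), r^(k)).
    """
    omega = np.asarray(omega, dtype=float)
    omega = omega / omega.sum()               # assumption: weights sum to 1
    d_theta = omega @ per_sample_grads        # weighted gradient, used as d theta^(k)
    r = rho * r + (1.0 - rho) * d_theta ** 2  # r^(k)
    alpha = alpha0 / (1.0 + eps * k)          # alpha^(k), assumed decay schedule
    scale = alpha / (np.sqrt(r) + tiny)       # per-parameter step scale
    delta_s = -scale * per_sample_grads       # Delta theta_s^(k), one row per sample
    delta_w = omega @ delta_s                 # Delta theta_w^(k) = sum_s omega_s * Delta theta_s^(k)
    return theta + delta_w, r                 # theta^(k) = theta^(k-1) + Delta theta_w^(k)


# Toy usage: O_s(theta) = (theta - y_s)^2 / 2, so sample s has gradient theta - y_s.
y = np.array([1.0, 3.0])        # hypothetical per-sample targets (m = 2)
omega = np.array([1.0, 3.0])    # hypothetical sample weights
theta, r = np.zeros(1), np.zeros(1)
for k in range(1, 3001):
    grads = theta - y[:, None]  # shape (m, p) = (2, 1)
    theta, r = weighted_rmsprop_step(theta, r, grads, omega, k)
print(theta)
```

Because every $\Delta\theta^{(k)}_{s}$ shares the same step scale, the weighted sum reduces to one step along the weighted gradient. On this toy objective $\theta$ should settle near the weighted mean of the targets, $0.25 \cdot 1 + 0.75 \cdot 3 = 2.5$.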