Gradient descent algorithms are commonly used to train neural network models online by continuously tracking a specified performance measure such as prediction error, and they are also applied to traditional statistical methods such as autoregressive integrated moving average (ARIMA) models [8,16]. The learning rate can be adapted to achieve better convergence, to avoid overfitting to noisy samples, or to adjust automatically to shifts in the data distribution [5,14,17,18,19]. RMSprop [20] is a popular algorithm that scales the learning rate by a moving average of squared gradients, based on the intuition that the magnitude of each weight update should be similar regardless of the actual gradient magnitude.
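To make this scaling concrete, the following is a minimal sketch of an RMSprop-style update in Python; the function name rmsprop_update and the default hyperparameters (lr, rho, eps) are illustrative choices, not values taken from [20].

```python
import numpy as np

def rmsprop_update(w, grad, cache, lr=1e-3, rho=0.9, eps=1e-8):
    """One RMSprop-style step: divide the gradient by the root of a
    moving average of squared gradients, so updates have a similar
    magnitude regardless of the raw gradient scale."""
    cache = rho * cache + (1.0 - rho) * grad ** 2   # moving average of g^2
    w = w - lr * grad / (np.sqrt(cache) + eps)      # normalized update
    return w, cache

# Usage: components with very different gradient magnitudes
# receive steps of comparable size after normalization.
w = np.zeros(3)
cache = np.zeros_like(w)
grad = np.array([0.1, -2.0, 0.01])
w, cache = rmsprop_update(w, grad, cache)
```

Because the per-parameter cache decays with factor rho, a persistently large gradient component inflates its own denominator, which shrinks its effective learning rate relative to components with small gradients.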