Gradient descent algorithms are commonly used to train neural network models online by continuously tracking a specified performance measure such as prediction error, and they are also applied to traditional statistical methods such as autoregressive integrated moving average (ARIMA) models [8,16]. The learning rate can be adapted to achieve better convergence, to avoid overfitting to noisy samples, or to adjust automatically to shifts in the data distribution [5,14,17,18,19]. RMSprop [20] is a popular algorithm that scales the learning rate by a moving average of squared gradients, based on the intuition that the magnitude of each weight update should be similar regardless of the actual gradient magnitude.
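To make this scaling concrete, the following is a minimal sketch of an RMSprop-style update in Python; the function name rmsprop_update and the default hyperparameters (lr, rho, eps) are illustrative choices, not values taken from [20].

```python
import numpy as np

def rmsprop_update(w, grad, cache, lr=1e-3, rho=0.9, eps=1e-8):
    """One RMSprop-style step: divide the gradient by the root of a
    moving average of squared gradients, so updates have a similar
    magnitude regardless of the raw gradient scale."""
    cache = rho * cache + (1.0 - rho) * grad ** 2   # moving average of g^2
    w = w - lr * grad / (np.sqrt(cache) + eps)      # normalized update
    return w, cache

# Usage: components with very different gradient magnitudes
# receive steps of comparable size after normalization.
w = np.zeros(3)
cache = np.zeros_like(w)
grad = np.array([0.1, -2.0, 0.01])
w, cache = rmsprop_update(w, grad, cache)
```

Because the per-parameter cache decays with factor rho, a persistently large gradient component inflates its own denominator, which shrinks its effective learning rate relative to components with small gradients.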