2019
DOI: 10.1609/aaai.v33i01.33014594

Cogra: Concept-Drift-Aware Stochastic Gradient Descent for Time-Series Forecasting

Abstract: We approach the time-series forecasting problem in the presence of concept drift by automatic learning rate tuning of stochastic gradient descent (SGD). The SGD-based approach is preferable to other concept drift algorithms in that it can be applied to any model and it can keep learning efficiently whilst predicting online. Among a number of SGD algorithms, the variance-based SGD (vSGD) can successfully handle concept drift by automatic learning rate tuning, which is reduced to an adaptive mean estimation prob…
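The variance-based learning-rate tuning described in the abstract can be illustrated with a short sketch. The snippet below follows the vSGD idea of setting each per-parameter learning rate from moving estimates of the gradient mean, the gradient second moment, and the curvature; it is a simplified illustration only, assuming a fixed smoothing factor `rho` and an externally supplied diagonal-Hessian estimate, and is not the Cogra algorithm itself.

```python
import numpy as np

def vsgd_like_step(theta, grad, hess_diag, state, eps=1e-12):
    """One SGD step with a variance-based per-parameter learning rate.

    A simplified sketch in the spirit of vSGD: eta_i = E[g_i]^2 / (E[g_i^2] * h_i),
    with the expectations tracked by exponential moving averages. The adaptive
    memory-size update used by vSGD (and its concept-drift-aware refinement in
    Cogra) is omitted and replaced by a fixed smoothing factor `rho`.
    """
    rho = state.get("rho", 0.1)
    g_bar = state.setdefault("g_bar", np.zeros_like(theta))   # E[g]
    v_bar = state.setdefault("v_bar", np.ones_like(theta))    # E[g^2]
    h_bar = state.setdefault("h_bar", np.ones_like(theta))    # diagonal curvature

    # Update the moving averages in place so they persist across calls.
    g_bar += rho * (grad - g_bar)
    v_bar += rho * (grad ** 2 - v_bar)
    h_bar += rho * (np.abs(hess_diag) - h_bar)

    # The learning rate grows when the gradient signal is consistent
    # (mean^2 close to the second moment) and shrinks when it is noisy.
    eta = g_bar ** 2 / (v_bar * h_bar + eps)
    return theta - eta * grad
```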

Cited by 19 publications (20 citation statements); references 10 publications.
“…Training data was presented to the model in batches of 48 timesteps, which is representative of a real-life scenario in which 24 hours of data may be received at the end of each day's operation and used to continually update the model. This process would allow the model to be refined as the fleet's behaviour varies; however, more substantial changes may require concept-drift-aware algorithms that automatically tune the learning rate to support more rapid adjustment of network parameters [38].…”
Section: Discussion
confidence: 99%
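The continual-update scenario in the statement above can be written as a simple daily loop. This is a hypothetical sketch: `model`, `sgd_update`, and `daily_stream` are placeholder names, not code from the citing paper, and the 48-timestep batch is taken to mean one day of (presumably half-hourly) observations.

```python
def continual_update(model, daily_stream, sgd_update, batch_size=48):
    """Refine an already-trained forecaster at the end of each day's operation.

    `daily_stream` yields one day's worth of (input, target) pairs at a time;
    `sgd_update` performs a single online gradient step. A concept-drift-aware
    optimizer would retune its learning rate inside `sgd_update` [38].
    """
    for day, observations in enumerate(daily_stream):
        assert len(observations) == batch_size, "expected 24 hours of data"
        for x, y in observations:
            model = sgd_update(model, x, y)
    return model
```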
“…Gradient descent algorithms are commonly used to train neural network models online by constantly tracking a specified performance measure such as prediction error, and are also applied to traditional statistical methods such as autoregressive integrated moving average models [8,16]. The learning rate can be adapted to achieve better convergence, avoid overfitting to noisy samples, or automatically adjust to shifts in the data distribution [5,14,17,18,19]. RMSprop [20] is a popular algorithm that scales the learning rate by a moving average of squared gradients, based on the intuition that the magnitude of each weight update should be similar regardless of the actual gradient magnitude.…”
Section: Related Work
confidence: 99%
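For reference, the RMSprop update mentioned in this statement can be written in a few lines. The hyperparameter values below (`lr`, `beta`, `eps`) are common defaults rather than values taken from the cited works.

```python
import numpy as np

def rmsprop_step(theta, grad, state, lr=1e-3, beta=0.9, eps=1e-8):
    """Scale the step by the root of a moving average of squared gradients,
    so update magnitudes stay similar regardless of the raw gradient scale."""
    sq_avg = state.setdefault("sq_avg", np.zeros_like(theta))
    sq_avg *= beta
    sq_avg += (1.0 - beta) * grad ** 2
    return theta - lr * grad / (np.sqrt(sq_avg) + eps)
```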
“…In POLA, instead of explicitly applying an inductive bias to learning-rate scaling, we propose learning this scaling directly from the data itself. For adaptation in non-stationary environments, previous works apply simplifying assumptions to the loss function and Hessian computations [5,14]; however, second-order derivative calculation can be expensive, and the resulting learning rates are subject to instability if the approximated Hessian is ill-conditioned. Other works adapt the learning rate to reduce the adverse effect of outliers by monitoring distributional properties at neighboring data points [21,22].…”
Section: Related Work
confidence: 99%