“…To address the first problem of redundant self-attention in the decoder, we swap the order of the self-attention and cross-attention layers. Thus, before any relations are inferred within the unknown prediction sequence, the prediction sequence first receives auto-regressive information from the deepest encoder feature map. This cross-attention step initializes the prediction sequence ahead of the first decoder self-attention more effectively than simple zero-initialization with a start token [15], randomly generated parameters [8], or the trend decomposition of the raw input sequence [25]. These alternative initialization schemes used by other TSFTs are either overly simplistic or inefficient.…”
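
To make the reordering concrete, below is a minimal PyTorch sketch of a decoder layer in which cross-attention precedes self-attention. The class and argument names (`CrossFirstDecoderLayer`, `pred_seq`, `enc_feat`, `d_model`, `n_heads`) are illustrative assumptions, not the paper's actual implementation; it only shows the layer ordering described above.

```python
import torch
import torch.nn as nn


class CrossFirstDecoderLayer(nn.Module):
    """Sketch of a decoder layer where cross-attention runs before
    self-attention, so the prediction sequence is initialized from the
    deepest encoder feature map before relating to itself (assumed names)."""

    def __init__(self, d_model: int = 512, n_heads: int = 8,
                 d_ff: int = 2048, dropout: float = 0.1):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads,
                                                dropout=dropout, batch_first=True)
        self.self_attn = nn.MultiheadAttention(d_model, n_heads,
                                               dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(),
            nn.Dropout(dropout), nn.Linear(d_ff, d_model),
        )
        self.norm1, self.norm2, self.norm3 = (nn.LayerNorm(d_model) for _ in range(3))
        self.dropout = nn.Dropout(dropout)

    def forward(self, pred_seq: torch.Tensor, enc_feat: torch.Tensor) -> torch.Tensor:
        # 1) Cross-attention first: the placeholder prediction sequence queries
        #    the deepest encoder feature map, receiving auto-regressive context
        #    before any relation within itself is modeled.
        x = self.norm1(pred_seq + self.dropout(
            self.cross_attn(pred_seq, enc_feat, enc_feat, need_weights=False)[0]))
        # 2) Self-attention second: relations inside the now-initialized
        #    prediction sequence are inferred.
        x = self.norm2(x + self.dropout(
            self.self_attn(x, x, x, need_weights=False)[0]))
        # 3) Position-wise feed-forward network.
        return self.norm3(x + self.dropout(self.ff(x)))
```

In a standard Transformer decoder layer the two sublayers would appear in the opposite order; the only change sketched here is that the first sublayer consuming the prediction sequence is the one attending to the encoder output.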