Forecasting groups of time series is of increasing practical importance. Examples include forecasting the demand for multiple products offered by a retailer, server loads within a data center, or the number of completed ride shares in zones within a city. The local approach to this problem considers each time series separately and fits a function or model to each series. The global approach treats all time series as a single regression task and fits one function to all of them. For groups of similar time series, global methods outperform the more established local methods. However, recent empirical evidence also shows surprisingly good performance of global models on heterogeneous groups of time series. This suggests that global methods are more generally applicable, with major implications for forecasting theory and practice, in the form of more accurate tools for automatic forecasting and new scenarios to study. So far, though, this evidence has been purely empirical, and a more fundamental study is required. In this paper, we formalize the setting of forecasting a set of time series with local and global learning algorithms, leading to the following contributions:

• We show that global methods are not more restrictive than local methods for time series forecasting, a result that does not apply to sets of regression problems in general. Global and local methods can produce the same forecasts without any assumptions about the similarity of the series in the set. This result shows that global models can succeed in a wider range of problems than previously thought.
• We derive basic generalization bounds for local and global algorithms. We find that the complexity of local methods grows with the size of the set, while it remains constant for global methods. Therefore, for large datasets, a global algorithm can afford to be quite complex and still achieve a better generalization error than local methods. These bounds serve to clarify and support recent experimental results in time series forecasting and to guide the design of new algorithms. For the specific class of limited-memory autoregressive models, the bounds lead to the design of global models with much larger memory than what is effective for local methods.
• The findings are supported by an extensive empirical study. We show that purposely naïve algorithms derived from these principles, such as global linear models fit by least squares, deep networks, or even high-order polynomials, achieve superior accuracy on benchmark datasets. In particular, global linear models show unreasonable effectiveness, providing competitive forecasting accuracy with far fewer parameters than the simplest of local methods. Empirical evidence points towards global models being able to automatically learn long-memory patterns and related effects that are only available to local models if introduced manually.
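To make the local/global distinction concrete, the following is a minimal sketch, assuming NumPy and a hypothetical set of series, of fitting linear autoregressive models of order p by least squares. The local approach fits one coefficient vector per series; the global approach stacks the lagged windows of every series into a single regression problem and fits one shared coefficient vector. The helper names (`lag_matrix`, `fit_local`, `fit_global`, `forecast_one_step`) and the random toy data are illustrative assumptions, not the paper's code.

```python
import numpy as np

def lag_matrix(series, p):
    """Build (X, y) for an AR(p) regression: each row of X holds p
    consecutive past values, and y holds the value that follows them."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[t - p:t] for t in range(p, len(series))])
    y = series[p:]
    return X, y

def fit_local(series_set, p):
    """Local approach: a separate least-squares AR(p) fit for each series."""
    coefs = []
    for s in series_set:
        X, y = lag_matrix(s, p)
        w, *_ = np.linalg.lstsq(X, y, rcond=None)
        coefs.append(w)
    return coefs  # len(series_set) * p parameters in total

def fit_global(series_set, p):
    """Global approach: pool the lagged windows of all series and fit
    a single AR(p) coefficient vector shared by every series."""
    blocks = [lag_matrix(s, p) for s in series_set]
    X = np.vstack([b[0] for b in blocks])
    y = np.concatenate([b[1] for b in blocks])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w  # p parameters, independent of the number of series

def forecast_one_step(series, w):
    """One-step-ahead forecast from the last p observations of a series."""
    p = len(w)
    return float(np.dot(np.asarray(series[-p:], dtype=float), w))

# Hypothetical example: 1000 random series of length 100, AR order 12.
rng = np.random.default_rng(0)
series_set = [rng.standard_normal(100) for _ in range(1000)]
p = 12
local = fit_local(series_set, p)
global_w = fit_global(series_set, p)
print(sum(w.size for w in local), global_w.size)  # prints 12000 12
```

The total parameter count of the local fits grows linearly with the number of series, while the global fit stays at p. Parameter count is only a rough proxy for the complexity notion behind the bounds described above; this sketch does not reproduce those bounds, nor the result that global models can match local forecasts, and only contrasts the two fitting strategies.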
Keywords: Time Series

Consider the problem of having to forecast many time series as a group. We might need to forecast tourist arrivals at all our resorts for n...