This paper presents a comparative study of a non-exhaustive, though representative, set of existing methodologies for partitioning the training dataset in time series prediction, and for variable selection under the wrapper paradigm. The partition policy of the training dataset and the choice of a proper set of variables for the regression vector are known to have a significant influence on the accuracy of the predictor, regardless of the choice of prediction model. However, there has been no extensive search for a figure of merit supporting a comparative analysis. Here, two partition policies, denoted sequential and random, are compared, and among the variable selection approaches using wrappers, forward selection is contrasted with sensitivity-based pruning. Five real financial time series with trends and seasonality are considered, and multilayer perceptrons are adopted as the predictor. The results indicate with high confidence that the rarely adopted random partition and the computationally intensive forward selection outperform their competitors across the whole set of experiments.
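As an illustration of the two partition policies named above, the following minimal sketch builds lagged regression vectors from a series and splits them into training and validation subsets, either sequentially or at random. The function names, the number of lags, the 25% validation fraction, and the toy series are assumptions made for illustration only; they are not taken from the paper's experimental setup.

```python
import random


def build_regression_pairs(series, n_lags):
    """Turn a series into (regression vector, target) pairs.

    Each regression vector holds the n_lags values preceding the target.
    """
    pairs = []
    for t in range(n_lags, len(series)):
        pairs.append((series[t - n_lags:t], series[t]))
    return pairs


def sequential_partition(pairs, val_fraction=0.25):
    """Sequential policy: the most recent pairs form the validation subset."""
    n_val = int(round(len(pairs) * val_fraction))
    split = len(pairs) - n_val
    return pairs[:split], pairs[split:]


def random_partition(pairs, val_fraction=0.25, seed=0):
    """Random policy: validation pairs are drawn uniformly without replacement."""
    n_val = int(round(len(pairs) * val_fraction))
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    val_idx = set(rng.sample(range(len(pairs)), n_val))
    train = [p for i, p in enumerate(pairs) if i not in val_idx]
    val = [p for i, p in enumerate(pairs) if i in val_idx]
    return train, val


# Toy series (hypothetical data): 20 points, 4 lags -> 16 pairs.
series = list(range(20))
pairs = build_regression_pairs(series, n_lags=4)
seq_train, seq_val = sequential_partition(pairs)
rnd_train, rnd_val = random_partition(pairs)
```

In both policies each regression vector is built from consecutive past values before the split, so the random policy shuffles which input-output pairs go to validation rather than the temporal order inside each vector.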
I. INTRODUCTION

Artificial Neural Networks (ANNs) have been successfully applied to time series prediction [3][19][22][23]. The great interest in ANNs reflects their potential to alleviate problems associated with traditional linear forecasting techniques, such as the Box and Jenkins approach [1], and also the possibility of interpreting the prediction task as a particular instance of learning from data, a well-known research area in machine learning [2]. Seasonality in a time series is a periodic and recurrent pattern produced by some highly influential factors, such as climate conditions and, in the case of financial time series, other periodic events with economic impact. Apart from seasonality, a time series may present other nonstationary behavior, denoted trends. A wide range of prediction models are applied only after all nonstationary effects have been extracted from the time series under consideration. Zhang and Qi [26] found that ANNs are not able to effectively capture seasonal or trend variations from unpreprocessed raw data, and that either detrending or deseasonalization can dramatically reduce the forecasting error. They also indicated that combining detrending and deseasonalization can be considered the most effective data preprocessing approach. However, other papers in the literature [5][13][16][21][23][9] have indicated the opposite, i.e. that ANN models perform better when the time series contains unknown seasonal and/or trend components. Sharda and Patil [21] examined 88 seasonal time series from the M-competition [17] and found that ANNs can effectively model seasonality, so that extracting this behavior is not necessary. Franses and Draisma [5] found that ANNs can also detect possibly changing seasonal patterns.

The absence of consensus concerning the benefits associated with the preprocessing phase, though restricted to ANN models as...