Combining the mutual information criterion with a forward feature selection strategy offers a good trade-off between the optimality of the selected feature subset and computation time. However, it requires setting the parameter(s) of the mutual information estimator and deciding when to halt the forward procedure. These two choices are difficult to make because, as the dimensionality of the subset increases, the estimation of the mutual information becomes less and less reliable. This paper proposes to use resampling methods, namely K-fold cross-validation and the permutation test, to address both issues. The resampling methods provide information about the variance of the estimator, which can then be used to set the parameter automatically and to calculate a threshold for stopping the forward procedure. The procedure is illustrated on a synthetic dataset as well as on real-world examples.
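The forward procedure with a permutation-based stopping rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not name the estimator, so a k-nearest-neighbour (Kraskov-style) mutual information estimator is assumed here; the names `ksg_mi` and `forward_select`, the choice to permute only the candidate feature, and the parameters `k`, `n_perm` and `alpha` are all hypothetical. The neighbour count `k` is fixed by hand, whereas the paper sets the estimator parameter by K-fold resampling, which is not shown.

```python
import numpy as np

def digamma_int(n):
    """Exact digamma at positive integers: psi(n) = -gamma + sum_{i<n} 1/i."""
    n = np.asarray(n, dtype=int)
    harmonic = np.concatenate(([0.0], np.cumsum(1.0 / np.arange(1, n.max() + 1))))
    return -np.euler_gamma + harmonic[n - 1]

def ksg_mi(X, y, k=5):
    """k-NN mutual information estimate between feature subset X and target y."""
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n = len(X)
    # Pairwise Chebyshev (max-norm) distances in each space and in the joint space.
    dx = np.max(np.abs(X[:, None, :] - X[None, :, :]), axis=2)
    dy = np.max(np.abs(y[:, None, :] - y[None, :, :]), axis=2)
    dz = np.maximum(dx, dy)
    np.fill_diagonal(dz, np.inf)                 # a point is not its own neighbour
    eps = np.sort(dz, axis=1)[:, k - 1]          # distance to the k-th joint neighbour
    nx = np.sum(dx < eps[:, None], axis=1) - 1   # marginal counts, excluding self
    ny = np.sum(dy < eps[:, None], axis=1) - 1
    return (digamma_int(k) + digamma_int(n)
            - np.mean(digamma_int(nx + 1) + digamma_int(ny + 1)))

def forward_select(X, y, k=5, n_perm=30, alpha=0.05, seed=0):
    """Greedy forward selection; stop when the best candidate fails a permutation test."""
    rng = np.random.default_rng(seed)
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        # Score every candidate jointly with the features already selected.
        scores = [ksg_mi(X[:, selected + [j]], y, k) for j in remaining]
        best = remaining[int(np.argmax(scores))]
        observed = max(scores)
        # Null distribution (assumed design): shuffle only the candidate column,
        # so it carries no information about y beyond the selected features.
        null = []
        for _ in range(n_perm):
            Xp = X[:, selected + [best]].copy()
            Xp[:, -1] = rng.permutation(Xp[:, -1])
            null.append(ksg_mi(Xp, y, k))
        if observed <= np.quantile(null, 1 - alpha):
            break                                # gain is not significant: halt
        selected.append(best)
        remaining.remove(best)
    return selected

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(150, 4))
    y = X[:, 0] + 0.1 * rng.normal(size=150)     # only feature 0 is informative
    print(forward_select(X, y))
```

The spread of the permuted scores gives a data-driven threshold, which is the role the abstract assigns to the permutation test; the brute-force distance matrices keep the sketch short but scale quadratically with the number of samples.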
Abstract. In this paper we develop a method to predict time series with non-linear tools. The method has three distinctive steps: it uses as much information as possible as input to the model (many past values of the series, many exogenous variables); it compresses this information with a non-linear method to obtain a state vector of limited size, which facilitates the subsequent regression and the generalization ability of the forecasting algorithm; and it fits a non-linear regressor (here an RBF neural network) on the reduced vectors. We show that this method is able to find non-linear relationships in artificial and real-world financial series. On a difficult task, forecasting the tendency of the Bel 20 stock market index, we show that this method improves the results compared both to linear models and to non-linear ones that do not use the non-linear compression.
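The three steps (large input vector, non-linear compression, RBF regression) can be sketched as below. This is an illustrative pipeline under assumptions, not the authors' code: the abstract does not say which compression method is used, so kernel PCA with a Gaussian kernel is swapped in here as a stand-in, and the RBF network uses randomly chosen centres with least-squares output weights. All names (`lag_matrix`, `KernelPCA`, `fit_rbf_net`, `predict_rbf_net`) and parameter values are hypothetical, and exogenous variables are omitted.

```python
import numpy as np

def lag_matrix(series, n_lags):
    """Rows hold n_lags consecutive past values; the target is the next value."""
    n = len(series)
    X = np.column_stack([series[i:n - n_lags + i] for i in range(n_lags)])
    return X, series[n_lags:]

def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix exp(-gamma * ||a - b||^2) between rows of A and B."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

class KernelPCA:
    """Stand-in non-linear compression (assumption: not the paper's method)."""
    def __init__(self, n_components, gamma):
        self.n_components, self.gamma = n_components, gamma

    def fit(self, X):
        self.X = X
        K = rbf_kernel(X, X, self.gamma)
        self.col_mean, self.tot_mean = K.mean(axis=0), K.mean()
        Kc = K - self.col_mean[None, :] - K.mean(axis=1)[:, None] + self.tot_mean
        w, v = np.linalg.eigh(Kc)                       # ascending eigenvalues
        idx = np.argsort(w)[::-1][:self.n_components]   # keep the largest ones
        self.alphas = v[:, idx] / np.sqrt(np.maximum(w[idx], 1e-12))
        return self

    def transform(self, Z):
        K = rbf_kernel(Z, self.X, self.gamma)
        # Centre new rows consistently with the training kernel.
        Kc = K - self.col_mean[None, :] - K.mean(axis=1)[:, None] + self.tot_mean
        return Kc @ self.alphas

def _design(Z, centers, width):
    """Gaussian basis activations plus a constant bias column."""
    return np.column_stack([rbf_kernel(Z, centers, 1.0 / (2 * width**2)),
                            np.ones(len(Z))])

def fit_rbf_net(Z, y, n_centers, width, rng):
    """RBF network: random training points as centres, linear output layer."""
    centers = Z[rng.choice(len(Z), n_centers, replace=False)]
    w, *_ = np.linalg.lstsq(_design(Z, centers, width), y, rcond=None)
    return centers, w

def predict_rbf_net(Z, centers, w, width):
    return _design(Z, centers, width) @ w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    series = np.sin(0.2 * np.arange(300)) + 0.05 * rng.normal(size=300)
    X, y = lag_matrix(series, n_lags=8)
    kpca = KernelPCA(n_components=2, gamma=0.5).fit(X[:200])
    centers, w = fit_rbf_net(kpca.transform(X[:200]), y[:200], 15, 0.5, rng)
    pred = predict_rbf_net(kpca.transform(X[200:]), centers, w, 0.5)
    print("test MSE:", np.mean((pred - y[200:]) ** 2))
```

Fitting the regressor on the compressed state vector rather than on the raw lag vector is the design choice the abstract credits for better generalization; the kernel width, number of components and number of centres would normally be tuned on held-out data.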