We consider the problem of one-step-ahead prediction of a real-valued, stationary, strongly mixing random process $\{X_i\}_{i=-\infty}^{\infty}$. The best mean-square predictor of $X_0$ is its conditional mean given the entire infinite past $\{X_i\}_{i=-\infty}^{-1}$. Given a sequence of observations $X_1, X_2, \ldots, X_N$, we propose estimators for the conditional mean based on sequences of parametric models of increasing memory and of increasing dimension, for example, neural networks and Legendre polynomials. The proposed estimators select both the model memory and the model dimension, in a data-driven fashion, by minimizing certain complexity regularized least squares criteria. When the underlying predictor function has a finite memory, we establish that the proposed estimators are memory-universal: the proposed estimators, which do not know the true memory, deliver the same statistical performance (rates of integrated mean-squared error) as that delivered by estimators that know the true memory. Furthermore, when the underlying predictor function does not have a finite memory, we establish that the estimator based on Legendre polynomials is consistent.
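
To make the selection idea concrete, the following is a minimal sketch of choosing memory and dimension by a penalized least squares criterion. It is an illustration under stated assumptions, not the estimator analyzed in the paper: the additive Legendre model, the penalty `penalty_weight * (number of parameters) * log(n) / n`, the function names `simulate_ar1`, `lagged_design`, and `select_model`, and the toy AR(1) data are all hypothetical choices made here. The observations are assumed to lie in $[-1, 1]$, the interval on which Legendre polynomials are orthogonal.

```python
import numpy as np
from numpy.polynomial import legendre


def simulate_ar1(n, seed=0):
    """Toy bounded process x_t = 0.8 x_{t-1} + 0.2 u_t, u_t uniform on [-1, 1).

    Illustrative data only; its true predictor has memory 1.
    """
    rng = np.random.default_rng(seed)
    u = rng.uniform(-1.0, 1.0, n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = 0.8 * x[t - 1] + 0.2 * u[t]
    return x


def lagged_design(x, memory, degree, start):
    """Design matrix of additive Legendre features of the last `memory` values.

    Rows correspond to targets x[start:], so every candidate model is fitted
    on the same targets (requires memory <= start).
    """
    n = len(x)
    cols = [np.ones(n - start)]  # intercept
    for lag in range(1, memory + 1):
        past = x[start - lag : n - lag]
        # legvander gives P_0..P_degree; drop the constant column P_0.
        cols.extend(legendre.legvander(past, degree)[:, 1:].T)
    return np.column_stack(cols), x[start:]


def select_model(x, max_memory=4, max_degree=3, penalty_weight=1.0):
    """Pick (memory, degree) minimizing empirical MSE plus a complexity penalty.

    The penalty form is an assumption made for this sketch.
    """
    n_eff = len(x) - max_memory
    best = None
    for memory in range(1, max_memory + 1):
        for degree in range(1, max_degree + 1):
            A, y = lagged_design(x, memory, degree, start=max_memory)
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            mse = np.mean((y - A @ coef) ** 2)
            penalty = penalty_weight * A.shape[1] * np.log(n_eff) / n_eff
            criterion = mse + penalty
            if best is None or criterion < best[0]:
                best = (criterion, memory, degree, coef)
    return best
```

Calling `select_model(simulate_ar1(2000))` returns the minimizing criterion value together with the chosen memory, polynomial degree, and fitted coefficients. The search never uses the true memory of the data-generating process, which is the sense in which the selection is data-driven.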