It is well known that AIC and BIC have different properties in model selection. BIC is consistent in the sense that, if the true model is among the candidates, the probability of selecting the true model approaches 1. AIC, on the other hand, is minimax-rate optimal in both parametric and nonparametric cases for estimating the regression function. Several successful results exist on constructing new model selection criteria that share some strengths of AIC and BIC. However, we show that, in a rigorous sense, even when the true model is included among the candidates, the main strengths of AIC and BIC mentioned above cannot be shared. That is, for any model selection criterion to be consistent, it must behave suboptimally compared to AIC in terms of mean average squared error.
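To make the two criteria concrete, the sketch below (our own illustration, not from the paper) computes the standard Gaussian-likelihood forms of AIC and BIC for a linear model fit by least squares; the function name `aic_bic` is hypothetical. BIC's penalty of k*log(n) grows with the sample size while AIC's penalty stays fixed at 2k, which is the source of the consistency-versus-estimation trade-off discussed above.

```python
import numpy as np

def aic_bic(X, y):
    """AIC and BIC for a Gaussian linear model fit by least squares.

    Uses the common forms AIC = n*log(RSS/n) + 2k and
    BIC = n*log(RSS/n) + k*log(n), where k is the number of
    regression coefficients (constants dropped).
    """
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    aic = n * np.log(rss / n) + 2 * k
    bic = n * np.log(rss / n) + k * np.log(n)
    return aic, bic
```

Evaluating `aic_bic` over a nested family of design matrices and taking the minimizer of each criterion gives the AIC- and BIC-selected models, respectively; because of the heavier log(n) penalty, BIC tends to select smaller models as n grows.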
Adaptation over different procedures is of practical importance, since different procedures perform well under different conditions. In many practical situations it is hard to assess which conditions are (approximately) satisfied, and hence to identify the best procedure for the data at hand, so automatic adaptation over various scenarios is desirable. A practically feasible method, named adaptive regression by mixing (ARM), is proposed to convexly combine general candidate regression procedures. Under mild conditions, the resulting estimator is theoretically shown to perform optimally in rates of convergence without knowing which of the original procedures works best. Simulations are conducted in several settings, including comparing a parametric model with nonparametric alternatives, comparing a neural network with projection pursuit in multidimensional regression, and combining bandwidths in kernel regression. The results clearly support the theoretical properties of ARM.

The ARM algorithm assigns weights to the candidate procedures via proper assessment of the performance of the estimators: the data are split into two parts, one for estimation and the other for measuring behavior in prediction. Although there are many plausible ways to assign the weights, ARM has a connection with information theory that ensures the desired adaptation capability. Indeed, under mild conditions, we show that the squared L2 risk of the estimator based on ARM is essentially bounded above by the risk of each candidate procedure plus a small penalty term of order 1/n. Minimizing over the procedures gives the automatically optimal rate of convergence for ARM. Model selection often induces unnecessarily large variability in estimation; a proper weighting of the candidate models can be more stable, resulting in a smaller risk. Simulations suggest that ARM works better than model selection using the Akaike or Bayesian information criteria when the error variance is not very small.
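The weighting step described above can be sketched as follows. This is a minimal illustration under assumed conventions (Gaussian likelihood on the held-out half, weights averaged over random splits); the names `arm_weights` and `fit_fns` are ours, and the exact weighting scheme in the paper may differ in detail.

```python
import numpy as np

def arm_weights(fit_fns, X, y, n_splits=20, seed=None):
    """Sketch of ARM-style weighting of candidate regression procedures.

    fit_fns: list of callables; each takes (X_train, y_train) and
    returns a prediction function mapping X to fitted values.
    Returns one convex weight per candidate, averaged over splits.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    half = n // 2
    log_w = np.zeros((n_splits, len(fit_fns)))
    for s in range(n_splits):
        idx = rng.permutation(n)
        tr, te = idx[:half], idx[half:]
        for j, fit in enumerate(fit_fns):
            predict = fit(X[tr], y[tr])
            # variance estimate from the estimation half
            sigma2 = max(np.var(y[tr] - predict(X[tr])), 1e-12)
            resid_te = y[te] - predict(X[te])
            # Gaussian log-likelihood of the prediction half under candidate j
            log_w[s, j] = (-0.5 * len(te) * np.log(2 * np.pi * sigma2)
                           - 0.5 * np.sum(resid_te ** 2) / sigma2)
    # normalize within each split, then average across splits
    log_w -= log_w.max(axis=1, keepdims=True)
    w = np.exp(log_w)
    w /= w.sum(axis=1, keepdims=True)
    return w.mean(axis=0)
```

A candidate procedure is wrapped as a closure, e.g. a least-squares fit returning `lambda Xnew: Xnew @ beta`; the final ARM estimator is the weighted average of the candidates' predictions, each candidate refit on the full data.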
In the era of "big data", analysts usually explore various statistical models or machine learning methods for observed data in order to facilitate scientific discoveries or gain predictive power. Whatever data and fitting procedures are employed, a crucial step is to select the most appropriate model or method from a set of candidates. Model selection is a key ingredient in data analysis for reliable and reproducible statistical inference or prediction, and is thus central to scientific studies in fields such as ecology, economics, engineering, finance, political science, biology, and epidemiology. There is a long history of model selection techniques arising from research in statistics, information theory, and signal processing. A considerable number of methods have been proposed, following different philosophies and exhibiting varying performance. The purpose of this article is to provide a comprehensive overview of them, in terms of their motivation, large-sample performance, and applicability. We provide integrated and practically relevant discussions on the theoretical properties of state-of-the-art model selection approaches. We also share our thoughts on some controversial views on the practice of model selection.