Newly developed data acquisition technologies make complex and massive datasets easy to access. Although smoothing spline ANOVA models have proven useful in a variety of fields, such datasets pose challenges for applying these models. In this chapter, we present a selective review of smoothing spline ANOVA models and highlight some of the challenges and opportunities that arise with massive datasets. We review two approaches that significantly reduce the computational cost of fitting the model. One real case study is used to illustrate the performance of the reviewed methods.
Hyperparameters play an essential role in fitting supervised machine learning algorithms. However, tuning all the tunable hyperparameters simultaneously is computationally expensive, especially for large data sets. In this paper, we give a definition of hyperparameter importance that can be estimated by subsampling procedures. Guided by this importance, hyperparameters can then be tuned on the entire data set more efficiently. We show theoretically that the proposed importance computed on subsets of the data is consistent with the one computed on the population data under weak conditions. Numerical experiments show that the proposed importance is consistent and can save substantial computational resources.
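The idea of estimating importance from subsamples can be illustrated with a minimal sketch. The abstract does not state the paper's definition of importance, so the code below substitutes a simple proxy as an assumption: for each hyperparameter, the range of validation loss when that hyperparameter alone is varied around fixed defaults, averaged over random subsets. The model (ridge-penalized polynomial regression), the function names, the grid, and the defaults are all hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_ridge_poly(x, y, degree, lam):
    """Fit a ridge-penalized polynomial of the given degree (toy model)."""
    X = np.vander(x, degree + 1, increasing=True)
    A = X.T @ X + lam * np.eye(degree + 1)
    return np.linalg.solve(A, X.T @ y)


def val_mse(x_tr, y_tr, x_va, y_va, degree, lam):
    """Validation mean squared error for one hyperparameter setting."""
    coef = fit_ridge_poly(x_tr, y_tr, degree, lam)
    pred = np.vander(x_va, degree + 1, increasing=True) @ coef
    return float(np.mean((pred - y_va) ** 2))


def subsample_importance(x, y, grid, defaults,
                         n_subsets=20, subset_size=200, seed=0):
    """Assumed proxy for importance: the range of validation loss when one
    hyperparameter is varied and the others stay at their defaults,
    averaged over random subsets of the data."""
    rng = np.random.default_rng(seed)
    ranges = {name: [] for name in grid}
    for _ in range(n_subsets):
        # Draw a random subset and split it into train and validation halves.
        idx = rng.choice(len(x), size=subset_size, replace=False)
        half = subset_size // 2
        tr, va = idx[:half], idx[half:]
        for name, values in grid.items():
            losses = []
            for v in values:
                params = dict(defaults, **{name: v})
                losses.append(val_mse(x[tr], y[tr], x[va], y[va], **params))
            ranges[name].append(max(losses) - min(losses))
    return {name: float(np.mean(r)) for name, r in ranges.items()}


# Synthetic data, used only to exercise the sketch.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=5000)
y = np.sin(3.0 * x) + rng.normal(scale=0.1, size=x.size)

grid = {"degree": [1, 3, 5, 7], "lam": [1e-6, 1e-4, 1e-2]}
defaults = {"degree": 5, "lam": 1e-4}

importance = subsample_importance(x, y, grid, defaults)
print(importance)
```

Under this proxy, hyperparameters with larger scores would be tuned first, or over finer grids, on the full data set, while low-scoring ones could stay at their defaults. Each subset here has 200 points, so the ranking costs far less than a full grid search on all 5000 observations.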
We consider a functional regression model in the framework of reproducing kernel Hilbert spaces, where the interaction effect of two functional predictors on a functional response, as well as their main effects, is of interest. The regression component of our model is expressed by a single trivariate coefficient function whose functional ANOVA decomposition yields the main and interaction effects. The trivariate coefficient function is estimated by optimizing a penalized least squares objective with a roughness penalty on the function estimate. The estimation procedure can be easily implemented with standard numerical tools. Asymptotic results for the proposed model, with or without functional measurement errors, are established under the reproducing kernel Hilbert space framework. Extensive numerical studies show the advantages of the proposed method over existing ones in terms of prediction and estimation of the coefficient functions. An application to the motivating example on histone modifications and gene expressions of a liver cancer cell line further demonstrates the better prediction accuracy of the proposed method compared with its competitors.