We propose two semiparametric model averaging schemes for nonlinear dynamic time series regression models with a very large number of covariates, including exogenous regressors and autoregressive lags. Our objective is to obtain more accurate estimates and forecasts of time series by using a large number of conditioning variables in a nonparametric way. In the first scheme, we introduce a Kernel Sure Independence Screening (KSIS) technique to screen out the regressors whose marginal regression (or auto-regression) functions do not make a significant contribution to estimating the joint multivariate regression function; we then propose a semiparametric penalized method of Model Averaging MArginal Regression (MAMAR) for the regressors and auto-regressors that survive the screening procedure, to further select the regressors that have significant effects on estimating the multivariate regression function and predicting the future values of the response variable. In the second scheme, we impose an approximate factor modelling structure on the ultra-high dimensional exogenous regressors and use principal component analysis to estimate the latent common factors; we then apply the penalized MAMAR method to select the estimated common factors and the lags of the response variable that are significant. In each of the two schemes, we construct the optimal combination of the significant marginal regression and auto-regression functions. Asymptotic properties of the two schemes are derived under some regularity conditions. Numerical studies, including simulations and an empirical application to forecasting inflation, illustrate the proposed methodology.
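The screen-then-average idea in the first scheme can be illustrated with a minimal sketch. It is not the authors' KSIS or penalized MAMAR procedure: it assumes a Gaussian-kernel Nadaraya-Watson smoother, a rule-of-thumb bandwidth, a simple "mean squared marginal fit" screening score, and unpenalized least-squares weights. The function names (`nw_fit`, `marginal_fits`, `kernel_screen`, `averaging_weights`) are hypothetical.

```python
import numpy as np

def nw_fit(x, y, h):
    """Nadaraya-Watson estimate of E[y | x] at the sample points (Gaussian kernel)."""
    d = (x[:, None] - x[None, :]) / h
    w = np.exp(-0.5 * d ** 2)
    return (w @ y) / w.sum(axis=1)

def marginal_fits(X, y, h=None):
    """Fit one marginal kernel regression per column of X (y is centred first)."""
    n, p = X.shape
    yc = y - y.mean()
    fits = np.empty((n, p))
    for j in range(p):
        xj = X[:, j]
        # Rule-of-thumb bandwidth: an assumption, not the paper's choice.
        bw = h if h is not None else 1.06 * xj.std() * n ** (-0.2)
        fits[:, j] = nw_fit(xj, yc, bw)
    return fits

def kernel_screen(X, y, keep):
    """Rank covariates by the mean squared marginal fit and keep the top `keep`."""
    fits = marginal_fits(X, y)
    scores = np.mean(fits ** 2, axis=0)
    selected = np.sort(np.argsort(scores)[::-1][:keep])
    return selected, scores

def averaging_weights(X, y, selected):
    """Unpenalized least-squares weights combining the surviving marginal fits."""
    M = marginal_fits(X[:, selected], y)
    w, *_ = np.linalg.lstsq(M, y - y.mean(), rcond=None)
    return w
```

In a time series setting, the columns of `X` would hold both exogenous regressors and lags of the response; a penalized fit would replace the plain least-squares step to shrink some weights to zero.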
Ex-post harmonisation is one of many data preprocessing tasks used to combine the increasingly vast and diverse sources of data available for research and analysis. Documenting the provenance and ensuring the quality of multi-source datasets are vital for trustworthy scientific research and for encouraging reuse of existing harmonisation efforts. However, capturing and communicating statistically relevant properties of harmonised datasets is difficult without a universal standard for describing harmonisation operations. Our paper combines mathematical and computer science perspectives to address this need. The Crossmaps Framework defines a new approach for transforming existing variables collected under a specific measurement or classification standard to an imputed counterfactual variable indexed by some target standard. It uses computational graphs to separate intended transformation logic from actual data transformations, and avoids the risk of syntactically valid data manipulation scripts resulting in statistically questionable data. In this paper, we introduce the Crossmaps Framework through the example of ex-post harmonisation of aggregated statistics in the social sciences. We define a new provenance task abstraction, the crossmap transform, and formalise two associated objects, the shared mass array and the crossmap. We further define graph, matrix and list encodings of crossmaps and discuss the resulting implications for understanding statistical properties of ex-post harmonisation and designing error-minimising workflows.
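To make the list and matrix encodings concrete, here is a minimal sketch of a weighted mapping between a source and a target classification. It is an illustration under stated assumptions rather than the framework's own definitions: the validity condition (non-negative weights whose outgoing total per source category equals one, so aggregate totals are preserved) and the names `crossmap_matrix`, `is_valid_crossmap` and `apply_crossmap` are hypothetical.

```python
import numpy as np

def crossmap_matrix(edges, sources, targets):
    """Turn a (source, target, weight) edge list into a sources-by-targets matrix."""
    s_idx = {s: i for i, s in enumerate(sources)}
    t_idx = {t: j for j, t in enumerate(targets)}
    B = np.zeros((len(sources), len(targets)))
    for src, tgt, w in edges:
        B[s_idx[src], t_idx[tgt]] += w
    return B

def is_valid_crossmap(B, tol=1e-9):
    """Assumed condition: weights are non-negative and each source row sums to one."""
    return bool(np.all(B >= 0) and np.allclose(B.sum(axis=1), 1.0, atol=tol))

def apply_crossmap(B, values):
    """Redistribute source-indexed totals onto the target categories."""
    if not is_valid_crossmap(B):
        raise ValueError("weights out of each source category must sum to one")
    return np.asarray(values, dtype=float) @ B

# List encoding: category B is split 40/60 between targets X and Y.
edges = [("A", "X", 1.0), ("B", "X", 0.4), ("B", "Y", 0.6), ("C", "Y", 1.0)]
B = crossmap_matrix(edges, ["A", "B", "C"], ["X", "Y"])
harmonised = apply_crossmap(B, [10, 20, 30])
```

Checking the mapping before applying it is the point of the design: a script that silently drops or double-counts a category would still run, but it would fail the row-sum check here.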
We propose an asset pricing factor model constructed with semiparametric characteristics-based mispricing and factor loading functions. This model captures common movements of stock excess returns and includes a two-layer network of arbitrage returns interconnected by security-specific characteristics. We approximate the unknown functions by B-splines, where the number of B-spline coefficients diverges. We estimate this model and test for the existence of the mispricing function with a power-enhanced hypothesis test. The enhanced test addresses the low-power problem caused by the diverging number of B-spline coefficients, and its power approaches one asymptotically. The dynamic networks are then explored through hierarchical K-means clustering based on the detected mispricing functions. We apply our methodology to CRSP monthly data for the US stock market with one-year rolling windows over 1967-2017. This empirical study shows the presence of mispricing functions in certain time blocks and a dynamic network structure of arbitrage returns through groups of some characteristics.
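The B-spline approximation step can be sketched as follows. This is a generic sieve-style least-squares fit of one unknown function, not the authors' estimator: the clamped, equally spaced knots, the cubic degree, and the names `bspline_basis`, `clamped_knots` and `fit_sieve` are assumptions made for illustration. Increasing `n_interior` with the sample size mimics a diverging number of coefficients.

```python
import numpy as np

def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions at x via the Cox-de Boor recursion."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    # Degree 0: indicators of the half-open intervals [t_i, t_{i+1}).
    B = np.zeros((x.size, t.size - 1))
    for i in range(t.size - 1):
        B[:, i] = (t[i] <= x) & (x < t[i + 1])
    # Close the last non-empty interval on the right so x == t[-1] is covered.
    last = np.max(np.nonzero(t[:-1] < t[1:])[0])
    B[x == t[-1], last] = 1.0
    for d in range(1, degree + 1):
        Bn = np.zeros((x.size, t.size - 1 - d))
        for i in range(t.size - 1 - d):
            left_den = t[i + d] - t[i]
            right_den = t[i + d + 1] - t[i + 1]
            # Terms with a zero denominator (repeated knots) are dropped.
            if left_den > 0:
                Bn[:, i] += (x - t[i]) / left_den * B[:, i]
            if right_den > 0:
                Bn[:, i] += (t[i + d + 1] - x) / right_den * B[:, i + 1]
        B = Bn
    return B

def clamped_knots(lo, hi, n_interior, degree):
    """Equally spaced interior knots with boundary knots repeated degree + 1 times."""
    interior = np.linspace(lo, hi, n_interior + 2)[1:-1]
    return np.concatenate([np.full(degree + 1, lo), interior, np.full(degree + 1, hi)])

def fit_sieve(x, y, n_interior=8, degree=3):
    """Least-squares B-spline coefficients approximating an unknown function of x."""
    knots = clamped_knots(x.min(), x.max(), n_interior, degree)
    B = bspline_basis(x, knots, degree)
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    return coef, knots
```

The number of fitted coefficients is `n_interior + degree + 1`, so a test of whether the fitted function is zero involves a growing number of parameters as the knot count increases.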