Analyzing large volumes of data is complicated not only by strong skewness and heteroscedasticity but also by missing data. Expectile regression is a popular alternative method for analyzing such heterogeneous data. In this paper, we fit a linear expectile regression model to estimate conditional expectiles from a large dataset whose covariates are missing at random. We construct a communication-efficient surrogate loss (CSL) function to estimate the model parameters and establish the asymptotic normality of the proposed estimator. We also develop a proximal alternating direction method of multipliers (ADMM) algorithm for distributed statistical optimization on large-scale data. Simulation studies assess the finite-sample performance of the proposed method, and survey data from the Behavioral Risk Factor Surveillance System (BRFSS) demonstrate its utility in practice.
INDEX TERMS CSL function, expectile regression, large-scale data, missing at random, proximal ADMM algorithm.
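To make the two core ingredients concrete, the following is a minimal sketch, not the authors' estimator. It assumes complete data (the missing-at-random covariates and any weighting that handles them are ignored), data split across machines with machine 0 acting as the master, and a single round of the generic CSL construction: the master minimizes its local loss L1(theta) minus the linear term <grad L1(theta_bar) - grad LN(theta_bar), theta>, where LN is the full-data loss and theta_bar is an initial local estimate. The expectile loss is the asymmetric squared loss |tau - 1{r < 0}| * r^2. The surrogate is minimized here by iteratively reweighted least squares instead of the proximal ADMM algorithm described above, and the function names `fit_expectile` and `csl_expectile` are hypothetical.

```python
import numpy as np


def _weights(r, tau):
    """Asymmetric weights |tau - 1{r < 0}| of the expectile loss."""
    return np.where(r < 0, 1.0 - tau, tau)


def expectile_loss(theta, X, y, tau):
    """Average asymmetric squared loss on (X, y)."""
    r = y - X @ theta
    return np.mean(_weights(r, tau) * r**2)


def expectile_grad(theta, X, y, tau):
    """Gradient of expectile_loss with respect to theta."""
    r = y - X @ theta
    return -2.0 * X.T @ (_weights(r, tau) * r) / len(y)


def fit_expectile(X, y, tau, shift=None, max_iter=100):
    """Minimize expectile_loss(theta) - shift @ theta by iterative reweighting.

    shift=None gives plain asymmetric least squares; a nonzero shift is the
    linear correction term of the (hypothetical) CSL surrogate below.
    """
    n, p = X.shape
    shift = np.zeros(p) if shift is None else shift
    theta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from least squares
    w = _weights(y - X @ theta, tau)
    for _ in range(max_iter):
        # With the weights frozen, the stationarity condition is linear:
        # X'WX theta = X'Wy + (n / 2) * shift
        A = X.T @ (w[:, None] * X)
        b = X.T @ (w * y) + 0.5 * n * shift
        theta = np.linalg.solve(A, b)
        w_new = _weights(y - X @ theta, tau)
        if np.array_equal(w_new, w):  # residual signs stable: exact minimizer
            break
        w = w_new
    return theta


def csl_expectile(X_parts, y_parts, tau):
    """One CSL round: only gradients at theta_bar are communicated."""
    X1, y1 = X_parts[0], y_parts[0]
    theta_bar = fit_expectile(X1, y1, tau)  # initial estimate on machine 0
    N = sum(len(y) for y in y_parts)
    # Full-data gradient at theta_bar, assembled from per-machine gradients
    grad_global = sum(
        len(y) * expectile_grad(theta_bar, X, y, tau)
        for X, y in zip(X_parts, y_parts)
    ) / N
    shift = expectile_grad(theta_bar, X1, y1, tau) - grad_global
    return fit_expectile(X1, y1, tau, shift=shift)
```

For example, after splitting the rows of `X` and `y` into a few blocks, `csl_expectile(X_blocks, y_blocks, 0.8)` should land much closer to the full-data fit `fit_expectile(X, y, 0.8)` than the purely local estimate from the first block, while each worker sends only one p-dimensional gradient. With `tau = 0.5` all weights are equal and `fit_expectile` reduces to ordinary least squares.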