Summary
Data arising from sample surveys are usually the result of a complex survey design involving such techniques as stratification, multistage selection and the use of auxiliary information through unequal probability selection. The resulting data are often analysed using regression techniques without further regard to the sample design. This paper shows that, in general, ordinary least squares (OLS) regression will be biased in this situation even for large samples, although in the important class of most epsem (equal probability of selection method) designs asymptotic unbiasedness of least squares estimation is preserved. Alternative estimators are considered which yield unbiased estimates of the simple regression coefficient. Variances of all the estimators are derived and comparisons made. The usual OLS estimator of variance is also examined and found to be biased in general, even when used in conjunction with an epsem design for which the OLS estimator itself is unbiased.
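The bias described above can be illustrated with a small simulation. The sketch below is not taken from the paper: the finite population, the size measure `0.1 + (y > 2.5)`, the use of Poisson sampling, and the inverse-probability-weighted slope are all assumptions chosen for illustration, and the weighted estimator is a common design-weighted alternative rather than necessarily one of the estimators the paper studies. It compares the average OLS slope under an unequal probability design whose inclusion probabilities depend on the response with the average OLS slope under an equal probability design.

```python
import numpy as np


def weighted_slope(x, y, w):
    """Slope of the weighted least squares line of y on x."""
    xbar = np.average(x, weights=w)
    ybar = np.average(y, weights=w)
    return np.sum(w * (x - xbar) * (y - ybar)) / np.sum(w * (x - xbar) ** 2)


def mean_slopes(x, y, pi, n_reps, rng):
    """Average unweighted (OLS) and 1/pi-weighted slopes over repeated
    Poisson samples drawn with inclusion probabilities pi."""
    ols, ipw = [], []
    for _ in range(n_reps):
        s = rng.random(x.size) < pi          # independent inclusion of each unit
        xs, ys = x[s], y[s]
        ols.append(weighted_slope(xs, ys, np.ones(xs.size)))
        ipw.append(weighted_slope(xs, ys, 1.0 / pi[s]))
    return float(np.mean(ols)), float(np.mean(ipw))


rng = np.random.default_rng(0)
N, expected_n = 20_000, 1_000

# Hypothetical finite population with a linear relationship.
x = rng.uniform(0.0, 1.0, N)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, N)
B = weighted_slope(x, y, np.ones(N))         # finite-population slope

# Assumed size measure: units with large y are far more likely to be drawn.
size = 0.1 + (y > 2.5)
pi_unequal = expected_n * size / size.sum()

# Equal probability design with the same expected sample size.
pi_epsem = np.full(N, expected_n / N)

ols_unequal, ipw_unequal = mean_slopes(x, y, pi_unequal, 200, rng)
ols_epsem, _ = mean_slopes(x, y, pi_epsem, 200, rng)

print(f"population slope          : {B:.3f}")
print(f"OLS, unequal probabilities: {ols_unequal:.3f}")
print(f"weighted by 1/pi          : {ipw_unequal:.3f}")
print(f"OLS, equal probabilities  : {ols_epsem:.3f}")
```

In this setup the unweighted slope under the unequal probability design should sit well away from the population slope, while the weighted slope and the OLS slope under equal probabilities should both land close to it. The size of the gap depends entirely on the assumed size measure.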
For the case of different regression relationships in different subgroups of a finite population, only part of which are sampled, the extended least squares estimator of any weighted average of the distinct coefficients is derived, under assumptions relating only to the first two moments of the distribution of the coefficients. Under these assumptions, the estimator is shown to be the best linear unbiased estimator, while under further distributional assumptions, it is also the Bayesian estimator for a quadratic loss function. For the case of unknown variances, a method for estimating them from the sample is proposed. The empirical estimator thus obtained is shown to perform well by a simulation comparison with the optimal estimator and with other proposed empirical estimators.