Summary
Social and economic scientists are tempted to use emerging data sources such as big data to compile information about finite populations as an alternative to traditional survey samples. These data sources generally cover an unknown part of the population of interest. Simply assuming that analyses made on these data are applicable to larger populations is wrong. The mere volume of data provides no guarantee of valid inference. Tackling this problem with methods originally developed for probability sampling is possible but is shown here to be limited. A wider range of model‐based predictive inference methods proposed in the literature is reviewed and evaluated in a simulation study using real‐world data on annual vehicle mileages. We propose to extend this predictive inference framework with machine learning methods for inference from samples that are generated through mechanisms other than random sampling from a target population. Describing economies and societies using sensor data, internet search data, social media and voluntary opt‐in panels is cost‐effective and timely compared with traditional surveys but requires an extended inference framework as proposed in this article.
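As a rough illustration of why data volume alone does not guarantee valid inference, the following toy sketch compares a naive sample mean with a simple model‐based predictive estimate. Everything in it is an assumption made for illustration: the population is synthetic (not the vehicle mileage data), the selection mechanism and coefficients are invented, a straight‐line regression stands in for the wider class of predictive and machine learning models, and all function names are hypothetical.

```python
import random


def simulate_population(n=10_000, seed=0):
    """Synthetic finite population (illustrative assumption, not real data)."""
    rng = random.Random(seed)
    # Auxiliary variable x is known for every unit in the population.
    xs = [rng.random() for _ in range(n)]
    # Target variable y depends linearly on x plus noise.
    ys = [10.0 + 20.0 * x + rng.gauss(0.0, 2.0) for x in xs]
    # Non-random selection: units with larger x are more likely to be observed.
    selected = [rng.random() < 0.05 + 0.5 * x for x in xs]
    return xs, ys, selected


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def naive_mean(ys, selected):
    """Mean of the observed units only, ignoring how they were selected."""
    observed = [y for y, s in zip(ys, selected) if s]
    return sum(observed) / len(observed)


def predictive_mean(xs, ys, selected):
    """Observed y where available, model prediction for unobserved units."""
    xs_obs = [x for x, s in zip(xs, selected) if s]
    ys_obs = [y for y, s in zip(ys, selected) if s]
    intercept, slope = fit_line(xs_obs, ys_obs)
    total = sum(
        y if s else intercept + slope * x
        for x, y, s in zip(xs, ys, selected)
    )
    return total / len(xs)


if __name__ == "__main__":
    xs, ys, selected = simulate_population()
    true_mean = sum(ys) / len(ys)
    print("true population mean:", round(true_mean, 2))
    print("naive sample mean:   ", round(naive_mean(ys, selected), 2))
    print("predictive estimate: ", round(predictive_mean(xs, ys, selected), 2))
```

In this sketch the naive mean is biased upward even though thousands of units are observed, because selection depends on x. The predictive estimate recovers the population mean only because the toy model is correctly specified and x fully explains the selection; neither condition can be taken for granted with real non‐probability samples.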
Summary
Macroeconomic indicators about the labour force, published by national statistical institutes, are predominantly based on rotating panels. Sample sizes of most labour force surveys in combination with the design‐based or model‐assisted mode of inference obstruct the publication of such indicators at a monthly frequency. Previous research proposed a multivariate structural time series model to obtain more precise model‐based estimates by taking advantage of sample information observed in previous periods. In this paper, the model is extended to use sample information from other domains or strongly correlated auxiliary series. A relatively parsimonious version of these models is currently used by Statistics Netherlands to produce official monthly figures about the labour force.