The quantity and quality of administrative information available to National Statistical Institutes have been constantly increasing over the past several years. However, different sources of administrative data are not expected to have the same population coverage, so estimating the true population size from the collective set of data poses several methodological challenges that set the problem apart from a classical capture-recapture setting. In this article, we consider two specific aspects of this problem: (1) misclassification of the units, leading to lists with both overcoverage and undercoverage; and (2) lists focusing on a specific subpopulation, leaving a proportion of the population with null probability of being captured. We propose an approach to this problem that employs a class of capture-recapture methods based on Latent Class models. We assess the proposed approach via a simulation study, then apply the method to five sources of empirical data to estimate the number of active local units of Italian enterprises in 2011.
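For contrast, the classical two-list capture-recapture setting that this abstract departs from can be sketched as follows. This is the textbook Lincoln-Petersen estimator, not the Latent Class approach proposed in the article. It assumes that neither list contains out-of-scope units and that every unit has a positive probability of being captured, which are the assumptions that aspects (1) and (2) violate. The function name and the counts are hypothetical and serve only as illustration.

```python
def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Classical two-list estimate of a population size.

    n1, n2: number of units found on list 1 and on list 2.
    m:      number of units found on both lists (after record linkage).
    """
    if m <= 0:
        # With no overlap the estimator is undefined.
        raise ValueError("lists share no units; estimator is undefined")
    return n1 * n2 / m


# Hypothetical counts, for illustration only (not taken from the article).
estimate = lincoln_petersen(n1=200, n2=150, m=60)
print(estimate)
```

If one list wrongly includes out-of-scope units, or one list can never capture part of the population, this simple ratio is biased, which is the motivation the abstract gives for a different class of models.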
Bayesian networks are particularly useful for dealing with high-dimensional statistical problems. They reduce the complexity of the phenomenon under study by representing joint relationships between a set of variables through conditional relationships between subsets of these variables. Following Thibaudeau and Winkler, we use Bayesian networks for imputing missing values. This method is introduced to deal with the problem of the consistency of imputed values: preservation of statistical relationships between variables ("statistical consistency") and preservation of logical constraints in data ("logical consistency"). We perform some experiments on a subset of anonymous individual records from the 1991 UK population census. Copyright 2004 Royal Statistical Society.
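The idea of replacing a joint distribution with conditional relationships between subsets of variables can be illustrated with a minimal sketch. It assumes a hypothetical three-variable chain A -> B -> C over binary values with made-up conditional probability tables; it is not the network or the imputation procedure used in the article. A missing B is imputed from its conditional distribution given the observed A and C, which under this chain is proportional to P(B | A) P(C | B). The sketch returns the most probable value to stay deterministic; drawing from the conditional distribution instead is an equally plausible choice.

```python
# Hypothetical conditional probability tables for the chain A -> B -> C.
# Outer key: value of the parent; inner key: value of the child.
p_b_given_a = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}  # P(B=b | A=a)
p_c_given_b = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}  # P(C=c | B=b)


def posterior_b(a: int, c: int) -> dict:
    """Return P(B=b | A=a, C=c) for b in {0, 1} under the chain structure."""
    # Unnormalised weights: P(B=b | A=a) * P(C=c | B=b).
    weights = {b: p_b_given_a[a][b] * p_c_given_b[b][c] for b in (0, 1)}
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}


def impute_b(a: int, c: int) -> int:
    """Impute a missing B as its most probable value given A and C."""
    post = posterior_b(a, c)
    return max(post, key=post.get)


print(posterior_b(1, 1))
print(impute_b(1, 1))
```

In this sketch, a zero entry in either table would give the corresponding combination zero weight, so it could never be imputed; this is one simple way a logical constraint might be encoded, stated here as an assumption rather than as the article's method.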