The rise of Big Data changes the context in which organisations producing official statistics operate. Big Data provides opportunities, but making optimal use of it requires a number of challenges to be addressed. This stimulates increased collaboration between National Statistical Institutes, Big Data holders, businesses and universities. In time, this may lead to a shift in the role of statistical institutes in the provision of high-quality and impartial statistical information to society. This paper addresses the changing context, the opportunities, the challenges and ways of collaborating. Collaboration between the various stakeholders will involve each partner building on and contributing different strengths. For national statistical offices, traditional strengths include, on the one hand, the ability to collect data and combine data sources into statistical products and, on the other hand, their focus on quality, transparency and sound methodology. In the Big Data era of competing and multiplying data sources, they retain unique knowledge of official statistical production methods, and their impartiality and respect for privacy, as enshrined in law, uniquely position them as a trusted third party. On this basis, they may advise on the quality and validity of information from various sources. By positioning themselves in this way, they will be able to play their role as key information providers in a changing society.
Big data offer many opportunities for official statistics: for example, increased resolution, better timeliness, and new statistical outputs. But there are also many challenges: uncontrolled changes in sources that threaten continuity, a lack of identifiers that impedes linking to population frames, and data that refer only indirectly to phenomena of statistical interest. We discuss two approaches to dealing with these challenges and opportunities. First, we may accept big data for what they are: an imperfect, yet timely, indicator of phenomena in society. These data already exist, and that is precisely what makes them interesting. Second, we may extend this approach with explicit modelling. New methods such as machine-learning techniques can be considered alongside more traditional methods such as Bayesian techniques. National statistical institutes have always been reluctant to use models, apart from specific cases such as small-area estimation. Based on the experience at Statistics Netherlands, we argue that NSIs should not be afraid to use models, provided that their use is documented and made transparent to users. Moreover, since the primary purpose of an NSI is to describe society, we should refrain from making forecasts. The models used should therefore rely on actually observed data and should be validated extensively.
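As a concrete illustration of the second approach, the sketch below calibrates a timely but imperfect big-data indicator against an official survey benchmark with a simple linear model and validates it on held-out periods. It is a minimal sketch, not the method used at Statistics Netherlands; the synthetic monthly series, the train/test split and the linear form are all assumptions made purely for illustration.

```python
# Minimal sketch: calibrating a timely big-data indicator to a survey benchmark.
# All series are synthetic stand-ins; this is not the paper's actual method.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic monthly data: the survey benchmark is what we want to describe;
# the big-data indicator tracks it noisily and with a level bias.
months = 48
benchmark = np.cumsum(rng.normal(0.2, 1.0, months)) + 100.0
indicator = 0.8 * benchmark + 15.0 + rng.normal(0.0, 2.0, months)

# Fit the calibration model on the first 36 months only (actually observed data).
train = slice(0, 36)
X = np.column_stack([np.ones(36), indicator[train]])
beta, *_ = np.linalg.lstsq(X, benchmark[train], rcond=None)

# Validate: nowcast the benchmark for the held-out 12 months and check the error.
test = slice(36, 48)
nowcast = beta[0] + beta[1] * indicator[test]
rmse = np.sqrt(np.mean((nowcast - benchmark[test]) ** 2))
print(f"calibration: benchmark ~ {beta[0]:.2f} + {beta[1]:.2f} * indicator")
print(f"out-of-sample RMSE: {rmse:.2f}")
```

The out-of-sample check mirrors the requirement stated above: the model rests on actually observed data and is validated before its output is used.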
In response to a changing environment, Statistics Netherlands has embarked on a large-scale redesign of the way statistics are produced. The aim is to increase the capability to respond to changing information demand, to lower the response burden for surveys, especially for businesses, and to improve efficiency, while preserving the overall quality level. The redesign is carried out within the framework of a so-called enterprise architecture, which gives overall guidance when structuring the processes of the organisation, including statistical methods and IT tools used. The article describes the redesign approach and explains the main features of the architecture. The emphasis is on experiences that may be relevant to other national statistical institutes operating in a similar environment.
Big data come in high volume, high velocity and high variety. Their high volume may lead to better accuracy and more detail, their high velocity may lead to more frequent and more timely statistical estimates, and their high variety may open up opportunities for statistics in new areas. But there are also many challenges: uncontrolled changes in sources threaten continuity and comparability, and the data refer only indirectly to phenomena of statistical interest. Furthermore, big data may be highly volatile and selective: the coverage of the population to which they refer may change from day to day, leading to inexplicable jumps in time series. And very often, the individual observations in these big data sets lack the variables that would allow them to be linked to other datasets or population frames, which severely limits the possibilities for correcting selectivity and volatility. In this chapter, we describe and discuss opportunities for big data in official statistics.
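When at least one auxiliary variable is observed in the big-data source, which, as noted above, is often not the case, selectivity can be reduced by reweighting. The sketch below shows post-stratification as one standard technique for this; the age groups, population shares and target variable are hypothetical and invented purely for illustration.

```python
# Minimal sketch: correcting selectivity by post-stratification, i.e. reweighting
# big-data observations so that strata match known population shares.
# Groups, shares and the target variable are hypothetical illustrations.
import numpy as np

rng = np.random.default_rng(1)

# Official population shares per age group (assumed known from a register).
population_share = {"15-34": 0.30, "35-54": 0.35, "55+": 0.35}

# A selective big-data source: young people are heavily over-represented.
groups = rng.choice(["15-34", "35-54", "55+"], size=10_000, p=[0.6, 0.3, 0.1])
# The target variable differs by group, so selectivity biases the naive mean.
means = {"15-34": 20.0, "35-54": 30.0, "55+": 40.0}
y = np.array([rng.normal(means[g], 5.0) for g in groups])

# Post-stratification weight per unit: population share / sample share of its group.
sample_share = {g: np.mean(groups == g) for g in population_share}
weights = np.array([population_share[g] / sample_share[g] for g in groups])

naive = y.mean()
corrected = np.average(y, weights=weights)
# The weighted estimate moves toward the true population mean of about 30.5,
# while the naive mean stays near 25 because of the over-represented young group.
print(f"naive mean: {naive:.1f}, post-stratified mean: {corrected:.1f}")
```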