Achieving data and information dissemination without harming anyone is a central task of any entity in charge of collecting data. In this article, the authors examine the literature on data and statistical confidentiality. Rather than comparing the theoretical properties of specific methods, they emphasize the main themes that emerge from the ongoing discussion among scientists regarding how best to achieve the appropriate balance between data protection, data utility, and data dissemination. They cover the literature on de-identification and reidentification methods with emphasis on health care data. The authors also discuss the benefits and limitations of the most common access methods. Although there is abundant theoretical and empirical research, their review reveals a lack of consensus on fundamental questions for empirical practice: how to assess disclosure risk, how to choose among disclosure methods, how to assess reidentification risk, and how to measure utility loss.

Keywords: public use files, disclosure avoidance, reidentification, de-identification, data utility

SAGE Open

... inferential disclosure (i.e., information that can be inferred about a record in a data set with better accuracy). There is significant literature on each of these topics, which are beyond the scope of this article.

Our article is divided into six sections, of which this "Introduction" is the first. The second section presents "The Policy and Academic Context" surrounding the discussion. The third section discusses the state of the art in "De-Identification Methods," while the fourth emphasizes the state of the art in "Reidentification Methods." The fifth section presents the conclusions from the literature on the different ways in which users may "Access" public data, stressing the trade-offs between (a) confidentiality and utility and (b) confidentiality and ease of access. The last section presents the "Conclusion."
The Policy and Academic Context

Historic Perspective

Concerns about privacy and confidentiality in governmental efforts to collect and disseminate information are not new. As a review by Anderson and Seltzer (2009) suggests, "the roots of the modern concept of federal statistical confidentiality can be traced directly back to the late nineteenth century" (p. 8). Notwithstanding this history, the literature on statistical disclosure methods is fairly recent by modern standards (Dalenius, 1977, is considered the seminal paper). In 1975, the U.S. Federal Committee on Statistical Methodology (FCSM) was organized by the Office of Management and Budget (OMB) to investigate issues of data quality affecting federal statistics. As part of this effort, the Subcommittee on Disclosure Limitation Methodology, created within the FCSM, published its 1994 Statistical Policy Working Paper 22 (SPWP22). This paper, which was revised in 2005 by the Confidentiality and Data Access Committee (CDAC, 2005), sets good practice guidelines and recommendations for all agencies regarding confidentiality protection.
Defining Confidentiality ...