Using only 34 published tables, we reconstruct five variables (census block, sex, age, race, and ethnicity) in the confidential 2010 Census person records. Using the 38-bin age variable tabulated at the census block level, at most 20.1% of reconstructed records can differ from their confidential source on even a single value for these five variables. Using only published data, an attacker can verify that all records in 70% of all census blocks (97 million people) are perfectly reconstructed. The tabular publications in Summary File 1 thus have prohibited disclosure risk similar to the unreleased confidential microdata. Reidentification studies confirm that an attacker can, within blocks with perfect reconstruction accuracy, correctly infer the actual census response on race and ethnicity for 3.4 million vulnerable population uniques (persons with nonmodal characteristics) with 95% accuracy, the same precision as the confidential data achieve and far greater than statistical baselines. The flaw in the 2010 Census framework was the assumption that aggregation prevented accurate microdata reconstruction, justifying weaker disclosure limitation methods than were applied to 2010 Census public microdata. The framework used for 2020 Census publications defends against attacks that are based on reconstruction, as we also demonstrate here. Finally, we show that alternatives to the 2020 Census Disclosure Avoidance System with similar accuracy (enhanced swapping) also fail to protect confidentiality, and those that partially defend against reconstruction attacks (incomplete suppression implementations) destroy the primary statutory use case: data for redistricting all legislatures in the country in compliance with the 1965 Voting Rights Act.
Constructing a differentially private (DP) estimator requires deriving the maximum influence of an observation, which can be difficult in the absence of exogenous bounds on the input data or the estimator, especially in high dimensional settings. This paper shows that standard notions of statistical depth, i.e., halfspace depth and regression depth, are particularly advantageous in this regard, both in the sense that the maximum influence of a single observation is easy to analyze and that this value is typically low. This is used to motivate new approximate DP location and regression estimators using the maximizers of these two notions of statistical depth. A more computationally efficient variant of the approximate DP regression estimator is also provided. Also, to avoid requiring that users specify a priori bounds on the estimates and/or the observations, variants of these DP mechanisms are described that satisfy random differential privacy (RDP), which is a relaxation of differential privacy provided by Hall, Wasserman, and Rinaldo (2013). We also provide simulations of the two DP regression methods proposed here. The proposed estimators appear to perform favorably relative to the existing DP regression methods we consider in these simulations when either the sample size is at least 100-200 or the privacy-loss budget is sufficiently high.
No abstract
We propose probability and density forecast combination methods that are defined using the entropy regularized Wasserstein distance. First, we provide a theoretical characterization of the combined density forecast based on the regularized Wasserstein distance under the assumption. More specifically, we show that the regularized Wasserstein barycenter between multivariate Gaussian input densities is multivariate Gaussian, and provide a simple way to compute mean and its variance–covariance matrix. Second, we show how this type of regularization can improve the predictive power of the resulting combined density. Third, we provide a method for choosing the tuning parameter that governs the strength of regularization. Lastly, we apply our proposed method to the U.S. inflation rate density forecasting, and illustrate how the entropy regularization can improve the quality of predictive density relative to its unregularized counterpart.
The Census TopDown Algorithm (TDA) is a disclosure avoidance system using differential privacy for privacy-loss accounting. The algorithm ingests the final, edited version of the 2020 Census data and the final tabulation geographic definitions. The algorithm then creates noisy versions of key queries on the data, referred to as measurements, using zero-Concentrated Differential Privacy. Another key aspect of the TDA are invariants, statistics that the Census Bureau has determined, as matter of policy, to exclude from the privacy-loss accounting. The TDA post-processes the measurements together with the invariants to produce a Microdata Detail File (MDF) that contains one record for each person and one record for each housing unit enumerated in the 2020 Census. The MDF is passed to the 2020 Census tabulation system to produce the 2020 Census Redistricting Data (P.L. 94-171) Summary File. This paper describes the mathematics and testing of the TDA for this purpose.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.