2018
DOI: 10.48550/arxiv.1803.05874
Preprint

Synthesizing geocodes to facilitate access to detailed geographical information in large scale administrative data

Abstract: In this paper we investigate if generating synthetic data can be a viable strategy for providing access to detailed geocoding information for external researchers without compromising the confidentiality of the units included in the database. This research was motivated by a recent project at the Institute for Employment Research (IAB) in Germany that linked exact geocodes to the Integrated Employment Biographies, a large administrative database containing several million records. Based on these data we evalua…

Cited by 4 publications (12 citation statements)
References 24 publications
“…One stream of work has treated the geographic location as variable(s) carrying little geographic information, therefore their proposed synthesizers do not incorporate spatial modeling. Wang and Reiter (2012); Drechsler and Hu (2018+) developed CART models (Reiter, 2005c) to synthesize continuous longitude and latitude. In addition, Drechsler and Hu (2018+) combined the continuous longitude and latitude variables into a single categorical geographic variable, and used versions of categorical CART models for its synthesis.…”
Section: Synthesis Of Locations
Citation type: mentioning
confidence: 99%
“…In addition to the expected match risks, measures such as the true match rate (the percentage of true unique matches among target records) and the false match rate (the percentage of false matches among unique matches) are also useful (Reiter and Mitra, 2009; Drechsler and Reiter, 2010; Hu and Hoshino, 2018; Hu, 2018+; Drechsler and Hu, 2018+).…”
Section: Disclosure Risks
Citation type: mentioning
confidence: 99%
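
The two rates defined in the statement above can be illustrated with a minimal Python sketch. It is not taken from any of the cited papers. The function name `match_rates` and the input encoding are assumptions made for illustration: `declared[i]` holds the record id that an intruder uniquely matched to target `i`, or `None` when no unique match was found.

```python
from typing import Optional, Sequence, Tuple


def match_rates(
    declared: Sequence[Optional[int]],
    truth: Sequence[int],
) -> Tuple[float, float]:
    """Return (true match rate, false match rate) in percent.

    Hypothetical encoding: declared[i] is the id uniquely matched to
    target i, or None if the intruder found no unique match;
    truth[i] is the correct id for target i.
    """
    if len(declared) != len(truth):
        raise ValueError("declared and truth must have the same length")

    n_targets = len(truth)
    # Keep only the targets for which a unique match was declared.
    unique = [(d, t) for d, t in zip(declared, truth) if d is not None]
    true_unique = sum(1 for d, t in unique if d == t)
    false_unique = len(unique) - true_unique

    # True match rate: true unique matches among all target records.
    true_match_rate = 100.0 * true_unique / n_targets if n_targets else 0.0
    # False match rate: false matches among the unique matches only.
    false_match_rate = 100.0 * false_unique / len(unique) if unique else 0.0
    return true_match_rate, false_match_rate


# Five targets: three unique matches declared, two of them correct.
tmr, fmr = match_rates([10, None, 12, 99, None], [10, 11, 12, 13, 14])
print(f"true match rate: {tmr:.1f}%, false match rate: {fmr:.1f}%")
```

Note that the two rates use different denominators: all target records for the true match rate, and only the declared unique matches for the false match rate.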