2012
DOI: 10.1007/978-3-642-33627-0_11

Valid Statistical Inference on Automatically Matched Files

Abstract: We develop a statistical process for determining a confidence set for an unknown bipartite matching. It requires only modest assumptions on the nature of the distribution of the data. The confidence set involves a set of linear constraints on the bipartite matching, which permits efficient analysis of the matched data, e.g., using linear regression, while maintaining the proper degree of uncertainty about the linkage itself.
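To make the abstract's workflow concrete, the following is a minimal, hypothetical sketch in Python (not the authors' actual procedure): it enumerates candidate bipartite matchings between two toy files, keeps every matching whose linking-field score is close to the best one (an ad hoc slack standing in for the paper's calibrated linear constraints), and then fits a linear regression under each retained matching, so the spread of slopes reflects the remaining uncertainty about the linkage. All variable names, the similarity score, and the 0.5 cutoff are illustrative assumptions.

```python
# Illustrative sketch only: confidence set over bipartite matchings,
# propagated into a downstream linear regression.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 5

# File A: covariate x and a noisy linking field zA.
x = rng.normal(size=n)
zA = rng.normal(size=n)

# Unknown true matching: record i in A corresponds to record sigma[i] in B.
sigma = rng.permutation(n)

# File B: the linking field observed with distortion, and a response y
# generated from the true matched covariate (true slope 2.0).
zB = np.empty(n)
zB[sigma] = zA + 0.1 * rng.normal(size=n)
y = np.empty(n)
y[sigma] = 2.0 * x + 0.2 * rng.normal(size=n)

def score(perm):
    """Total linking-field agreement for the candidate matching A[i] <-> B[perm[i]]."""
    return -np.sum(np.abs(zA - zB[list(perm)]))

# Confidence set: all matchings whose score is close to the best one.
# (The 0.5 slack is a made-up stand-in for a calibrated statistical cutoff.)
perms = list(itertools.permutations(range(n)))
scores = np.array([score(p) for p in perms])
conf_set = [p for p, s in zip(perms, scores) if s >= scores.max() - 0.5]

# Fit y ~ x under each matching in the confidence set; the range of slopes
# carries the linkage uncertainty into the regression estimate.
slopes = [np.polyfit(x, y[list(p)], 1)[0] for p in conf_set]

print(f"{len(conf_set)} matchings in the confidence set; "
      f"slope range [{min(slopes):.2f}, {max(slopes):.2f}]")
```

On this toy example the slope interval should sit near the true value of 2.0 when the linking field is informative; as the linking-field distortion grows, the confidence set of matchings widens and the interval widens with it.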

Cited by 4 publications (7 citation statements)
References 8 publications
“…entities in the merged database. The approach of PPRL reviewed in [10] sets out to deal with this problem. Merging data from multiple files with the same or similar values without releasing their attributes is what PPRL hopes to achieve.…”
Section: Discussion
confidence: 99%
“…Our implementations of [16]'s canopies approach and [21]'s nearest neighbor approach perform poorly on the RLdata10000 and "noisy" datasets.¹⁰ Figure 1 gives results of these approaches for different threshold parameters (t is the threshold parameter for sorted TNN) for the RLdata10000 dataset.…”
Section: Clustering Approaches
confidence: 97%
“…Copas and Hilton (1990) describe the idea of modeling the distortion process using what they call the "Hit-Miss Model," which anticipates part of our model in Section 3.1. The specific distortion model we use is, however, closer to that introduced in Hall and Fienberg (2012), as part of a nonparametric frequentist technique for matching k = 2 files that allows for distorted data. Thus, their work is related to ours as we view the records as noisy, distorted entities, that we model using parameters and latent individuals.…”
Section: Related Work
confidence: 99%
“…Hall and Fienberg reported a method to build bipartite graphs and evaluate the confidence of different hypothetical record link assignments [12]. This method can be used to link datasets of moderate size.…”
Section: Introduction and Related Work
confidence: 99%