Population Reconstruction 2015
DOI: 10.1007/978-3-319-19884-2_7

Multi-Source Entity Resolution for Genealogical Data

Abstract: In this chapter, we study the application of existing entity resolution (ER) techniques on a real-world multi-source genealogical dataset. Our goal is to identify all persons involved in various notary acts and link them to their birth, marriage, and death certificates. We analyze the influence of additional ER features, such as name popularity, geographical distance, and co-reference information on the overall ER performance. We study two prediction models: regression trees and logistic regression. In order to…

Cited by 18 publications (15 citation statements)
References 29 publications
“…The entity resolution problem has been referred in the literature with multiple terms including deduplication, entity linkage, and entity matching. Entity resolution has been used in various fields such as matching profiles in social networks [2], bioinformatics data [3], biomedical data [41], publication data [5,6], genealogical data [4], product data [5,6], etc. The attributes of the entities are compared, and a similarity value is assigned.…”
Section: Related Work (mentioning)
confidence: 99%
“…We will add two points in child[0], one coming from child[1] and one from child[3]. Similarly, child[1] and child[3] will receive a point from child[4]. Algorithm 1 details the procedure for retrieving the spatial blocks with QuadFlex.…”
Section: Spatial Blocking (mentioning)
confidence: 99%
“…Efremova et al [11] consider the problem of linking records from multiple genealogical datasets. They cast the linking problem into supervised binary classification tasks, similar to this work, and find name popularity, geographical distance, and co-reference information to be important features.…”
Section: Related Work (mentioning)
confidence: 99%
“…Record linkage for genealogical data has been previously studied, e.g., by Efremova et al [4] and by Christen et al [3]. Alternative family tree visualization approaches have been proposed by McGuffin and Balakrishnan [9] and by Bezerianos et al [1].…”
Section: Related Work (mentioning)
confidence: 99%