2020
DOI: 10.1007/978-3-030-49461-2_13

Unsupervised Bootstrapping of Active Learning for Entity Resolution

Abstract: Entity resolution is one of the central challenges when integrating data from large numbers of data sources. Active learning for entity resolution aims to learn high-quality matching models while minimizing the human labeling effort by selecting only the most informative record pairs for labeling. Most active learning methods proposed so far start with an empty set of labeled record pairs and iteratively improve the prediction quality of a classification model by asking for new labels. The absence of adequate…

Citations: cited by 6 publications (3 citation statements)
References: 17 publications
“…The latter uses red-blue set cover to learn an effective blocking function but its need for labeled data makes it ineffective in settings that call for active learning. While other approaches for blocking are available (see [11] for an exhaustive list), a number of these utilize unsupervised learning [1,39]. While unsupervised learning of a blocking function may be applicable to our setting, it does not take advantage of the labels provided by the user and thus may not adapt as well to the data at hand compared to an approach such as DIAL which does take advantage of newly labeled pairs.…”
Section: Deep Learning For Entity Resolution (mentioning)
confidence: 99%
“…Even though crowdsourcing has been explored for record linkage [23] to mitigate the lack of ground truth data, allowing the public to classify record pairs is not applicable in many domains due to privacy concerns [4]. Active learning approaches, where a small number of selected record pairs are manually classified by trusted domain experts, have therefore been adopted for record linkage to generate ground truth data suitable to train supervised classifiers [17,18,24], or to generate high quality blocking results [21]. Active learning based on domain expertise, while being able to generate high quality ground truth data, can however only generate small numbers of labelled record pairs.…”
Section: Related Work (mentioning)
confidence: 99%
“…Gong et al [14] propose a novel inference method based on Bayesian Deep Latent Gaussian Model (BELGAM) to select initial training instances. Another option to address the cold-start problem is an unsupervised matching method that proposes bootstrapping active learning [21]. Furthermore, Deng et al [9] introduced a sequence-based adversarial learning model to select initial set of training instances for the AL methods.…”
Section: Introduction (mentioning)
confidence: 99%