2021
DOI: 10.1145/3410157
Knowledge Transfer for Entity Resolution with Siamese Neural Networks

Abstract: The integration of multiple data sources is a common problem in a large variety of applications. Traditionally, handcrafted similarity measures are used to discover, merge, and integrate multiple representations of the same entity—duplicates—into a large homogeneous collection of data. Often, these similarity measures do not cope well with the heterogeneity of the underlying dataset. In addition, domain experts are needed to manually design and configure such measures, which is both time-consuming and requires…
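The Siamese idea named in the title can be illustrated with a minimal sketch: two records pass through one shared encoder, and the distance between their embeddings acts as a learned similarity measure. Everything below (the encoder, weights, and records) is a toy assumption, not the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared encoder weights: both records pass through the SAME network,
# which is what makes the architecture "Siamese".
W = rng.normal(size=(8, 4))

def encode(features):
    # Toy encoder: one shared linear layer with a tanh nonlinearity.
    return np.tanh(features @ W)

def similarity(a, b):
    # Contrastive-style score: a small Euclidean distance between the
    # two embeddings suggests the records are duplicates.
    return float(np.linalg.norm(encode(a) - encode(b)))

rec_a = rng.normal(size=8)                 # some entity record (as features)
rec_b = rec_a + 0.01 * rng.normal(size=8)  # a near-duplicate of rec_a
rec_c = rng.normal(size=8)                 # an unrelated record
```

In a trained model, the encoder weights would be learned so that labeled duplicate pairs land close together and non-duplicates far apart; here the near-duplicate is simply closer by construction.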

Cited by 13 publications (11 citation statements) · References 38 publications
“…Our framework is matcher-agnostic: it can support any kind of matching function, such as a DNF of similarity join predicates on multiple attributes [18,20], a human judging the pairs of records on a crowdsourcing platform [17], an unsupervised matcher based on generative models [44], or a complex deep learning model exploiting pre-trained language models [22] or transfer learning [23]. Our framework allows indicating the matching function for a particular ER task within an SQL query denoted by 𝜇 𝑄 .…”
Section: Matching Function
confidence: 99%
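The "DNF of similarity join predicates on multiple attributes" mentioned above can be sketched minimally: a record pair matches if any conjunction of per-attribute predicates holds. The attribute names, thresholds, and the similarity function below are assumptions for illustration, not from the cited framework.

```python
from difflib import SequenceMatcher

def sim(a, b):
    # Stand-in string similarity in [0, 1]; a real system would use a
    # tuned similarity join predicate (e.g., Jaccard or edit distance).
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def dnf_match(r1, r2):
    # Disjunctive normal form over attribute predicates: the records
    # match if ANY conjunction ("clause") of predicates holds.
    clause1 = sim(r1["name"], r2["name"]) > 0.8 and r1["city"] == r2["city"]
    clause2 = r1["phone"] == r2["phone"]
    return clause1 or clause2

a = {"name": "Acme Corp.", "city": "Berlin", "phone": "030-111"}
b = {"name": "ACME Corp", "city": "Berlin", "phone": "030-999"}
```

Here the pair matches through the first clause (similar names, same city) even though the phone numbers differ.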
“…Existing explorations seek solutions in leveraging external data or improving annotation efficiency. External data can be aggregated via transfer learning (Zhao and He, 2019; Kasai et al., 2019; Loster et al., 2021), or via pre-training language models. For better annotations, researchers tried active learning (Kasai et al., 2019; Nafa et al., 2020; Sarawagi and Bhamidipaty, 2002; Arasu et al., 2010), or crowdsourcing techniques (Wang et al., 2012; Gokhale et al., 2014).…”
Section: Related Work
confidence: 99%
“…Supervised deep learning EM relies on large amounts of labeled training data, which is extremely costly in reality. Attempts have been made to leverage external data via transfer learning (Zhao and He, 2019; Kasai et al., 2019; Loster et al., 2021) and pre-trained language model-based methods. Other attempts have also been made to improve labeling efficiency via active learning (Nafa et al., 2020) and crowdsourcing techniques (Gokhale et al., 2014; Wang et al., 2012).…”
Section: Introduction
confidence: 99%
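The transfer-learning strategy these statements describe can be sketched minimally: reuse an encoder trained on a label-rich source domain and fit only a cheap decision rule on the scarce target labels. The weights, synthetic pairs, and midpoint threshold below are toy assumptions, not any cited method.

```python
import numpy as np

rng = np.random.default_rng(1)

# Encoder weights assumed to be PRETRAINED on a label-rich source domain;
# in the target domain we transfer them unchanged.
W_src = rng.normal(size=(6, 3))

def encode(x):
    return np.tanh(x @ W_src)

def pair_distance(a, b):
    return float(np.linalg.norm(encode(a) - encode(b)))

# Target domain: only a handful of labeled pairs. Instead of retraining
# the encoder, fit just a decision threshold on the transferred distances.
pairs, labels = [], []
for _ in range(20):
    x = rng.normal(size=6)
    pairs.append((x, x + 0.05 * rng.normal(size=6)))  # duplicate pair
    labels.append(1)
    pairs.append((x, rng.normal(size=6)))             # non-duplicate pair
    labels.append(0)

d = np.array([pair_distance(a, b) for a, b in pairs])
labels = np.array(labels)
dup, non = d[labels == 1], d[labels == 0]
threshold = (dup.max() + non.min()) / 2  # crude midpoint decision rule

def is_duplicate(a, b):
    return pair_distance(a, b) < threshold
```

This keeps the expensive representation fixed and spends the few target labels only on calibrating the decision boundary, which is the practical appeal of transfer in label-scarce EM settings.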