2018
DOI: 10.14778/3236187.3269461

Distributed representations of tuples for entity resolution

Abstract: Despite more than 70 years of effort on all aspects of entity resolution (ER), there is still high demand for democratizing ER, that is, for reducing the heavy human involvement in labeling data, performing feature engineering, tuning parameters, and defining blocking functions. Building on recent advances in deep learning, in particular distributed representations of words (a.k.a. word embeddings), we present a novel ER system, called DeepER, that achieves good accuracy, …

Cited by 85 publications (157 citation statements)
References 33 publications
“…Entity matching methods can broadly be divided into rule-based, crowd-based, and machine learning-based methods [5,6,14]. Since 2018, an increasing number of neural network-based matching methods [13,23,30] have been proposed and have pushed the state-of-the-art performance especially for textual entity matching tasks [1]. We include Deepmatcher [23] into our experiments as an example of one of the initial neural network based matching systems.…”
Section: Related Work (mentioning)
confidence: 99%
“…Distributed representation of records (DeepER). This is a recently proposed approach which applies a distributed representation of words (Ebraheem et al., 2018) for constructing a distributed representation of records. For each token (word) within an attribute value its distributed representation is obtained from one of the pre-trained embedding dictionaries.…”
Section: Experimental Evaluation (mentioning)
confidence: 99%
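The token-level lookup described in the excerpt above can be sketched as follows. This is a minimal illustration under stated assumptions, not DeepER's implementation: the three-dimensional `EMBEDDINGS` dictionary is a hypothetical stand-in for a pre-trained embedding dictionary (e.g. GloVe), averaging the token vectors within each attribute is an assumed composition rule, and the helpers `embed_attribute` and `embed_record` are hypothetical names.

```python
# Hypothetical toy embedding dictionary; a real system would load
# pre-trained vectors (e.g. GloVe) with hundreds of dimensions.
EMBEDDINGS: dict[str, list[float]] = {
    "apple": [1.0, 0.0, 0.0],
    "iphone": [0.8, 0.2, 0.0],
    "samsung": [0.0, 1.0, 0.0],
    "galaxy": [0.1, 0.9, 0.0],
    "phone": [0.3, 0.3, 0.4],
}
DIM = 3


def embed_attribute(value: str) -> list[float]:
    """Average the embeddings of the tokens in one attribute value.

    Tokens missing from the dictionary are skipped (an assumption of this
    sketch); if no token is known, the zero vector is returned.
    """
    vectors = [EMBEDDINGS[tok] for tok in value.lower().split() if tok in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    # zip(*vectors) yields one column per embedding dimension.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def embed_record(record: dict[str, str], attributes: list[str]) -> list[list[float]]:
    """Return one averaged vector per attribute, in a fixed attribute order."""
    return [embed_attribute(record.get(attr, "")) for attr in attributes]


if __name__ == "__main__":
    rec = {"title": "Apple iPhone", "brand": "Apple"}
    print(embed_record(rec, ["title", "brand", "price"]))
```

Keeping one vector per attribute, rather than collapsing the whole tuple into a single vector, lets a downstream matcher compare the two records attribute by attribute; that layout is a design choice of this sketch.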
“…In particular, the application of machine learning (ML) offers a promising approach, which can be applied as an alternative to manual rule building (Köpcke et al., 2010). However, the existing ML-based approaches to RL are based on the assumption that the data obtained from different sources is structured and represented by overlapping sets of attributes (Ebraheem et al., 2018; Elfeky et al., 2002; Jurek et al., 2017; Kejriwal and Miranker, 2015; Ngomo and Lyko, 2013; Schneider et al., 2018; Sherif et al., 2017; Wang et al., 2015). This is very restrictive in terms of real world applications, given the increasing number of unstructured data sources such as social media channels, for example.…”
Section: Introduction (mentioning)
confidence: 99%
“…Then the matching step determines if each pair in the candidate set is a match. To our knowledge, as of November 2018 there have been only two published work on entity matching using deep learning: DeepER [33] and DeepMatcher [72]. We now describe both.…”
Section: Entity Matching (mentioning)
confidence: 99%
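The matching step in the excerpt above is a binary classification over candidate pairs. A minimal sketch, assuming each pair has already been reduced to a vector of per-attribute similarity scores (for example, cosine similarities of averaged attribute embeddings), fits plain logistic regression by batch gradient descent. Logistic regression is a stand-in chosen for brevity, not the architecture of DeepER or DeepMatcher; the function names, learning rate, epoch count, and training values are all hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def score(x: list[float], w: list[float], b: float) -> float:
    """Predicted match probability for one similarity vector."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_matcher(features, labels, lr=0.5, epochs=500):
    """Fit logistic regression on similarity vectors with batch gradient descent."""
    n_feat = len(features[0])
    w, b = [0.0] * n_feat, 0.0
    m = len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for x, y in zip(features, labels):
            err = score(x, w, b) - y  # gradient of the log-loss w.r.t. the logit
            for i in range(n_feat):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / m for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def is_match(x, w, b, threshold=0.5):
    """Label a candidate pair as a match if its probability clears the threshold."""
    return score(x, w, b) >= threshold


# Made-up toy data: [title similarity, brand similarity] for each labeled pair.
TRAIN_X = [[0.95, 0.90], [0.90, 0.80], [0.85, 0.95],
           [0.20, 0.30], [0.10, 0.40], [0.30, 0.10]]
TRAIN_Y = [1, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    w, b = train_matcher(TRAIN_X, TRAIN_Y)
    print(is_match([0.9, 0.85], w, b), is_match([0.15, 0.2], w, b))
```

Because the matcher only sees similarity scores, the same code applies to any candidate set that blocking produces; swapping in a neural classifier would change `train_matcher` but not the interface.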
“…Finally, given a set of tuples (e.g., the union of the two tables to be matched), we pass each tuple through all L hash tables to obtain a list of blocks. Then the candidate set for matching consists of all tuple pairs that appear together in at least one block (there are pruning strategies to further reduce the candidate set size, see [33]).…”
Section: Entity Matching (mentioning)
confidence: 99%
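The blocking procedure in the excerpt above (pass every tuple through L hash tables; the candidate set is all pairs that share at least one block) can be sketched with random-hyperplane LSH over tuple embeddings. Random hyperplanes are a standard LSH family for cosine similarity, but their use here, the parameters `num_tables` and `bits_per_table`, and all function names are assumptions for illustration. The pruning strategies mentioned in the excerpt are omitted.

```python
import itertools
import random
from collections import defaultdict


def make_tables(dim: int, num_tables: int, bits_per_table: int, seed: int = 0):
    """Draw `bits_per_table` random Gaussian hyperplanes for each of `num_tables` tables."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits_per_table)]
        for _ in range(num_tables)
    ]


def signature(vec: list[float], planes: list[list[float]]) -> tuple[int, ...]:
    """One bit per hyperplane: which side of the hyperplane the vector lies on."""
    return tuple(int(sum(p * v for p, v in zip(plane, vec)) >= 0.0) for plane in planes)


def candidate_pairs(vectors: dict[str, list[float]], tables) -> set[tuple[str, str]]:
    """Union over all tables of the tuple pairs that land in the same bucket (block)."""
    candidates: set[tuple[str, str]] = set()
    for planes in tables:
        blocks = defaultdict(list)
        for tid, vec in vectors.items():
            blocks[signature(vec, planes)].append(tid)
        for members in blocks.values():
            # Sorting makes each pair canonical, so (a, b) and (b, a) are not both added.
            for a, b in itertools.combinations(sorted(members), 2):
                candidates.add((a, b))
    return candidates


if __name__ == "__main__":
    tuple_vectors = {
        "A1": [0.9, 0.1, 0.0],
        "B1": [0.88, 0.12, 0.01],
        "A2": [-0.1, 0.9, 0.2],
        "B2": [-0.7, -0.6, 0.1],
    }
    tables = make_tables(dim=3, num_tables=4, bits_per_table=3)
    print(sorted(candidate_pairs(tuple_vectors, tables)))
```

Using more tables raises recall, since a true match needs to collide in only one table, while more bits per table makes each block smaller and the candidate set tighter; the values above are arbitrary.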