Proceedings of the 29th ACM International Conference on Information & Knowledge Management 2020
DOI: 10.1145/3340531.3412781

Profiling Entity Matching Benchmark Tasks

Abstract: Entity matching is a central task in data integration which has been researched for decades. Over this time, a wide range of benchmark tasks for evaluating entity matching methods has been developed. This resource paper systematically complements, profiles, and compares 21 entity matching benchmark tasks. In order to better understand the specific challenges associated with different tasks, we define a set of profiling dimensions which capture central aspects of the matching tasks. Using these dimensions, we c…

Cited by 17 publications (16 citation statements)
References 16 publications
“…We generate classification models for product matching using three established methodologies: traditional machine learning models [22], DeepMatcher hybrid models [2], and transformer architecture models [19]. We chose these three approaches for their recent state of the art performance on product matching tasks, and the availability of code from recent publications [2,19,21].…”
Section: Methods (mentioning)
confidence: 99%
“…It is these matching and non-matching pairs together which form the correspondence set we use to train and evaluate our classification systems. We follow the same approach to generating the correspondence sets as used in [21], shown in Algorithm 1. Non-matching pairs are drawn from the Cartesian product of the two datasets being matched, excluding the pairs already labeled as known matches.…”
Section: Training Data Generation and Blocking (mentioning)
confidence: 99%
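
The pair-generation step described in the statement above can be illustrated with a minimal sketch. This is not the Algorithm 1 of the citing paper, which is not reproduced here. The function name build_correspondence_set, its parameters, the uniform random sampling, and the 1/0 labels are all assumptions made for illustration; the only behavior taken from the quoted text is that non-matching pairs come from the Cartesian product of the two datasets, excluding pairs already labeled as matches.

```python
import itertools
import random


def build_correspondence_set(left_ids, right_ids, known_matches,
                             n_non_matches, seed=0):
    """Return labeled (left_id, right_id, label) triples.

    Hypothetical helper: label 1 marks a known match, label 0 a sampled
    non-match. Uniform sampling is an assumption, not the cited method.
    """
    matches = set(known_matches)

    # Candidate non-matches: the Cartesian product of both datasets,
    # minus every pair already labeled as a match.
    candidates = [
        pair for pair in itertools.product(left_ids, right_ids)
        if pair not in matches
    ]

    # Seeded generator so the sampled non-matches are reproducible.
    rng = random.Random(seed)
    sampled = rng.sample(candidates, min(n_non_matches, len(candidates)))

    positives = [(left, right, 1) for left, right in sorted(matches)]
    negatives = [(left, right, 0) for left, right in sampled]
    return positives + negatives


pairs = build_correspondence_set(
    left_ids=["a1", "a2", "a3"],
    right_ids=["b1", "b2"],
    known_matches=[("a1", "b1"), ("a2", "b2")],
    n_non_matches=3,
)
```

Materializing the full Cartesian product is only practical for small datasets; for larger ones, a blocking step would normally restrict the candidate pairs first.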