2021 IEEE 37th International Conference on Data Engineering (ICDE) 2021
DOI: 10.1109/icde51399.2021.00047

Valentine: Evaluating Matching Techniques for Dataset Discovery

Abstract: Data scientists today search large data lakes to discover and integrate datasets. In order to bring together disparate data sources, dataset discovery methods rely on some form of schema matching: the process of establishing correspondences between datasets. Traditionally, schema matching has been used to find matching pairs of columns between a source and a target schema. However, the use of schema matching in dataset discovery methods differs from its original use. Nowadays schema matching serves as a buildi…

Cited by 39 publications (16 citation statements)
References 31 publications

“…Lastly, a fundamental problem with creating ensemble approaches is that many existing entity matching solutions are not open-source or do not provide sufficient detail for the reproducibility of the results. This issue was also pointed out by Koutras et al [30]. The existing entity matching solutions require specific parameter settings and in-depth knowledge, which the paper often does not provide.…”
Section: B Lessons Learned (mentioning)
confidence: 95%
“…Techniques for data integration [9], [10], [11], [36], [39], [40], [41], [42] generally aim to automatically discover, select and aggregate related data in order to extend a given dataset. Many of the approaches deal with tabular data.…”
Section: Data Integration (mentioning)
confidence: 99%
“…Third, fingerprints of each feature can be compared across databases to find the best matches (instance-based matching). Authors in [19] provide a comprehensive review of the above categories with pros and cons of the algorithms available in each category and present an interface that can evaluate different types of schema matching methods on a common metric. Practical implementations combine these approaches to complement each other [23].…”
Section: Related Work (mentioning)
confidence: 99%