2012
DOI: 10.1007/s13740-012-0015-8

On Generating Benchmark Data for Entity Matching

Abstract: Entity matching has been a fundamental task in every major integration and data cleaning effort. It aims at identifying whether two different pieces of information refer to the same real-world object. It can also form the basis of entity search by finding the entities in a repository that best match a user specification. Despite the many different entity matching techniques that have been developed over time, there is still no widely accepted benchmark for evaluating and comparing them. This paper intr…
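As a minimal illustration of the matching decision described in the abstract (not the paper's method; the records, attribute names, similarity function, and threshold below are assumptions made only for this sketch), a pairwise matcher can average attribute-level string similarities and compare the result against a threshold:

```python
# Minimal sketch of a similarity-based entity matcher.
# All records, attribute names, and the 0.8 threshold are hypothetical,
# chosen only to illustrate the pairwise matching decision.
from difflib import SequenceMatcher

def attribute_similarity(a: str, b: str) -> float:
    """Normalized string similarity between two attribute values (0..1)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def is_match(r1: dict, r2: dict, threshold: float = 0.8) -> bool:
    """Decide whether two records refer to the same real-world object
    by averaging similarities over their shared attributes."""
    shared = r1.keys() & r2.keys()
    if not shared:
        return False
    score = sum(attribute_similarity(str(r1[k]), str(r2[k])) for k in shared)
    return score / len(shared) >= threshold

# Two descriptions of the same movie coming from different sources.
a = {"title": "The Godfather", "year": "1972", "director": "Francis Ford Coppola"}
b = {"title": "Godfather, The", "year": "1972", "director": "F. F. Coppola"}
print(is_match(a, b))
```

Entity search, as mentioned in the abstract, can reuse the same scoring: instead of thresholding a single pair, rank all entities in the repository by their score against the user specification.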

Cited by 26 publications (20 citation statements)
References 66 publications
“…In our experiments, we used the DBpedia (BTC12DBpedia) and Freebase (BTC12Freebase) datasets from BTC12, and the raw infoboxes from DBpedia 3.5 (Infoboxes), i.e., two different versions of DBpedia. We also included a movies dataset⁷, used in [15], extracted from DBpedia movies and IMDB, to validate the correctness of our algorithms.…”
Section: A. Datasets (mentioning)
Confidence: 99%
“…However, these algorithms have not yet been experimentally evaluated with Linked Open Data (LOD) datasets exhibiting different characteristics in terms of the underlying number of entity types and size of entity descriptions (in terms of property-value pairs), as well as their structural (i.e., property vocabularies) and semantic (i.e., common property values and URLs) overlap. Existing works in ER benchmarks [7] and evaluation frameworks [11] focus on the similarity of descriptions and how these similarities affect the matching decision of ER; not on blocking, explicitly. Their data variations (focusing on highly similar descriptions) are not adequate to evaluate blocking algorithms suitable for the Web of data.…”
Section: Introduction (mentioning)
Confidence: 99%
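For readers unfamiliar with blocking, the following sketch shows the basic idea behind a generic token-blocking scheme (this is not one of the algorithms evaluated in the citing work, and the entity descriptions are invented): entities are grouped by shared tokens so that only entities within the same block are compared, avoiding the quadratic all-pairs comparison.

```python
# Sketch of token blocking over entity descriptions (property-value pairs).
# The identifiers and property values are hypothetical examples.
from collections import defaultdict

def token_blocking(descriptions: dict) -> dict:
    """Map each token appearing in any property value to the set of
    entity IDs containing it; entities sharing a block become candidate
    matches, while all other pairs are never compared."""
    blocks = defaultdict(set)
    for entity_id, properties in descriptions.items():
        for value in properties.values():
            for token in str(value).lower().split():
                blocks[token].add(entity_id)
    # Single-entity blocks yield no candidate pairs and can be discarded.
    return {t: ids for t, ids in blocks.items() if len(ids) > 1}

entities = {
    "dbpedia:The_Godfather": {"name": "The Godfather", "year": "1972"},
    "imdb:tt0068646": {"title": "Godfather", "released": "1972"},
    "dbpedia:Alien": {"name": "Alien", "year": "1979"},
}
print(token_blocking(entities))  # blocks for 'godfather' and '1972' pair the two movie records
```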
“…Given a set of entity references, such as publication venue titles, entity resolution is the process of identifying which of them correspond to the same real-world entity [20]. In a recent survey on entity resolution (or entity matching), [21] presents an implementation of a framework for evaluating entity matching systems through a systematic generation of synthetic test cases. Other surveys and tutorials on entity resolution can be found in [22], [23], [24], and [25].…”
Section: Related Work (mentioning)
Confidence: 99%
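The systematic test-case generation mentioned in this statement can be pictured as perturbing a clean record into a dirty duplicate whose ground truth is known. The sketch below only illustrates that idea; the perturbation operators and the sample record are assumptions, not the operators defined in the surveyed framework.

```python
# Sketch of generating a synthetic entity-matching test case by applying
# simple, controlled errors to a clean record. Operators and data are
# hypothetical illustrations, not the paper's generator.
import random

def perturb_value(value: str, rng: random.Random) -> str:
    """Introduce one error: swap adjacent characters, drop a character,
    or abbreviate longer words."""
    if len(value) < 2:
        return value
    op = rng.choice(["swap", "drop", "abbreviate"])
    i = rng.randrange(len(value) - 1)
    if op == "swap":
        return value[:i] + value[i + 1] + value[i] + value[i + 2:]
    if op == "drop":
        return value[:i] + value[i + 1:]
    return " ".join(w[0] + "." if len(w) > 3 else w for w in value.split())

def generate_test_case(record: dict, n_errors: int = 2, seed: int = 0):
    """Return (clean record, perturbed copy, ground-truth label). The
    perturbed copy should still be matched back to the clean record."""
    rng = random.Random(seed)
    dirty = dict(record)
    for key in rng.sample(sorted(dirty), k=min(n_errors, len(dirty))):
        dirty[key] = perturb_value(str(dirty[key]), rng)
    return record, dirty, True  # True: the two records are a match

clean = {"venue": "Very Large Data Bases", "year": "2012", "city": "Istanbul"}
print(generate_test_case(clean))
```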
“…Unfortunately, for Big Data this is not a feasible solution, since the level at which the likelihood is considered high is not clear; moreover, different situations may require different likelihood thresholds. The Big Data group platform is able to provide flexible on-the-fly integration [5] that, depending on the items of interest, decides what needs to be integrated and what not. This mode is more suitable for Big Data since no a-priori decisions need to be made, yet the level of complexity increases considerably, which makes the task particularly challenging.…”
Section: The Platform Features (mentioning)
Confidence: 99%