An Overview of End-to-End Entity Resolution for Big Data

Christophides, Vassilis; Efthymiou, Vasilis; Palpanas, Themis; Papadakis, George; Stefanidis, Kostas

doi:10.1145/3418896

Cited by 164 publications

(87 citation statements)

References 170 publications

“…Blocking, which is surveyed by Christen [16], Papadakis et al [72,73], is considered an important subtask of entity matching, meant to tackle the quadratic complexity of potential matches. Christophides et al [17] specifically review entity matching techniques in the context of big data. There has been an uptick in interest in both machine learning and crowdsourcing as a solution to entity matching in recent years.…”

Section: Other Surveys and Extensive Overviews (mentioning)

confidence: 99%

“…These steps can also be viewed as a chain of the subtasks or subproblems that make up entity matching. Inspired by processes and figures such as those in [15,17,24,36,66], Figure 2 depicts this reference model of the traditional entity matching process. We will use the model to frame the discussion of different methods using neural networks.…”

Section: The Entity Matching Process (mentioning)

confidence: 99%

“…Most techniques rely heavily on syntactic similarity, including those based on supervised machine learning. See Christen [16], Christophides et al [17] for extensive reviews on blocking techniques. In practice, it is not uncommon that blocking involves quite a bit of manual feature selection, picking out which attributes should be used and which technique to apply.…”

Section: The Entity Matching Process (mentioning)

confidence: 99%
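The citation statements above describe blocking as a way to avoid comparing every record with every other record. A minimal sketch of that idea follows, assuming simple token blocking (records that share at least one token become candidate pairs). The function names and toy records are hypothetical illustrations and are not taken from the cited survey or from any specific library.

```python
from collections import defaultdict
from itertools import combinations


def token_blocking(records):
    """Map each lowercase token to the set of record ids that contain it."""
    blocks = defaultdict(set)
    for rid, text in records.items():
        for token in text.lower().split():
            blocks[token].add(rid)
    return blocks


def candidate_pairs(blocks):
    """Collect distinct id pairs that co-occur in at least one block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical toy records (id -> free-text description).
records = {
    1: "John Smith",
    2: "J. Smith",
    3: "Mary Jones",
    4: "M. Jones",
}

pairs = candidate_pairs(token_blocking(records))
all_pairs = len(records) * (len(records) - 1) // 2  # exhaustive comparison count

print(f"{len(pairs)} candidate pairs instead of {all_pairs}")
```

On this toy input, only records sharing a surname token are paired, so the matcher would inspect far fewer pairs than the exhaustive n(n-1)/2. Real blocking methods typically add block cleaning and comparison pruning steps, which this sketch omits.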

Neural Networks for Entity Matching: A Survey

Barlaug; Gulla

2021

ACM Trans. Knowl. Discov. Data

Entity matching is the problem of identifying which records refer to the same real-world entity. It has been actively researched for decades, and a variety of different approaches have been developed. Even today, it remains a challenging problem, and there is still generous room for improvement. In recent years, we have seen new methods based upon deep learning techniques for natural language processing emerge. In this survey, we present how neural networks have been used for entity matching. Specifically, we identify which steps of the entity matching process existing work has targeted using neural networks, and provide an overview of the different techniques used at each step. We also discuss contributions from deep learning in entity matching compared to traditional methods, and propose a taxonomy of deep neural networks for entity matching.

“…Overviews of the main methods can be found in recent books [2,3,4,5], surveys [6,7,8] and tutorials [9,10,11,12].…”

Section: Introduction (mentioning)

confidence: 99%

“…See https://docs.docker.com/engine/install/debian for detailed instructions 8. See https://docs.docker.com/engine/install/fedora for detailed instructions 9.…”

mentioning

confidence: 99%

Reproducible experiments on Three-Dimensional Entity Resolution with JedAI

Mandilaras; Papadakis; Gagliardelli; et al.

2021

Information Systems

Self Cite

In Papadakis et al. [1], we presented the latest release of JedAI, an open-source Entity Resolution (ER) system that allows for building a large variety of end-to-end ER pipelines. Through a thorough experimental evaluation, we compared a schema-agnostic ER pipeline based on blocks with another schema-based ER pipeline based on similarity joins. We applied them to 10 established, real-world datasets and assessed them with respect to effectiveness and time efficiency. Special care was taken to juxtapose their scalability, too, using seven established, synthetic datasets. Moreover, we experimentally compared the effectiveness of the batch schema-agnostic ER pipeline with its progressive counterpart. In this companion paper, we describe how to reproduce the entire experimental study that pertains to JedAI's serial execution through its intuitive user interface. We also explain how to examine the robustness of the parameter configurations we have selected.

Social Responsibility of Algorithms: An Overview

Tsoukiàs

2021

Integrated Series in Information Systems

Should we be concerned by the massive use of devices and algorithms which automatically handle an increasing number of everyday activities within our societies? The paper gives a short overview of the scientific investigation around this topic, showing that the development, existence and use of such autonomous artifacts are much older than the recent interest in machine-learning-monopolised artificial intelligence. We then categorise the impact of using such artifacts on the whole process of data collection, structuring and manipulation, as well as on recommendation and decision making. The suggested framework allows us to identify a number of challenges for the whole community of decision analysts, both researchers and practitioners.
