2020
DOI: 10.1145/3385658.3385664

Domain- and Structure-Agnostic End-to-End Entity Resolution with JedAI

Abstract: We present JedAI, a new open-source toolkit for end-to-end Entity Resolution. JedAI is domain-agnostic in the sense that it does not depend on background expert knowledge, applying seamlessly to data of any domain with minimal human intervention. JedAI is also structure-agnostic, as it can process any type of data, ranging from structured (relational) to semi-structured (RDF) and unstructured (free-text) entity descriptions. JedAI consists of two parts: (i) JedAI-core is a library of numerous state-of-the-art…

citations
Cited by 37 publications
(7 citation statements)
references
References 27 publications
“…In addition we show the best reported result found in related work from matching systems using supervised learning while matching systems that use other types of learning (e.g. active learning or semi-supervised learning) are excluded from this comparison [13,18,22]. The interpretation of the comparison to the results reported in related work should be made with attention to the differing, unfixed train, optimization, and test sets.…”
Section: Baseline Results (mentioning)
confidence: 94%
“…For the cases where the complete mapping is provided, non-matching pairs can be generated by calculating the Cartesian product of all records and excluding the matching pairs. Given the size of the data sets, this often results in large numbers of non-matching pairs and thus motivates the usage of blocking techniques [8,18] to remove obvious non-matches which are not helpful for training and uninteresting for testing. As the benchmark tasks only define matches, different researchers who use these tasks generate different sets of non-matches, which influence the model training [2,16,17].…”
Section: Benchmark Tasks (mentioning)
confidence: 99%
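
The procedure described in the statement above can be sketched roughly as follows. This is a minimal illustration, not code from JedAI or from the citing paper: the function names `non_matching_pairs` and `token_blocking`, the toy records, and the choice of shared-token blocking are all assumptions made for the example.

```python
from itertools import product


def non_matching_pairs(left_ids, right_ids, matches):
    """Cartesian product of both record sets minus the known matching pairs."""
    match_set = set(matches)
    return [(l, r) for l, r in product(left_ids, right_ids)
            if (l, r) not in match_set]


def token_blocking(left, right):
    """Hypothetical blocking: keep pairs sharing at least one lowercase token."""
    index = {}
    for rid, text in right.items():
        for tok in set(text.lower().split()):
            index.setdefault(tok, set()).add(rid)
    candidates = set()
    for lid, text in left.items():
        for tok in set(text.lower().split()):
            for rid in index.get(tok, ()):
                candidates.add((lid, rid))
    return candidates


# Toy records (invented for illustration only).
left = {"a1": "Apple iPhone 11", "a2": "Samsung Galaxy S10"}
right = {"b1": "iPhone 11 by Apple",
         "b2": "Galaxy S10 Samsung",
         "b3": "Apple iPhone 8"}
matches = [("a1", "b1"), ("a2", "b2")]

# Every cross-source pair that is not a known match.
all_non_matches = non_matching_pairs(left, right, matches)

# Blocking discards the obvious non-matches that share no token.
blocked = token_blocking(left, right)
blocked_non_matches = [p for p in all_non_matches if p in blocked]
```

On this toy input, the full Cartesian product yields four non-matching pairs, while blocking retains only the one pair that shares tokens. Because the retained set depends on the blocking key chosen, two researchers starting from the same matches can end up with different non-matches, which is the comparability concern the statement raises.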
“…Note that there is a trade-off between the efficiency and the effectiveness of these two approaches [48]: step-by-step configuration is typically much faster, as it gradually minimizes the computational cost of every workflow step. In contrast, holistic configuration might involve a workflow step with high computational cost, as long as the overall F-Measure is high.…”
Section: Auxiliary Components (mentioning)
confidence: 99%
“…Note that JedAI has already been presented in a short journal paper [48] and as a demo in past conferences [12,84,85]. The first releases, i.e., version 1 [84], version 2 [12] and version 2.1 [48], cover exclusively the serialized execution of the budget- and schema-agnostic workflow that is presented in Section 4.1, while providing a rather limited experimental analysis of its performance [48]. The serialized implementation of the batch schema-based workflow and of the budget- and schema-agnostic workflow are briefly presented in [85], without evaluating their relative performance.…”
Section: Blocking (mentioning)
confidence: 99%
“…There are cases, however, where many of the entities in these general-interest KBs are irrelevant for certain applications, therefore domain-specific ontologies for semantic information brokering, based on inter-ontology relationships such as synonyms, hyponyms, and hypernyms of the extracted entities are used [56]. In order to further increase the links between morphologically dissimilar extracted entities and KB-related objects, neural-based methods are also implemented, exploiting word embeddings to represent semantic spaces [57,58], also allowing for domain-agnostic entity resolution [59]. With regard to sentiment analysis (neutral vs. emotionally loaded) and polarity (positive vs. negative) detection of a text [60], lexicon-based [61], ML-based [62], and neural-based [63] classifiers are commonly used to identify the polarity of a relation within a sentence.…”
Section: Advances In Entity Linking Enrichment and Representation (mentioning)
confidence: 99%