2022
DOI: 10.14778/3523210.3523226

Entity resolution on-demand

Abstract: Entity Resolution (ER) aims to identify and merge records that refer to the same real-world entity. ER is typically employed as an expensive cleaning step on the entire data before consuming it. Yet, determining which entities are useful once cleaned depends solely on the user's application, which may need only a fraction of them. For instance, when dealing with Web data, we would like to be able to filter the entities of interest gathered from multiple sources without cleaning the entire, continuously growing…
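The abstract's core idea is to clean only what a query needs rather than the whole dataset. The sketch below illustrates that idea in a few lines and is not the paper's algorithm. It assumes hypothetical, user-supplied functions `block_key`, `matches`, `may_satisfy`, and `fuse`. In place of a real matcher, it uses a simple greedy clustering within each block.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


def resolve_on_demand(
    records: Iterable[Record],
    block_key: Callable[[Record], str],
    matches: Callable[[Record, Record], bool],
    may_satisfy: Callable[[List[Record]], bool],
    fuse: Callable[[List[Record]], Record],
) -> List[Record]:
    """Hypothetical sketch: resolve only blocks that could yield a wanted entity."""
    # 1. Blocking: group records sharing a (hypothetical) blocking key.
    blocks: Dict[str, List[Record]] = defaultdict(list)
    for r in records:
        blocks[block_key(r)].append(r)

    entities: List[Record] = []
    for block in blocks.values():
        # 2. Skip blocks the query cannot use, so they are never cleaned.
        if not may_satisfy(block):
            continue
        # 3. Matching: greedy clustering; a record joins the first cluster
        #    containing a match (a simplification of transitive closure).
        clusters: List[List[Record]] = []
        for r in block:
            for c in clusters:
                if any(matches(r, m) for m in c):
                    c.append(r)
                    break
            else:
                clusters.append([r])
        # 4. Fusion: merge each cluster into a single entity.
        entities.extend(fuse(c) for c in clusters)
    return entities


# Usage on toy camera records (hypothetical data).
records = [
    {"brand": "canon", "model": "eos 5d", "price": 1200},
    {"brand": "canon", "model": "eos 5d", "price": 1100},
    {"brand": "nikon", "model": "d750", "price": 900},
]
result = resolve_on_demand(
    records,
    block_key=lambda r: r["brand"],
    matches=lambda a, b: a["model"] == b["model"],
    may_satisfy=lambda block: any(r["brand"] == "canon" for r in block),
    fuse=lambda c: {**c[0], "price": max(r["price"] for r in c)},
)
print(result)  # the nikon block is never matched or fused
```

The work saved comes from step 2: blocks that cannot contribute to the query's answer skip matching and fusion entirely.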

Cited by 10 publications (5 citation statements); references 30 publications.

“…We showed that four new weighting schemes give rise to feature sets that outperform the existing ones [25], while a very small, balanced training set with just 50 labelled instances suffices for high effectiveness, high time efficiency and high scalability. In the future, we will apply our approaches to Progressive ER [29, 33–35].…”
Section: Discussion (mentioning)
confidence: 99%
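The citation statement above describes weighting schemes that produce feature sets for classifying candidate pairs, but it does not name the four new schemes. For orientation only, the sketch below computes three classic meta-blocking weights from the earlier literature. These are CBS (common blocks), JS (Jaccard of block sets), and ECBS (CBS scaled by how few blocks each record appears in). They are not the new schemes the quote refers to. The input layout, a mapping from block key to a list of record ids, is an assumption.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

Pair = Tuple[int, int]


def pair_features(blocks: Dict[str, List[int]]) -> Dict[Pair, Tuple[int, float, float]]:
    """Return (CBS, JS, ECBS) for every pair of records co-occurring in a block."""
    # B_i: the set of blocks containing record i.
    record_blocks: Dict[int, Set[str]] = defaultdict(set)
    for key, ids in blocks.items():
        for rid in ids:
            record_blocks[rid].add(key)

    n_blocks = len(blocks)  # |B|, the total number of blocks
    features: Dict[Pair, Tuple[int, float, float]] = {}
    for ids in blocks.values():
        for i, j in combinations(sorted(set(ids)), 2):
            if (i, j) in features:  # already weighted via another shared block
                continue
            bi, bj = record_blocks[i], record_blocks[j]
            cbs = len(bi & bj)  # Common Blocks Scheme
            js = cbs / len(bi | bj)  # Jaccard Scheme
            # Enhanced CBS: rewards pairs whose records appear in few blocks.
            ecbs = cbs * math.log(n_blocks / len(bi)) * math.log(n_blocks / len(bj))
            features[(i, j)] = (cbs, js, ecbs)
    return features


blocks = {
    "canon": [1, 2],
    "eos": [1, 2, 3],
    "5d": [1, 2],
    "nikon": [3, 4],
    "d750": [4],
}
feats = pair_features(blocks)
for pair, vec in sorted(feats.items()):
    print(pair, vec)
```

Each vector could serve as one row of the feature matrix for a pair classifier trained on a small labelled sample, which is the setting the quote describes.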
“…The latter requirement is significantly challenging, since it demands to correctly sort the entities even before they are generated using data fusion, only relying on the original records that can produce them. In this section, we give an overview of how BrewER overcomes these challenges, while the detailed description of the algorithm is provided in the research paper [13].…”
Section: An Overview of BrewER (mentioning)
confidence: 99%
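The challenge the statement above describes is ordering entities before data fusion has produced them, using only the raw records. A bound-based priority queue is one way to picture this. The sketch below is a simplification and not BrewER's published algorithm. It assumes:

- an ORDER BY on one numeric attribute in descending order;
- candidate groups already produced by blocking;
- a hypothetical `resolve` function whose fused values never exceed the largest raw value in the group (true for MIN, MAX, and AVG fusion).

```python
import heapq
from typing import Callable, Dict, Iterator, List, Tuple, Union

Record = Dict[str, float]
Group = List[Record]


def progressive_order_by_desc(
    groups: List[Group],
    attr: str,
    resolve: Callable[[Group], List[Record]],
) -> Iterator[Record]:
    """Yield fused entities in descending order of `attr`, resolving lazily."""
    # Entries: (negated key, tie-breaker, resolved?, group or entity).
    heap: List[Tuple[float, int, bool, Union[Group, Record]]] = []
    counter = 0
    for g in groups:
        bound = max(r[attr] for r in g)  # upper bound from raw records only
        heapq.heappush(heap, (-bound, counter, False, g))
        counter += 1
    while heap:
        _, _, resolved, payload = heapq.heappop(heap)
        if resolved:
            # Exact key >= every remaining bound or key, so it is safe to emit.
            yield payload
            continue
        # Clean only this group, then re-queue its entities with exact keys.
        for entity in resolve(payload):
            heapq.heappush(heap, (-entity[attr], counter, True, entity))
            counter += 1


def avg_resolve(group: Group) -> List[Record]:
    # Hypothetical resolver: treat the whole group as one entity, AVG-fuse price.
    return [{"price": sum(r["price"] for r in group) / len(group)}]


groups = [
    [{"price": 100.0}, {"price": 300.0}],  # bound 300, fused value 200
    [{"price": 250.0}],                    # bound 250, fused value 250
]
ranked = [e["price"] for e in progressive_order_by_desc(groups, "price", avg_resolve)]
print(ranked)
```

Ordering by the raw bound alone would put the first group first, even though its fused price is lower. The queue corrects this by re-inserting each entity under its exact key. Because the function is a generator, a top-k query can stop early (for example with `itertools.islice`), leaving the remaining groups unresolved.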
“…We will provide users with a set of dirty datasets, composed of the reference datasets used in the research paper [13] (i.e., cameras, USB sticks, and organizations) plus several additional ones (e.g., an extended version of cameras and further datasets of commercial products from the Alaska benchmark [4] and multiple datasets from the Magellan Data Repository²). These datasets cover different domains and are highly heterogeneous in terms of cleanliness, number of attributes, and number of records, ranging from the 1K records of the smallest subset of USB sticks to the 29K records of the full camera dataset, on which the batch approach would take several hours to perform the entire cleaning process [13]. Each dataset comes with its ground truth, so the users will be able to assess the efficacy (precision/recall) of each step in the ER pipeline and the correctness of the results of the given queries.…”
Section: Demonstration Scenarios (mentioning)
confidence: 99%
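The statement above mentions assessing precision and recall against each dataset's ground truth but does not say how they are computed. A common choice, assumed here, is pairwise evaluation: two records count as a match when both are assigned to the same cluster. In this sketch, cluster assignments are given as a mapping from record id to cluster id; that format is an assumption.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Hashable, List, Set, Tuple

Pair = Tuple[int, int]


def matching_pairs(assignment: Dict[int, Hashable]) -> Set[Pair]:
    """All record-id pairs placed in the same cluster (record id -> cluster id)."""
    members: Dict[Hashable, List[int]] = defaultdict(list)
    for rid, cid in assignment.items():
        members[cid].append(rid)
    pairs: Set[Pair] = set()
    for rids in members.values():
        pairs.update(combinations(sorted(rids), 2))
    return pairs


def pairwise_precision_recall(
    predicted: Dict[int, Hashable], truth: Dict[int, Hashable]
) -> Tuple[float, float]:
    pred, gold = matching_pairs(predicted), matching_pairs(truth)
    tp = len(pred & gold)  # correctly predicted matching pairs
    # Convention (an assumption): an empty set scores 1.0 rather than dividing by 0.
    precision = tp / len(pred) if pred else 1.0
    recall = tp / len(gold) if gold else 1.0
    return precision, recall


truth = {1: "a", 2: "a", 3: "a", 4: "b"}
predicted = {1: "x", 2: "x", 3: "y", 4: "y"}
precision, recall = pairwise_precision_recall(predicted, truth)
print(precision, recall)
```

Only the pair (1, 2) is both predicted and true. Precision is therefore 1 of 2 predicted pairs, and recall is 1 of 3 true pairs.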