2012
DOI: 10.1504/ijmso.2012.050014

De-duplication of aggregation authority files

Cited by 5 publications (6 citation statements)
References 32 publications

“…Among these we can mention: record linkage, entity resolution, duplicate detection, co-reference resolution, object consolidation, reference reconciliation, fuzzy match, object identification, object consolidation, entity clustering, merge/purge, identity uncertainty, etc. Many techniques developed in such research field resulted into respective deduplication tools over the years [5], [6], [7], [8]. Among existing approaches some address the problem of record linkage or entity resolution for "big" flat collections, some consider specific problems in the disambiguation of "graphs", but to our knowledge none has proposed systems for the deduplication of big graphs.…”
Section: State Of the Art And Motivations (mentioning)
confidence: 99%
“…Among existing approaches some address the problem of record linkage or entity resolution for "big" flat collections, some consider specific problems in the disambiguation of "graphs", but to our knowledge none has proposed systems for the deduplication of big graphs. Among the first category we can mention Dedoop [10], PACE [6], and Dedupalog [11]. The first two tools are built on distributed column stores, respectively Hadoop MapReduce and Cassandra, and allow to efficiently process large collections to identify duplicates.…”
Section: State Of the Art And Motivations (mentioning)
confidence: 99%
“…An analysis of the requirements of the functionalities for deduplication and record linkage systems was proposed by Köpcke, Thor and Rahm in (Köpcke et al., 2010). In fact, as a result of such studies several deduplication tools have been developed (Jurczyk et al., 2008; Manghi et al., 2012b; Christen, 2008; Kang et al., 2008). Our work is orthogonal to most of the large body of work in deduplication since, to our knowledge, no researcher has worked on systems for the deduplication of big data graphs: some approaches address the problem of record linkage or entity resolution for “big data” flat collections, while others tackle disambiguation of “graphs”, but none delivers the full workflow of big data graph deduplication as described above.…”
Section: Deduplication Of Big Data Graphs (mentioning)
confidence: 99%
“…Population of the graph requires more sophisticated deduplication algorithms, exploring object similarity beyond the equivalence of PIDs. D-NET offers de-duplication Services (Manghi, Mikulicic and Atzori, 2012) already in use by the OpenAIRE infrastructure production system, which will be deployed and configured to adapt to the DLI data model and identifying equivalent objects based on properties such as titles, author names, and publication year. Access to the information space graph.…”
Section: Forthcoming Actions (mentioning)
confidence: 99%