2015
DOI: 10.1007/978-3-319-11056-1_5

Cross Language Duplicate Record Detection in Big Data

Cited by 6 publications (5 citation statements)
References 27 publications
“…In this step, datasets are converted to Unicode system to support Arabic language. Several works (Yousef, 2015;Higazy et al, 2013;El-Shishtawy, 2013;Yousef, 2013) used a set of standardization rules for Arabic datasets. These rules consist in replacing a set of characters with their equivalent character.…”
Section: Preprocessing
mentioning
confidence: 99%
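The character-replacement idea in the statement above can be illustrated with a minimal sketch. The rule table below is an assumption: it lists normalizations commonly applied to Arabic text (unifying alef variants, teh marbuta, alef maksura, and dropping tatweel and short-vowel diacritics), not the specific rule set used in the cited works. The names `ARABIC_RULES` and `standardize` are hypothetical.

```python
import unicodedata

# Hypothetical rule table (an assumption, not the cited papers' rules):
# each variant character is mapped to one canonical equivalent.
ARABIC_RULES = {
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda above -> bare alef
    "\u0629": "\u0647",  # teh marbuta -> heh
    "\u0649": "\u064A",  # alef maksura -> yeh
    "\u0640": None,      # tatweel (elongation mark) -> removed
}
_TABLE = str.maketrans(ARABIC_RULES)


def standardize(text):
    """Return text in a canonical form suitable for record comparison."""
    # Compose characters first so that decomposed sequences match the table.
    text = unicodedata.normalize("NFC", text)
    # Drop short-vowel diacritics (fathatan through sukun).
    text = "".join(ch for ch in text if not "\u064B" <= ch <= "\u0652")
    # Apply the one-to-one replacement rules.
    return text.translate(_TABLE)


if __name__ == "__main__":
    # Two spellings of the same name collapse to the same string.
    print(standardize("\u0623\u062D\u0645\u062F") == standardize("\u0627\u062D\u0645\u062F"))
```

Applying such a function to both records before comparison means that spelling variants of the same name are not mistaken for different values.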
“…He served as full professor and the head of Evolutionary Engineering and Distributed Information Systems Laboratory (EEDIS Lab.) at Djillali Liabes University of Sidi Bel-Abbes, Algeria (2002-2015…”
mentioning
confidence: 99%
“…In this experiment, a comparison between the features allowed in the proposed web-based DRD framework and FEBRL is illustrated. This comparison appears in [60] as a part from their study for the available frameworks that perform DRD. The major advantages of the proposed DRD framework over FEBRL are: the availability of enhancing system behavior through real experiments by the users, allowing bi-lingual processing and introducing the sequential blocking technique instead of the current available indexing techniques.…”
Section: Experiments 3: Proposed Framework Features Compared To Febrl
mentioning
confidence: 99%
“…Many researches in Record Linkage/Duplicate Detection have been developed and introduced. Some of them were about providing a complete framework or implementing techniques/algorithms that handle a specific stage in DRD [60]. The general steps for record linkage/duplicate detection [17] are; first is data cleaning and standardization where input data is converted into a well-defined form.…”
Section: Introduction
mentioning
confidence: 99%
“…There are four dimensions of data quality standards: availability, usability, reliability, and relevance [3][4][5]. Reliability is characterized as the trustworthiness of data, including accuracy, consistency, completeness, and integrity [11]. The reliability of BD is difficult to achieve due to its characteristics and complex technology and architecture for processing data [3][4][5].…”
Section: Introduction
mentioning
confidence: 99%