A Hybrid Technique for Cleaning Missing and Misspelling Arabic Data in Data Warehouse

Al-Hagery, Mohammed Abdullah; Alreshoodi, Latifah Abdullah; Almutairi, Maram Abdullah; Alsharekh, Suha Ibrahim; Alkhowaiter, Emtenan Saad

doi:10.5815/ijitcs.2019.07.03

Cited by 1 publication

(1 citation statement)

References 29 publications

“…In this step, misspelled words are detected. Many techniques, such as dictionary search and morphology analysis, are used to detect errors in Arabic languages [25], [26]. Dictionary lookup is the most common and the fastest method due to the size of the dictionary (corpus).…”

Section: Error Detection; citation type: mentioning

confidence: 99%

Missing values imputation in Arabic datasets using enhanced robust association rules

Salem, Emran, Muda, et al. 2022, IJEECS

Missing value (MV) is one form of data completeness problem in massive datasets. To deal with missing values, data imputation methods were proposed with the aim to improve the completeness of the datasets concerned. Data imputation's accuracy is a common indicator of a data imputation technique's efficiency. However, the efficiency of data imputation can be affected by the nature of the language in which the dataset is written. To overcome this problem, it is necessary to normalize the data, especially in non-Latin languages such as the Arabic language. This paper proposes a method that will address the challenge inherent in Arabic datasets by extending the enhanced robust association rules (ERAR) method with Arabic detection and correction functions. Iterative and Decision Tree methods were used to evaluate the proposed method in an experiment. Experiment results show that the proposed method offers a higher data imputation accuracy than the Iterative and Decision Tree methods.
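
The dictionary lookup named in the citation statement can be illustrated with a minimal sketch. This is not the method of either paper: the lexicon, the `normalize` helper, and the `detect_misspellings` function are hypothetical names introduced here, and the four-word lexicon stands in for a real Arabic word list. The sketch strips diacritics and tatweel before the lookup, on the assumption that the dictionary stores undiacritized forms.

```python
import re

# Hypothetical toy lexicon; a real system would load a large Arabic word list.
LEXICON = {"محمد", "الرياض", "مدينة", "في"}

# Arabic diacritics (tashkeel, U+064B..U+0652) and tatweel (U+0640).
_DIACRITICS = re.compile("[\u064B-\u0652\u0640]")


def normalize(token: str) -> str:
    """Remove diacritics and tatweel so lookups match undiacritized entries."""
    return _DIACRITICS.sub("", token)


def detect_misspellings(text: str, lexicon=LEXICON) -> list:
    """Return the whitespace-separated tokens whose normalized form is not in the lexicon."""
    flagged = []
    for token in text.split():
        if normalize(token) not in lexicon:
            flagged.append(token)
    return flagged
```

Because the lexicon is a Python `set`, each membership test takes constant time on average, so the cost of detection grows with the number of tokens rather than with the size of the dictionary. A flagged token is only a candidate error: a correctly spelled word that is missing from the lexicon would be flagged as well.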
