2009
DOI: 10.1002/asi.21063

Evaluation of n‐gram conflation approaches for Arabic text retrieval

Abstract: In this paper we present a language-independent approach for conflation that does not depend on predefined rules or prior knowledge of the target language. The proposed unsupervised method is based on an enhancement of the pure n-gram model that can group related words based on various string-similarity measures, while restricting the search to specific locations of the target word by taking into account the order of n-grams. We show that the method is effective to achieve high score similarities for all word-…

Cited by 20 publications (13 citation statements)
References 28 publications
“…Here, u and v are the words to be compared, the nested sum counts the number of n-grams in v that are similar to n-grams in a window the size of m around the same position in word v. For more details about the n-gram approach, we refer the reader to our previous work [10].…”
Section: N-gram Approach To Correct Mistakenly OCRed Words (mentioning)
confidence: 99%
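
The position-restricted matching described in the statement above can be sketched as follows. This is a minimal illustration, not the authors' formula: the statement only says that the nested sum counts n-grams matched within a window of size m around the same position, so the Dice-style normalization, the default values of n and m, and the names `ngrams` and `windowed_ngram_similarity` are all assumptions made for this example.

```python
def ngrams(word, n=2):
    """Return the list of character n-grams of `word`, in order."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def windowed_ngram_similarity(u, v, n=2, m=1):
    """Position-restricted n-gram similarity between words u and v.

    For each n-gram at position i in u, look for the same n-gram in v,
    but only at positions i-m .. i+m. The count of such matches is
    normalized Dice-style (an assumption; the source does not state
    the normalization).
    """
    grams_u, grams_v = ngrams(u, n), ngrams(v, n)
    if not grams_u or not grams_v:
        return 0.0

    matches = 0
    for i, gram in enumerate(grams_u):
        # Restrict the search to a window of size m around position i.
        lo = max(0, i - m)
        hi = min(len(grams_v), i + m + 1)
        if gram in grams_v[lo:hi]:
            matches += 1

    return 2 * matches / (len(grams_u) + len(grams_v))


if __name__ == "__main__":
    # A narrow window penalizes shared n-grams that occur in different places.
    print(windowed_ngram_similarity("abcd", "cdab", m=0))
    # A very wide window behaves like the pure, order-insensitive n-gram model.
    print(windowed_ngram_similarity("abcd", "cdab", m=5))
```

With a small m, two words that share n-grams only at distant positions score low, which is the effect the abstract describes as restricting the search to specific locations of the target word.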
“…This increases the probability that the matching score between two strings can be higher even though they do not share the same concept. Therefore, we revised the computation of the similarity between words to take this aspect into account [10] and hence improve the n-gram approach precision. First of all, we check if the word is misspelled based on the dictionary entries.…”
Section: N-gram Approach To Correct Mistakenly OCRed Words (mentioning)
confidence: 99%
“…The definite article ( ال ) is always attached to nouns, and many conjunctions and prepositions are also attached as prefixes to nouns and verbs. This hinders the retrieval of morphological variants of words [23, 24]. The next example illustrates one of the challenges.…”
Section: Our Proposed System (mentioning)
confidence: 99%
“…System categorized the user interest in following category and sub categories shown in table 1. Interest of a user is saved for their subsequent search [19]. Users can modify their interest for every new search.…”
Section: Context Identification (mentioning)
confidence: 99%