2007 9th International Symposium on Signal Processing and Its Applications 2007
DOI: 10.1109/isspa.2007.4555345

N-gram and Local Context Analysis for Persian text retrieval

Abstract: The Persian language is one of the languages in the Middle East, so…

Citations: cited by 22 publications (12 citation statements)
References: 9 publications
“…TNT POS tagger was trained on Bijankhan POS collection with 40 tags. Subsequently the Hamshahri corpus [19] and its CLEF queries were tagged using this tagger (see ). After experimenting with different tagging schemas, the corpus and the queries were stemmed in order to evaluate the effect of stemming and its interaction with POS tagging in retrieval context.…”
Section: Methodology and Implementation
Citation type: mentioning
confidence: 99%
“…The N-gram is a language-independent approach in which each word is broken down into substrings of length N. This approach has been applied to information retrieval in many languages such as English [9], Turkish [8], Malay [16] and Farsi [1] with varying degrees of success. In [13], Larkey et al have used the bigram and trigram string similarity approach for Arabic text retrieval.…”
Section: Related Research
Citation type: mentioning
confidence: 99%
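The statement above describes breaking each word into substrings of length N and comparing words by their bigrams or trigrams. A minimal sketch of that idea follows. The function names `char_ngrams` and `dice_similarity` are hypothetical, and the Dice coefficient is an assumed similarity measure chosen for illustration; the quoted text does not say which measure the cited works used.

```python
def char_ngrams(word: str, n: int) -> list[str]:
    """Return the overlapping character substrings of length n in word.

    Words shorter than n are returned whole so they still yield one unit.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if len(word) < n:
        return [word] if word else []
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def dice_similarity(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient over the sets of character n-grams of a and b.

    Assumption: Dice is used here only as a common example of an
    n-gram string similarity; it is not taken from the cited papers.
    """
    grams_a = set(char_ngrams(a, n))
    grams_b = set(char_ngrams(b, n))
    if not grams_a and not grams_b:
        return 0.0
    shared = len(grams_a & grams_b)
    return 2 * shared / (len(grams_a) + len(grams_b))


if __name__ == "__main__":
    # Trigrams of an English word.
    print(char_ngrams("retrieval", 3))
    # Bigram similarity between two related word forms.
    print(dice_similarity("retrieval", "retrieve", n=2))
    # Python strings are sequences of Unicode code points, so the same
    # functions apply to Persian words without modification.
    print(dice_similarity("کتاب", "کتابها", n=2))
```

Because the functions operate on characters rather than on language-specific rules, related word forms can be matched without a stemmer, which is the language-independence the statement refers to.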
“…The N-gram approach has mixed performances in information retrieval. In some languages like English, it results in a poor performance however in languages like Farsi, it has an acceptable performance [1]. As mentioned earlier, most Arabic words are made up of roots with three letters which led us to use trigrams for word segmentation.…”
Section: N-gram Conflation and Co-occurrence Analysis For…
Citation type: mentioning
confidence: 99%
“…For example if we consider an English query that has three terms then the most probable Persian translation of the query terms would be E[1,TDimes [1,1]], E[2,TDimes [1,2]] and E[3,TDimes [1,3]] respectively and the translated query's weight would be CTP[TopColumns [1],TopRows [1]]. …”
Section: That Correspond To Top N Translations Of the Query Q = {Q I…
Citation type: mentioning
confidence: 99%