2014
DOI: 10.1177/0165551514530655

Automatic identification of light stop words for Persian information retrieval systems

Abstract: Stop word identification is one of the most important tasks in many text processing applications, such as information retrieval. Stop words occur very frequently across the documents in a collection and contribute little to determining the context of, or information about, those documents. They are therefore worthless as index terms and should be removed both during indexing and before querying in an information retrieval system. In this paper, we propose an automatic aggregated methodology based on term frequenc…
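The abstract is truncated before it names the exact statistics the authors aggregate, so the following is only a minimal sketch of the general idea under stated assumptions, not the paper's method. It ranks every term by collection term frequency and by document frequency, averages the two ranks (a hypothetical aggregation rule), keeps the top `k` terms as candidate stop words, and then filters them out of token lists, as an IR system would at indexing and query time. The function names and the toy corpus of transliterated Persian tokens ("va" = "and", "dar" = "in") are illustrative assumptions.

```python
from collections import Counter


def rank(scores: dict[str, int]) -> dict[str, int]:
    """Map each term to its rank (0 = highest score); ties are broken alphabetically."""
    ordered = sorted(scores, key=lambda term: (-scores[term], term))
    return {term: position for position, term in enumerate(ordered)}


def candidate_stop_words(docs: list[list[str]], k: int) -> list[str]:
    """Return the k terms with the best (lowest) averaged rank over TF and DF."""
    # Collection term frequency: total occurrences of each term in all documents.
    tf = Counter(token for doc in docs for token in doc)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(token for doc in docs for token in set(doc))
    tf_rank, df_rank = rank(tf), rank(df)
    # Hypothetical aggregation rule: average the two ranks for every term.
    aggregated = {term: (tf_rank[term] + df_rank[term]) / 2 for term in tf}
    return sorted(aggregated, key=lambda term: (aggregated[term], term))[:k]


def remove_stop_words(tokens: list[str], stop_words: set[str]) -> list[str]:
    """Drop stop words from a token list (applied to documents and queries alike)."""
    return [token for token in tokens if token not in stop_words]


# Toy corpus of transliterated Persian tokens (illustrative data only).
docs = [
    ["va", "ketab", "dar", "khane", "va"],
    ["va", "dar", "madrese"],
    ["ketab", "dar", "va", "daneshgah"],
]
stop = set(candidate_stop_words(docs, 2))
print(sorted(stop))                                     # → ['dar', 'va']
print(remove_stop_words(["va", "ketab", "dar"], stop))  # → ['ketab']
```

Averaging ranks rather than raw counts keeps the two statistics on a comparable scale; how many candidates to keep (`k`, or a frequency threshold instead) is left open in this sketch.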

Cited by 16 publications (12 citation statements). References: 10 publications.
“…By assessing the results for EElectronics and PElectronics, it can be found that the results for EElectronics are slightly better. Using a complex script is the main challenge in Persian writing [21,51,52]. For example, one of the issues in Persian text mining is the wide variety of declensional suffixes.…”
Section: Quantitative Analysis (mentioning)
confidence: 99%
“…Stop lists have been generated for other languages, such as Chinese (Zou et al, 2006), Thai (Daowadung and Chen, 2012) and Farsi (Sadeghi and Vegas, 2014), using similar frequency threshold approaches, and are susceptible to the same issues discussed here.…”
Section: Introduction (mentioning)
confidence: 99%
“…In order to better improve and use the system, we add some auxiliary equipment and materials to the online management and training environment, where efficient intelligent support is one of the main characteristics. There are already many systems supporting various types of intelligence, including attendance management [16], automatic identification of ID card information [17], and automatic marking of examination papers [18]. However, separating the various functions reduces management efficiency and increases the cost of learning.…”
Section: Introduction (mentioning)
confidence: 99%