2017
DOI: 10.18038/aubtda.322136

Stop Word Detection as a Binary Classification Problem

Abstract: In a wide group of languages, stop words, which have only grammatical roles and do not contribute to information content, may be identified simply by their relatively high occurrence frequencies. In agglutinative or inflectional languages, however, a stop word may be observed in several different surface forms, because inflection produces noise. In this study, some well-known binary classification methods are employed to overcome the inflectional noise problem in stop word detection. The experiments are…
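The problem the abstract describes can be illustrated with a minimal sketch. This is not the paper's classification method; it only shows why a plain frequency ranking works on a toy English text and why it can miss a stop word whose occurrences are split across inflected surface forms. The function names, the toy token lists, and the hand-written `lemma_of` map are all assumptions made for illustration.

```python
from collections import Counter

def frequency_candidates(tokens, top_k):
    """Return the top_k most frequent tokens as stop word candidates."""
    return [word for word, _ in Counter(tokens).most_common(top_k)]

def grouped_counts(tokens, lemma_of):
    """Count tokens after mapping each surface form to a base form."""
    return Counter(lemma_of.get(tok, tok) for tok in tokens)

# Toy English text (invented): the stop word has a single surface form.
english = "the cat sat on the mat and the dog sat on the rug".split()
english_top = frequency_candidates(english, 1)

# Toy Turkish tokens (invented): the demonstrative "bu" appears in four
# surface forms, so each form is rarer than the content word "kitap".
turkish = ["bu", "bunu", "buna", "bunun", "kitap", "kitap", "kitap"]
surface_top = frequency_candidates(turkish, 1)

# Hypothetical hand-written map from surface form to base form; a real
# system would need a morphological analyzer or a learned model instead.
lemma_of = {"bunu": "bu", "buna": "bu", "bunun": "bu"}
grouped_top = grouped_counts(turkish, lemma_of).most_common(1)

print(english_top, surface_top, grouped_top)
```

On the surface forms, the content word outranks every form of the stop word; once the forms are grouped, the stop word comes first. The abstract proposes binary classifiers as a way to deal with this inflectional noise.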

Citations: cited by 10 publications (5 citation statements)
References: 19 publications
“…Both rule-based and statistical approaches to the field made it possible to automatically detect and remove stop words from given texts [5, 9, 12, 15, 16]. These methods were not limited to only English but were applied to many other languages, we concentrate on agglutinative languages in this list, such as Indonesian [16], Turkish [4] and Bengali [17].…”
Section: Experimental Design, Materials and Methods (mentioning)
confidence: 99%
“…There has been a lot of research already done in the task of stop word detection [2], [3], [4]. Most of the recent literature presents methods that are based on statistical approaches.…”
Section: Objective (mentioning)
confidence: 99%
“…Integrating term importance to this formulation lead to use of document-based metrics such as inverse document frequency (IDF) [7]. There are also several studies that utilised a combination of multiple attributes, integrating both TF and IDF into a single metric [8,9]. Other techniques include basic voting systems that aggregate the results of a multiple sorted lists [10].…”
Section: Related Work (mentioning)
confidence: 99%
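The two quantities named in the statement above, term frequency (TF) and inverse document frequency (IDF), can be computed side by side in a short sketch. This is not the metric of any cited study: the toy documents are invented, the IDF form log(N / df) is one common variant, and the final ordering (lowest IDF first, then highest TF) is an assumption chosen only to show how the two signals can be combined.

```python
import math
from collections import Counter

def tf_and_idf(docs):
    """Return collection-wide term counts and per-term IDF = log(N / df)."""
    n_docs = len(docs)
    tf = Counter(tok for doc in docs for tok in doc)
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log(n_docs / df[tok]) for tok in df}
    return tf, idf

# Toy documents (invented for illustration).
docs = [
    "the cat chased the mouse".split(),
    "the dog chased the ball".split(),
    "the bird ate the seed".split(),
]
tf, idf = tf_and_idf(docs)

# Assumed ordering: terms spread over the most documents (low IDF) come
# first, and ties are broken by higher collection frequency.
ranked = sorted(tf, key=lambda tok: (idf[tok], -tf[tok]))
print(ranked[:3])
```

A term that occurs in every document gets an IDF of zero, so a frequent function word rises to the top of this ordering, while a word that is frequent in only one document does not.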
“…But, in agglutinative or inflectional languages, a stopword may be observed in several different surface forms due to the inflection producing a higher number of possible candidates, including false positives. There has been some research involving the search for stopwords for agglutinative languages, such as [5, 8], and [7].…”
Section: Data Description (mentioning)
confidence: 99%