2016
DOI: 10.1016/j.procs.2016.06.076

On Continent and Script-Wise Divisions-Based Statistical Measures for Stop-words Lists of International Languages

Cited by 26 publications (16 citation statements)
References 2 publications
“…We would like to emphasize here that the researchers lemmatized the words after identifying a word as a stop word as opposed to lemmatizing all the words and then identifying the stop words. [17] collated a list of stop words in numerous languages spanning multiple countries.…”
Section: Existing Work and Objectives (mentioning)
confidence: 99%
“…In the case of the English language, sets of stop-words widely used in Natural Language Processing are used in text cleaning tasks. This work uses the publicly available English stop-words set published in [33], and each word is weighted by textual and lexical functions in a sentence [34]. URL patterns are removed from the corpus, and other expressions, such as retweets (RT) and appearances of @username, are considered non-informative attributes and are deleted in the same way.…”
Section: Tokenization and Noise Removal (mentioning)
confidence: 99%
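
The cleaning steps described in the statement above (dropping URLs, retweet markers, @username mentions, and stop-words) can be sketched roughly as follows. This is a minimal illustration, not the citing authors' implementation: the function name `clean_tweet`, the regular expressions, and the tiny `STOP_WORDS` set are all assumptions made for the example, and the word-weighting step from [34] is not covered.

```python
import re

# Hypothetical, deliberately tiny stop-word set for illustration only;
# the cited work uses a published English list that is not reproduced here.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "this"}

# Assumed patterns for the non-informative expressions named in the text.
URL_RE = re.compile(r"https?://\S+|www\.\S+")   # URL patterns
MENTION_RE = re.compile(r"@\w+")                # appearances of @username
RETWEET_RE = re.compile(r"\bRT\b")              # retweet marker (case-sensitive)
TOKEN_RE = re.compile(r"[a-z0-9']+")            # simple lowercase word tokens


def clean_tweet(text, stop_words=STOP_WORDS):
    """Remove URLs, mentions, and RT markers, then drop stop-words."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = RETWEET_RE.sub(" ", text)
    tokens = TOKEN_RE.findall(text.lower())
    return [t for t in tokens if t not in stop_words]


if __name__ == "__main__":
    print(clean_tweet("RT @user: This is the best paper https://example.com/x"))
```

The noise patterns are removed before lowercasing so that the case-sensitive `RT` marker is matched without also stripping ordinary words that happen to contain those letters.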
“…The algorithm was also tested on 200 documents and achieved 99% accuracy and time efficiency. Saini and Rakholia [13] have presented an in-depth analytic report on continent and script-wise divisions-based statistical measures for stop-words lists of various international languages. A. Alajmi et al. [19] generated stop-words for the Arabic language using a statistical approach. 1002 documents with over 700,000 words were tested, and they achieved about 90% general accuracy.…”
Section: Related Work (mentioning)
confidence: 99%