2019 National Information Technology Conference (NITC)
DOI: 10.1109/nitc48475.2019.9114476

Dynamic Stopword Removal for Sinhala Language

Cited by 5 publications (5 citation statements)
References 13 publications
“…Filtering text has rules for removing noise data and using the stopword removal method. Stopword removal is a step to filter out words that contain little information and keep meaningful words [10]. In this work, we used a slang word dictionary from colloquial-indonesian-lexicon to be filtered slang words and used a stopword dictionary from dataset stopwords-id-satya.…”
Section: B Natural Language Processing (mentioning)
confidence: 99%
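
The dictionary-based filtering described in this statement can be illustrated with a minimal sketch. It assumes that slang words are mapped to a standard form before stopwords are dropped; the statement itself does not say whether slang is replaced or removed. The function name `clean_tokens` and the toy dictionaries are hypothetical and stand in for the colloquial-indonesian-lexicon and stopwords-id-satya resources, which are not reproduced here.

```python
def clean_tokens(tokens, slang_map, stopwords):
    """Normalize slang via a lookup table, then drop stopwords.

    slang_map and stopwords are illustrative stand-ins for the
    dictionaries named in the statement (assumption, not the real data).
    """
    # Lowercase each token and replace it with its standard form if listed.
    normalized = [slang_map.get(t.lower(), t.lower()) for t in tokens]
    # Keep only tokens that are not in the stopword set.
    return [t for t in normalized if t not in stopwords]


# Toy dictionaries for illustration only.
toy_slang = {"gak": "tidak"}
toy_stopwords = {"yang", "di"}
cleaned = clean_tokens(["Film", "yang", "gak", "bagus"], toy_slang, toy_stopwords)
```

Normalizing before filtering means a slang variant of a stopword would also be removed, which may or may not match the cited work.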
“…Jayaweera et al [15] proposed a dynamic approach to find Sinhala stopwords. In this study, they argued the cutoff point is subjective to the dataset.…”
Section: Related Work (mentioning)
confidence: 99%
“…In this study, term frequency (TF), inverse document frequency (IDF), and term-frequency-inverse-document-frequency (TF-IDF) are calculated for every term in the dataset. Authors select this approach since it is independent of the dataset size and domain, and it has been used in many other low-resource languages [15]. Further authors conducted multiple executions with different values to define the threshold.…”
Section: B Stopword Identification (mentioning)
confidence: 99%
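
As a rough illustration of the per-term statistics mentioned in this statement, the sketch below computes TF, IDF, and TF-IDF over a tokenized corpus and flags low-IDF terms as stopword candidates. The exact formulas, the choice of IDF as the cutoff statistic, and the threshold value are assumptions; the statement says only that the threshold was tuned over multiple runs. The names `term_statistics` and `stopword_candidates` are hypothetical.

```python
import math
from collections import Counter


def term_statistics(docs):
    """Return {term: (tf, idf, tfidf)} for a list of tokenized documents.

    Assumed definitions: tf is the corpus-wide relative frequency,
    idf is log(N / document frequency), and tfidf is their product.
    """
    n_docs = len(docs)
    total_tokens = sum(len(d) for d in docs)
    tf_counts = Counter(tok for d in docs for tok in d)
    df_counts = Counter(tok for d in docs for tok in set(d))
    stats = {}
    for term, count in tf_counts.items():
        tf = count / total_tokens
        idf = math.log(n_docs / df_counts[term])
        stats[term] = (tf, idf, tf * idf)
    return stats


def stopword_candidates(docs, idf_threshold):
    """Terms whose IDF is at or below the (assumed, tunable) threshold."""
    stats = term_statistics(docs)
    return sorted(t for t, (_, idf, _) in stats.items() if idf <= idf_threshold)


# Toy English corpus for illustration; a real run would use Sinhala tokens.
toy_docs = [["the", "cat", "sat"], ["the", "dog", "ran"], ["the", "bird", "flew"]]
candidates = stopword_candidates(toy_docs, idf_threshold=0.1)
```

Because the cutoff depends on the corpus, the threshold is left as a parameter rather than fixed.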
“…Our stopword list was produced using tf-idf ranking and consists of 210 Sinhala words. We further tested it using a number of classifiers; see [11] for details. This list is called inside the Lucene Sinhala analyzer that we have developed.…”
Section: Sinhala Stopword List (mentioning)
confidence: 99%