2018 IEEE 12th International Conference on Application of Information and Communication Technologies (AICT) 2018
DOI: 10.1109/icaict.2018.8747161

Initial Normalization of User Generated Content: Case Study in a Multilingual Setting

Abstract: We address the problem of normalizing user generated content in a multilingual setting. Specifically, we target comment sections of popular Kazakhstani Internet news outlets, where comments almost always appear in Kazakh, in Russian, or in a mixture of both. Moreover, such comments are noisy, i.e., difficult to process due to (mostly) intentional breaches of spelling conventions, which aggravates the data sparseness problem. Therefore, we propose a simple yet effective normalization method that accounts for multilingu…

Cited by 1 publication
References 8 publications