2020
DOI: 10.1145/3380967

Improving Code-mixed POS Tagging Using Code-mixed Embeddings

Abstract: Social media data has become an invaluable component of business analytics. The many nuances of social media text make the job of conventional text analytics tools difficult. Code-mixing is a phenomenon prevalent among social media users, wherein words are borrowed from multiple languages but written in the commonly understood Roman script. All the existing supervised learning methods for tasks such as Parts Of Speech (POS) tagging for code-mixed social media (CMSM) text typically depend o…

Citations: cited by 13 publications (5 citation statements)
References: 43 publications
“…In particular, the use of word embeddings with character n-grams to represent out-of-vocabulary (OOV) words (Bojanowski et al, 2017) which are often either intentionally/unintentionally misspelt words or slang words that are prevalent in social media can potentially help as they are able to reflect the semantic relationships between words. Two interesting ideas to consider is to train word embeddings using a code-mix corpus as was done for POS tagging by Bhattu et al (2020), and to include LVCs and other MWEs as single tokens in embeddings.…”
Section: Use Of Word Embeddings | mentioning
confidence: 99%
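
The citation statement above refers to representing out-of-vocabulary words through character n-grams (Bojanowski et al, 2017). A minimal sketch of the n-gram extraction step follows. It assumes the commonly described convention of wrapping a word in `<` and `>` boundary markers and using n-gram lengths from 3 to 6; the function names `char_ngrams` and `shared_ngrams` are hypothetical and are not taken from the cited papers or from any library. The sketch only shows why a misspelt word still shares subword units with its standard form; it does not train or look up any embeddings.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of `word` after adding '<' and '>' boundary markers.

    The boundary markers and the 3..6 length range are assumptions made for
    illustration, following the usual description of subword embeddings.
    """
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of width n over the wrapped word.
        for start in range(len(wrapped) - n + 1):
            grams.append(wrapped[start:start + n])
    return grams


def shared_ngrams(word_a, word_b, n_min=3, n_max=6):
    """Return the set of n-grams that two words have in common."""
    grams_a = set(char_ngrams(word_a, n_min, n_max))
    grams_b = set(char_ngrams(word_b, n_min, n_max))
    return grams_a & grams_b


if __name__ == "__main__":
    # Trigrams of a standard spelling.
    print(char_ngrams("where", 3, 3))
    # Trigrams that a misspelt variant still shares with the standard spelling.
    print(sorted(shared_ngrams("where", "wherre", 3, 3)))
```

In a subword embedding model, a vector for an unseen word would typically be composed from the vectors of such n-grams, so the overlap shown here is what lets a misspelt or slang form land near its standard spelling.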
“…CM POS tagging: (Bhattu et al, 2020a) addressed the problem of prediction of POS tags for OOV words in low resource languages using character-based word embedding as input features to a Bi-LSTM and CRF network. (Ball and Garrette, 2018) used a meta embedding approach for the part of speech tagging where the word is represented in both code-mixed languages.…”
Section: Related Work | mentioning
confidence: 99%
“…Bi-LSTM CRF (I): This is the most commonly used model for the LI and POS tasks. It uses Bi-LSTM and CRF consecutively for POS tagging the CM text (Aguilar and Solorio, 2019;Bhattu et al, 2020a).…”
Section: Fcrf (J) | mentioning
confidence: 99%
“…This provides an important basis for industry equipment failure prediction [1,2]. In order to further improve the accuracy of various failure predictions, many new technologies (for example, artificial intelligence, big data, and blockchain) have been gradually applied to factories [3,4].…”
Section: Introduction | mentioning
confidence: 99%
“…But so far, the complex algorithm model [7,8] still has many limitations. In order to overcome these shortcomings, some studies have adopted machine learning algorithms, such as neural networks [4] and support vector machines (SVM) [5] to predict failure types. These studies have promoted the development of probabilistic models to a certain extent [9,10], but probabilistic models lack clear physical meaning in fault prediction.…”
Section: Introduction | mentioning
confidence: 99%