A Study on Multiword Expression Features in Emotion Detection of Code-Mixed Twitter Data

Tan, Kathleen Swee Neo; Lim, Tong Ming; Tan, Chi Wee

doi:10.1109/iicaiet51634.2021.9573850

Cited by 4 publications

(1 citation statement)

References 13 publications


“…This is followed by an analysis of the list of MWEs, and a comparison of expressions with loanwords and their equivalents is made available. Emotion detection in code-mixed Twitter data was investigated by Tan et al (2021) [25] using MWEs extracted from WordNet and WordNet Bahasa. Chinese character decomposition for neural MT with MWEs was proposed by Han et al (2021) [26].…”

Section: Literature Review (mentioning)

confidence: 99%

Identification and extraction of multiword expressions from Hindi & Urdu language in natural language processing

2022

IJATEE


Text can be translated from one language to another using statistical machine translation, but gaps remain in the translations because of a lack of language resources. Building a linguistic corpus requires the extraction of multiword expressions (MWEs). An MWE is a group of words with idiomatic properties. However, because the meaning of an MWE is non-compositional, that is, not derivable from its individual words, identifying and extracting MWEs is a time-consuming task. To address this, an automated system has been developed for extracting MWEs from Hindi and Urdu language sources. The process comprises tagging, pattern matching, an identification algorithm, and the extraction of MWEs from the data. Each word is tagged with a part-of-speech tag, and the tagged text is the input to the pattern-matching algorithm. Using pattern matching, MWE tags of specific patterns were selected, and the algorithm for automatic MWE detection was built on top of that. A conditional random field model (CRF++) was used to extract the MWEs from the data automatically. A confusion matrix was used for the automated evaluation of the proposed system. For Hindi and Urdu, the overall accuracy is 96.82% and 96.62%, respectively.
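The tagging and pattern-matching stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tag patterns, the English toy sentence, and the function names `extract_candidates` and `accuracy` are assumptions made for illustration, and the CRF++ stage that would filter the candidates is not reproduced. The accuracy helper only shows how an overall accuracy figure is derived from confusion-matrix counts.

```python
from typing import List, Tuple

Tagged = List[Tuple[str, str]]  # (word, part-of-speech tag)

# Hypothetical tag patterns; the abstract does not list the ones actually used.
PATTERNS = [("NN", "NN"), ("JJ", "NN")]


def extract_candidates(sentence: Tagged, patterns=PATTERNS) -> List[str]:
    """Return word sequences whose tag sequence matches one of the patterns."""
    candidates = []
    for pattern in patterns:
        n = len(pattern)
        # Slide a window of the pattern's length over the tagged sentence.
        for i in range(len(sentence) - n + 1):
            window = sentence[i:i + n]
            if tuple(tag for _, tag in window) == pattern:
                candidates.append(" ".join(word for word, _ in window))
    return candidates


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Overall accuracy from the four cells of a binary confusion matrix."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


# Toy input (assumed, not from the paper): an already-tagged sentence.
sentence = [("the", "DT"), ("old", "JJ"), ("railway", "NN"),
            ("station", "NN"), ("closed", "VB")]
print(extract_candidates(sentence))
```

Pattern matching of this kind over-generates (here it proposes both "railway station" and "old railway"), which is why a learned model such as a CRF is typically applied afterwards to decide which candidates are true MWEs.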

