Proceedings of the Sixth Workshop on Noisy User-Generated Text (W-NUT 2020), 2020
DOI: 10.18653/v1/2020.wnut-1.22

Detecting Entailment in Code-Mixed Hindi-English Conversations

Abstract: The presence of large-scale corpora for Natural Language Inference (NLI) has spurred deep learning research in this area, though much of this research has focused solely on monolingual data. Code-mixing is the intertwined usage of multiple languages, and is commonly seen in informal conversations among polyglots. Given the rising importance of dialogue agents, it is imperative that they understand code-mixing, but the scarcity of code-mixed Natural Language Understanding (NLU) datasets has precluded research i…

Cited by 7 publications (8 citation statements)
References 12 publications

“…We compare the performance of our systems against the system with the highest test set performance discussed in Chakravarthy et al (2020) and the baselines provided by Khanuja et al (2020b). The performance of our systems is shown in Table 5.…”
Section: Results (mentioning)
Confidence: 99%
“…The confusion matrix for the predictions from our best model is shown. Model / Accuracy: mBERT (Khanuja et al, 2020b) 61.09; Mod. mBERT (Khanuja et al, 2020b) 63.1; mod-mBERT (Chakravarthy et al, 2020) 62.41…”
Section: Results (mentioning)
Confidence: 99%
“…This was extended for a variety of code-switched tasks by Khanuja et al (2020b), where they showed improvements on several tasks using MLM pretraining on real and synthetic code-switched text. Chakravarthy et al (2020) further improved the NLI performance of mBERT by including large amounts of in-domain code-switched text during MLM pretraining. Gururangan et al (2020) empirically demonstrate that pretraining is most beneficial when the domains of the intermediate and target tasks are similar, which we observe as well.…”
Section: Related Work (mentioning)
Confidence: 99%
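The recipe described in the citation statement above (continued masked-language-model pretraining of mBERT on in-domain code-mixed text, followed by fine-tuning for NLI) can be sketched with the HuggingFace transformers and datasets libraries. This is a minimal illustration only, not the cited papers' exact pipeline: the corpus file codemixed_hi_en.txt, the output directory mbert-mlm-cs, the two-label setup, and all hyperparameters are assumptions for the sake of the example.

```python
# Minimal sketch: (1) continue MLM pretraining of mBERT on code-mixed text,
# (2) fine-tune the adapted encoder for NLI. Paths, label count, and
# hyperparameters are illustrative assumptions, not details from the cited papers.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BASE = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(BASE)

# Stage 1: MLM pretraining on a code-mixed Hindi-English corpus (assumed local text file).
corpus = load_dataset("text", data_files={"train": "codemixed_hi_en.txt"})
tokenized = corpus.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)
trainer = Trainer(
    model=AutoModelForMaskedLM.from_pretrained(BASE),
    args=TrainingArguments(output_dir="mbert-mlm-cs", num_train_epochs=1),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15),
)
trainer.train()
trainer.save_model("mbert-mlm-cs")
tokenizer.save_pretrained("mbert-mlm-cs")

# Stage 2: fine-tune the MLM-adapted checkpoint for entailment classification.
# Premise/hypothesis pairs are tokenized as sentence pairs and trained with a second
# Trainer in the same way; two labels (entailed / not entailed) are assumed here.
nli_model = AutoModelForSequenceClassification.from_pretrained("mbert-mlm-cs", num_labels=2)
inputs = tokenizer("kal movie dekhne chalein?", "they plan to watch a movie", return_tensors="pt")
logits = nli_model(**inputs).logits  # classifier head is untrained: NLI fine-tuning is still required
```

The observation from Gururangan et al (2020) quoted above is the motivation for Stage 1: the closer the MLM pretraining corpus is to the target domain, the larger the expected gain from the intermediate task.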
“…NLP tools for monolingual and multilingual language processing have rapidly progressed in the past few years, thanks to transformer-based models such as Multilingual BERT (Devlin et al, 2019) & XLM-RoBERTa (Conneau et al, 2020) and their pretraining techniques. On various mixed datasets, recent studies have shown that adopting multilingual pretrained models can perform better than their previous deep learning counterparts (Pires et al, 2019; Khanuja et al, 2020; Chakravarthy et al, 2020; Jayanthi and Gupta, 2021). While this looks promising for multilingual settings, the same does not translate to code-mixing.…”
Section: Introduction (mentioning)
Confidence: 99%