Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching 2021
DOI: 10.18653/v1/2021.calcs-1.6
Exploring Text-to-Text Transformers for English to Hinglish Machine Translation with Synthetic Code-Mixing

Abstract: We describe models focused on the understudied problem of translating between monolingual and code-mixed language pairs. More specifically, we offer a wide range of models that convert monolingual English text into Hinglish (code-mixed Hindi and English). Given the recent success of pretrained language models, we also test the utility of two recent Transformer-based encoder-decoder models (i.e., mT5 and mBART) on the task, finding both to work well. Given the paucity of training data for code-mixing, we also pr…
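As a rough illustration of how such encoder-decoder models are applied to translation, the minimal sketch below runs the publicly available mBART-50 checkpoint (via Hugging Face Transformers) on an English sentence. This is an assumption-laden sketch, not the paper's setup: the stock checkpoint decodes into Devanagari Hindi rather than Romanized Hinglish, and the authors' fine-tuned models are not reproduced here.

```python
# Minimal sketch (not the paper's fine-tuned model): translate English with
# off-the-shelf mBART-50. The public checkpoint outputs Devanagari Hindi;
# producing Hinglish would require fine-tuning on code-mixed parallel data.
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

model_name = "facebook/mbart-large-50-many-to-many-mmt"
tokenizer = MBart50TokenizerFast.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)

tokenizer.src_lang = "en_XX"                      # source language: English
inputs = tokenizer("How are you doing today?", return_tensors="pt")
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.lang_code_to_id["hi_IN"],  # decode into Hindi
    max_length=64,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```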

Cited by 9 publications (11 citation statements)
References 18 publications
“…The results demonstrate the effectiveness of BT augmentation in improving performance for text classification tasks. This finding is consistent with previous research [36,37], which demonstrated the efficacy of utilizing pre-trained models like mBART for BT augmentation. By leveraging the pre-trained language modeling capabilities of mBART, we can generate high-quality synthetic translations that can be used to augment the training data.…”
Section: Results (supporting)
confidence: 93%
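The back-translation (BT) augmentation described in the statement above can be sketched as a round trip through a pivot language: translate each training sentence into the pivot and back, then add the paraphrase to the training set. The snippet below is an illustrative sketch using the same public mBART-50 checkpoint and English/Hindi language codes as assumptions; it is not the cited works' exact pipeline.

```python
# Hedged sketch of back-translation (BT) augmentation with mBART-50:
# English -> Hindi -> English round trips produce synthetic paraphrases.
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

model_name = "facebook/mbart-large-50-many-to-many-mmt"
tokenizer = MBart50TokenizerFast.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)

def translate(batch, src, tgt):
    """Translate a list of sentences from src to tgt language code."""
    tokenizer.src_lang = src
    enc = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
    out = model.generate(**enc, forced_bos_token_id=tokenizer.lang_code_to_id[tgt])
    return tokenizer.batch_decode(out, skip_special_tokens=True)

def back_translate(sentences, src="en_XX", pivot="hi_IN"):
    """Round-trip each sentence through the pivot language to get paraphrases."""
    return translate(translate(sentences, src, pivot), pivot, src)

augmented = back_translate(["The model works surprisingly well on noisy text."])
print(augmented)
```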
“…Due to resource scarcity, there have been efforts to explore the use of back-translated data in NMT (Jawahar et al., 2021). There are multiple efforts towards fine-tuning pre-trained language models and generating pseudo-parallel data (Winata et al., 2019; Gautam et al., 2021; Jawahar et al., 2021; Srivastava and Singh, 2022; Solorio et al., 2021). However, the use of large language models is still unexplored for CSMT.…”
Section: Related Work (mentioning)
confidence: 99%
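One common, low-cost way to build the kind of pseudo-parallel code-mixed data mentioned above is dictionary-based lexical substitution: replace a fraction of English words with their Romanized Hindi equivalents to pair each English sentence with a synthetic Hinglish one. The toy sketch below uses a hypothetical four-entry dictionary purely for illustration and is not the method of any cited paper.

```python
# Toy sketch: pseudo-parallel English -> Hinglish pairs via lexical substitution.
# The dictionary and substitution rate are illustrative assumptions only.
import random

en_to_hi = {"good": "accha", "very": "bahut", "friend": "dost", "food": "khana"}

def synth_code_mix(sentence: str, rate: float = 0.5, seed: int = 0) -> str:
    """Randomly replace known English words with Romanized Hindi equivalents."""
    rng = random.Random(seed)
    mixed = []
    for tok in sentence.split():
        key = tok.lower()
        if key in en_to_hi and rng.random() < rate:
            mixed.append(en_to_hi[key])
        else:
            mixed.append(tok)
    return " ".join(mixed)

src = "My friend cooked very good food"
print(src, "->", synth_code_mix(src))   # (src, output) forms one synthetic pair
```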
“…There is rising interest in translating code-switched data (Nagoudi et al., 2021). Our purpose here is to translate Arabic text involving code-switching from a foreign language into (i) that foreign language as well as into (ii) MSA.…”
Section: Code-switched Translation (mentioning)
confidence: 99%
“…Our work also meets an existing need for pre-trained Transformer-based sequence-to-sequence models. In other words, while several BERT-based models have been pre-trained for Arabic (Antoun et al., 2020; Abdul-Mageed et al., 2021; Inoue et al., 2021), no such attempts, to our knowledge, have been made to create sequence-to-sequence models. Another motivation for our work is the absence of an evaluation benchmark for Arabic language generation tasks.…”
Section: Introduction (mentioning)
confidence: 99%