Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence 2019
DOI: 10.24963/ijcai.2019/719
A Deep Generative Model for Code Switched Text

Abstract: Code-switching, the interleaving of two or more languages within a sentence or discourse, is pervasive in multilingual societies. Accurate language models for code-switched text are critical for NLP tasks. State-of-the-art data-intensive neural language models are difficult to train well from scarce language-labeled code-switched text. A potential solution is to use deep generative models to synthesize large volumes of realistic code-switched text. Although generative adversarial networks and variational autoencoders…

Cited by 25 publications (23 citation statements) | References 23 publications
“…Their method needs an external NMT system to obtain monolingual fragments from code-switched text and is expensive to scale to more language pairs. Garg et al. (2018) […] Samanta et al. (2019) propose a two-level hierarchical variational autoencoder that models syntactic signals in the lower layer and language-switching signals in the upper layer. Their model can leverage a modest amount of real code-switched text together with large monolingual corpora to generate large volumes of code-switched text along with token-level language tags.…”
Section: Code-mixed Data Generation (mentioning)
confidence: 99%
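To make the quoted architecture concrete, below is a minimal, illustrative sketch of a two-level hierarchical VAE for code-switched text: an upper latent intended to capture language-switching behaviour, a lower latent for syntactic/content structure, and a decoder that emits both tokens and token-level language tags. The GRU encoder/decoder, layer sizes, and all names here are assumptions made for this sketch, not the authors' released implementation.

```python
# Illustrative two-level hierarchical VAE for code-switched text (sketch only).
# z_lang: upper-level latent for language-switching signals.
# z_syn:  lower-level latent for syntactic/content signals, conditioned on z_lang.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HierarchicalCSVAE(nn.Module):
    def __init__(self, vocab_size, n_langs=2, emb=128, hid=256, z_syn=32, z_lang=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.encoder = nn.GRU(emb, hid, batch_first=True)
        # upper level: language-switching latent
        self.mu_lang = nn.Linear(hid, z_lang)
        self.logvar_lang = nn.Linear(hid, z_lang)
        # lower level: syntactic latent, conditioned on the upper latent
        self.mu_syn = nn.Linear(hid + z_lang, z_syn)
        self.logvar_syn = nn.Linear(hid + z_lang, z_syn)
        self.decoder = nn.GRU(emb + z_syn + z_lang, hid, batch_first=True)
        self.word_out = nn.Linear(hid, vocab_size)   # token distribution
        self.lang_out = nn.Linear(hid, n_langs)      # token-level language tag

    @staticmethod
    def reparameterize(mu, logvar):
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    def forward(self, tokens):
        # tokens: (batch, seq_len) ids; decoding is teacher-forced on the same
        # sequence here purely for brevity (a real model would shift inputs/targets).
        x = self.embed(tokens)
        _, h = self.encoder(x)
        h = h.squeeze(0)
        mu_l, lv_l = self.mu_lang(h), self.logvar_lang(h)
        z_l = self.reparameterize(mu_l, lv_l)
        h_cat = torch.cat([h, z_l], dim=-1)
        mu_s, lv_s = self.mu_syn(h_cat), self.logvar_syn(h_cat)
        z_s = self.reparameterize(mu_s, lv_s)
        # append both latents to every decoder input step
        z = torch.cat([z_s, z_l], dim=-1).unsqueeze(1).expand(-1, x.size(1), -1)
        dec, _ = self.decoder(torch.cat([x, z], dim=-1))
        return self.word_out(dec), self.lang_out(dec), (mu_l, lv_l, mu_s, lv_s)


def elbo(word_logits, lang_logits, tokens, lang_tags, stats):
    # Reconstruction of words and language tags plus KL terms for both latents.
    mu_l, lv_l, mu_s, lv_s = stats
    rec_w = F.cross_entropy(word_logits.transpose(1, 2), tokens)
    rec_l = F.cross_entropy(lang_logits.transpose(1, 2), lang_tags)
    kl = lambda mu, lv: -0.5 * torch.mean(1 + lv - mu.pow(2) - lv.exp())
    return rec_w + rec_l + kl(mu_l, lv_l) + kl(mu_s, lv_s)
```

Under this sketch, synthesizing new code-switched text would amount to sampling the two latents from their priors and decoding token by token; the quoted description suggests it is the upper-level latent that governs where switches occur.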
“…Most recently, Winata et al. (2019a) utilized a language-agnostic meta-representation method to represent code-mixed sentences. There are also other studies (Adel et al., 2013a,b, 2015; Choudhury et al., 2017; Winata et al., 2018; Gonen and Goldberg, 2018; Samanta et al., 2019) on code-mixed language modelling.…”
Section: Related Work (mentioning)
confidence: 95%
“…However, most of them are not directly suitable for generating bilingual code-mixed text, due to the unavailability of a sufficient volume of gold-tagged code-mixed text. Samanta et al. (2019) proposed a generative method that uses only a small amount of gold-tagged data, but it cannot produce sentence-level tags. Recently, Pratapa et al. (2018) used linguistic constraints arising from Equivalence Constraint Theory to design a code-switching grammar that guides text synthesis.…”
Section: Related Work (mentioning)
confidence: 99%
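For context, the Equivalence Constraint idea referenced in the last statement permits a switch only at boundaries where the word orders of the two languages remain mutually consistent. The toy sketch below approximates that test with word alignments over a parallel sentence pair; the alignment-based boundary check, the example alignment, and all names are illustrative assumptions, not Pratapa et al.'s actual grammar-based pipeline.

```python
# Toy approximation of equivalence-constraint-guided code-switch synthesis.
# Given a word-aligned parallel pair (e, f), a switch from e to f is allowed
# at boundary k only if every e-word before k aligns strictly before every
# e-word at or after k on the f side (no crossing alignment is split).
from typing import List, Set, Tuple


def switch_candidates(e: List[str], f: List[str],
                      align: Set[Tuple[int, int]]) -> List[List[str]]:
    out = []
    for k in range(1, len(e)):                     # boundary after e[k-1]
        left = {j for i, j in align if i < k}      # f positions aligned to the prefix
        right = {j for i, j in align if i >= k}    # f positions aligned to the suffix
        if left and right and max(left) < min(right):
            out.append(e[:k] + f[min(right):])     # switch languages at the boundary
    return out


if __name__ == "__main__":
    e = "I am going to the market".split()
    f = "main bazaar ja raha hoon".split()         # romanised Hindi gloss
    # hypothetical word alignment: (english index, hindi index)
    align = {(0, 0), (1, 4), (2, 2), (2, 3), (5, 1)}
    for sent in switch_candidates(e, f, align):
        print(" ".join(sent))                      # e.g. "I bazaar ja raha hoon"
```

With this alignment only the boundary after "I" passes the test, because the English auxiliary "am" aligns to sentence-final "hoon" and would otherwise be split by a crossing alignment; grammar-based systems like the one cited encode such constraints over parse structures rather than raw alignments.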