“…CMCS can appear in various forms, including code-switching, inter-sentential and intra-sentential code-mixing, and text written in both Latin and native scripts. CMCS text classification corpora have mainly been created for Indian language pairs such as Hindi-English (Bohra et al., 2018), Telugu-English (Gundapu and Mamidi, 2018), and Tamil-English, Kannada-English, and Malayalam-English (Chakravarthi et al., 2022), while some corpora exist for other language pairs such as Sinhala-English (Smith and Thayasivam, 2019), Spanish-English (Vilares et al., 2016), and Arabic-English (Sabty et al., 2019). Except for the dataset created by Chakravarthi et al. (2022), these corpora exclude text written in native script and cover only a limited range of code-mixing levels.…”