Simran Khanuja scite author profile

Simran Khanuja

5 Publications

121 Citation Statements Received

62 Citation Statements Given

How they've been cited

148

120

How they cite others

Affiliations

Carnegie Mellon University

Publications


GLUECoS: An Evaluation Benchmark for Code-Switched NLP

Khanuja¹, Dandapat², Srinivasan³ et al. 2020


Code-switching is the use of more than one language in the same conversation or utterance. Recently, multilingual contextual embedding models, trained on multiple monolingual corpora, have shown promising results on cross-lingual and multilingual tasks. We present an evaluation benchmark, GLUECoS, for code-switched languages, that spans several NLP tasks in English-Hindi and English-Spanish. Specifically, our evaluation benchmark includes Language Identification from text, POS tagging, Named Entity Recognition, Sentiment Analysis, Question Answering and a new task for code-switching, Natural Language Inference. We present results on all these tasks using cross-lingual word embedding models and multilingual models. In addition, we fine-tune multilingual models on artificially generated code-switched data. Although multilingual models perform significantly better than cross-lingual models, our results show that in most tasks, across both language pairs, multilingual models fine-tuned on code-switched data perform best, showing that multilingual models can be further optimized for code-switching tasks.


mSLAM: Massively multilingual joint pre-training for speech and text

Bapna¹, Cherry², Jia³ et al. 2022

Preprint


FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech

Conneau¹, Khanuja et al. 2023


GLUECoS: An Evaluation Benchmark for Code-Switched NLP

Khanuja¹, Dandapat², Srinivasan³ et al. 2020

Preprint


FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech

Conneau¹, Ma², Khanuja³ et al. 2022

Preprint


We introduce FLEURS, the Few-shot Learning Evaluation of Universal Representations of Speech benchmark. FLEURS is an n-way parallel speech dataset in 102 languages built on top of the machine translation FLoRes-101 benchmark, with approximately 12 hours of speech supervision per language. FLEURS can be used for a variety of speech tasks, including Automatic Speech Recognition (ASR), Speech Language Identification (Speech LangID), Translation and Retrieval. In this paper, we provide baselines for the tasks based on multilingual pre-trained models like mSLAM. The goal of FLEURS is to enable speech technology in more languages and catalyze research in low-resource speech understanding. Note: For clarity we have renamed FLoRes "Chinese (Simp)" to "Mandarin Chinese" (code "cmn") and "Chinese (Trad)" to "Cantonese Chinese" (code "yue").


