Since the 9/11 terrorist attacks on the World Trade Center, members of the Sikh-American community have been targets of random hate crimes in the United States because of their distinct identity, most visibly the turban. During and after the 2016 presidential election, many minority groups, including Sikh-Americans, were concerned about the rhetoric used by then-candidate Donald Trump. The focus of this research project was to study whether the rhetoric used during the presidential campaign affected how Sikh-Americans perceived their safety in a politically conservative state like Texas. Both qualitative and quantitative methods were used to collect data. The qualitative data came from one-on-one interviews with Sikh-Americans, and the quantitative data came from surveys administered in gurdwaras (Sikh places of worship) in the Dallas-Fort Worth and Houston metropolitan areas. From the interviews and surveys, 27.6% of turban-wearing respondents reported feeling threatened because of their appearance and described a general discomfort stemming from others' lack of knowledge of Sikhism. Despite not having a distinct appearance, 28.6% of non-turban-wearing male respondents reported feeling threatened because of their religious affiliation at some point before or after the presidential election. From these results, it can be concluded that many Sikh-Americans feel unsafe living in Texas as Sikhs because of religious misidentification and intolerance.
While there has been significant progress towards developing NLU resources for Indic languages, syntactic evaluation has been relatively unexplored. Unlike English, Indic languages have rich morphosyntax, grammatical genders, relatively free word order, and highly inflectional morphology. In this paper, we introduce Vyākarana: a benchmark of Colorless Green sentences in Indic languages for syntactic evaluation of multilingual language models. The benchmark comprises four syntax-related tasks: PoS Tagging, Syntax Tree-depth Prediction, Grammatical Case Marking, and Subject-Verb Agreement. We use the datasets from the evaluation tasks to probe five multilingual language models of varying architectures for syntax in Indic languages. Due to its prevalence, we also include a code-switching setting in our experiments. Our results show that the token-level and sentence-level representations from the Indic language models (IndicBERT and MuRIL) do not capture the syntax in Indic languages as efficiently as the other highly multilingual language models. Further, our layer-wise probing experiments reveal that while mBERT, DistilmBERT, and XLM-R localize the syntax in middle layers, the Indic language models do not show such syntactic localization.
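To make the layer-wise probing setup described above concrete, the following is a minimal sketch (not the authors' released code) of training a linear probe on token-level hidden states from one layer of a multilingual encoder, using the HuggingFace transformers library. The model name, the layer index, the toy PoS-tagged sentences, and all variable names are illustrative assumptions; a real probe would be trained and evaluated on the benchmark's datasets.

```python
# Minimal layer-wise probing sketch (illustrative assumptions throughout).
# Extract per-token hidden states from a chosen layer of a multilingual model
# and fit a simple linear classifier to predict PoS tags.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

MODEL_NAME = "bert-base-multilingual-cased"  # assumption: any multilingual encoder could be probed this way
LAYER = 6  # assumption: probe a middle layer, where the paper reports syntax localizing for mBERT/XLM-R

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

# Toy PoS-tagged Hindi examples (hypothetical data, not from the benchmark).
sentences = [
    (["लड़का", "सोता", "है"], ["NOUN", "VERB", "AUX"]),
    (["वह", "किताब", "पढ़ती", "है"], ["PRON", "NOUN", "VERB", "AUX"]),
]

features, labels = [], []
for words, tags in sentences:
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).hidden_states[LAYER][0]  # shape: (num_subwords, hidden_dim)
    word_ids = enc.word_ids()
    seen = set()
    for idx, wid in enumerate(word_ids):
        # Represent each word by its first subword's hidden state.
        if wid is not None and wid not in seen:
            seen.add(wid)
            features.append(hidden[idx].numpy())
            labels.append(tags[wid])

# A linear probe: if the chosen layer encodes PoS information,
# even a simple classifier should separate the tags well.
probe = LogisticRegression(max_iter=1000).fit(features, labels)
print("train accuracy:", probe.score(features, labels))
```

Sweeping LAYER over all layers and comparing probe accuracy per layer is one common way to produce the kind of syntactic-localization curves the abstract refers to.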