Findings of the Association for Computational Linguistics: NAACL 2022 (2022)
DOI: 10.18653/v1/2022.findings-naacl.49

Language Models for Code-switch Detection of te reo Māori and English in a Low-resource Setting

Abstract: Te reo Māori, New Zealand's only indigenous language, is code-switched with English. Māori speakers are at least bilingual, and the use of Māori is increasing in New Zealand English. Unfortunately, due to the minimal availability of resources, including digital data, Māori is under-represented in technological advances. Cloud-based multilingual systems such as Google and Microsoft Azure support Māori language detection. However, we provide experimental evidence to show that the accuracy of such systems is low w…

Cited by 6 publications (5 citation statements)
References 19 publications
“…Perhaps, using larger dictionaries can speed up the hand-correction stages of corpus building. Also, we note that the language detection models being developed using the labelled Hansard corpus [23] will help future researchers obtain automatically labelled corpora with less time than the one developed here. The Development of a Labelled te reo Māori-English Bilingual Corpus…”
Section: Discussion (mentioning)
confidence: 95%
“…This corpus is useful for developing language models, language detection, code-switching detection and similar language technology for Māori-English language pairs. A language detection model was trained using the labelled Hansard corpus, and the results are reported in [23]. The language detection model was tested in a formal context (such as political context in the Hansard database) and informal context (tweets from the [17]) with greater than 80% accuracy.…”
Section: Discussion (mentioning)
confidence: 99%
“…However, identifying monolingual and code-switch information from EKCS data problem needs to be addressed. James et al [25] provided an investigational evidence to demonstrate that the accuracy of cloud-based multilingual systems (Google and Microsoft Azure) is low when identifying Maori language. The proposed study shows that, the BiLSTM with bilingual embeddings to identify Maori-English code-switching points with an accuracy of 87%.…”
Section: ISSN: 2088-8708 (mentioning)
confidence: 99%
“…The free-form medical text in the NZ-GP Harms dataset is predominantly written in English, but includes some te reo Māori. Automatic language detection in code-switched English-Māori text is an area of ongoing current research (James et al, 2022; Trye et al, 2022). For the purpose of this research, Māori language was identified manually.…”
Section: Understanding Bias In Text Data (mentioning)
confidence: 99%