Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop 2019
DOI: 10.18653/v1/p19-2018

Māori Loanwords: A Corpus of New Zealand English Tweets

Abstract: Māori loanwords are widely used in New Zealand English for various social functions by New Zealanders within and outside of the Māori community. Motivated by the lack of linguistic resources for studying how Māori loanwords are used in social media, we present a new corpus of New Zealand English tweets. We collected tweets containing selected Māori words that are likely to be known by New Zealanders who do not speak Māori. Since over 30% of these words turned out to be irrelevant (e.g., mana is a popular gamin…

Cited by 7 publications (9 citation statements)
References 19 publications
“…Grieve et al, 2017). This process is briefly summarized below, but a more detailed explanation is given in Trye et al (2019).…”
Section: Methods (mentioning)
confidence: 99%
“…Drawing on lessons learned from the original study (Trye et al, 2019), some improvements were made to further mitigate noise in the MLT corpus. First, the corpus was enhanced by deploying a Multinomial Naive Bayes model (McCallum and Nigam, 1998) that considered not only unigrams in the feature space (as per the previous study), but bigrams as well.…”
Section: Methods (mentioning)
confidence: 99%
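The citation statement above describes filtering noise with a Multinomial Naive Bayes model over unigram and bigram features. The following is a minimal sketch of that general technique in plain Python, not the cited authors' implementation: the class, the function names, the smoothing value, and the four toy tweets with their "relevant"/"irrelevant" labels are all invented here for illustration.

```python
import math
from collections import Counter, defaultdict


def ngram_features(text, use_bigrams=True):
    """Return lowercase unigrams, plus adjacent-word bigrams if requested."""
    tokens = text.lower().split()
    feats = list(tokens)
    if use_bigrams:
        feats += [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return feats


class MultinomialNB:
    """Tiny Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0, use_bigrams=True):
        self.alpha = alpha
        self.use_bigrams = use_bigrams

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = ngram_features(text, self.use_bigrams)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            total = sum(self.feature_counts[label].values())
            score = math.log(n_label / n_docs)  # log prior
            for feat in ngram_features(text, self.use_bigrams):
                if feat not in self.vocab:
                    continue  # skip features never seen in training
                count = self.feature_counts[label][feat]
                score += math.log(
                    (count + self.alpha) / (total + self.alpha * vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data (invented for this sketch, not from the corpus).
train_texts = [
    "the mana of our tupuna lives on",
    "respect the mana of the whanau",
    "need more mana potion for my mage",
    "out of mana again in this game",
]
train_labels = ["relevant", "relevant", "irrelevant", "irrelevant"]

model = MultinomialNB(alpha=1.0, use_bigrams=True).fit(train_texts, train_labels)
print(model.predict("mana potion for my mage"))
print(model.predict("the mana of the whanau"))
```

Adding bigrams lets the classifier use local context such as "mana potion" rather than relying on the ambiguous word "mana" alone, which is presumably the motivation for extending the unigram feature space.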
“…Finally, it should be noted that, at the time of writing this paper, Modern Greek data sets and text mining approaches are certainly fewer in number when compared against works focusing on other, more widely spoken, popular languages, such as English, German, Spanish, French, Italian, Russian or Chinese. In general, when compared to corresponding tasks in other languages, the obtained accuracy results in Modern Greek web text mining tasks indicate a better overall performance than [50][51][52][53][54][55][56][57], comparable to [58][59][60], or worse than [61][62][63].…”
Section: Approaches On Developing Modern Greek Social Web Text Data Sets and On Modern (mentioning)
confidence: 98%
“…The RMT Corpus complements this existing body of Māori-language text since, although a mixed-language corpus of three million English tweets exists that contains borrowed Māori words or loanwords (Trye et al, 2019, 2020), there is, to the best of our knowledge, no social media corpus comprising (almost) exclusively Māori-language text.…”
Section: Existing Māori-language Resources (mentioning)
confidence: 99%