Petter Mæhlum scite author profile

Petter Mæhlum

3 Publications

0 Citation Statements Received

28 Citation Statements Given

Publications

Annotating Norwegian Language Varieties on Twitter for Part-of-Speech

Mæhlum¹, Kåsen², Touileb³ et al. 2022

Preprint

Norwegian Twitter data poses an interesting challenge for Natural Language Processing (NLP) tasks. These texts are difficult for models trained on standardized text in one of the two Norwegian written forms (Bokmål and Nynorsk), as they contain both the typical variation of social media text and a large amount of dialectal variety. In this paper, we present a novel Norwegian Twitter dataset annotated with POS-tags. We show that models trained on Universal Dependencies (UD) data perform worse when evaluated against this dataset, and that models trained on Bokmål generally perform better than those trained on Nynorsk. We also see that performance on dialectal tweets is comparable to the written standards for some models. Finally, we perform a detailed analysis of the errors that models commonly make on this data.

NorDial: A Preliminary Corpus of Written Norwegian Dialect Use

Barnes¹, Mæhlum², Touileb³ 2021

Preprint

Norway has a large amount of dialectal variation, as well as a general tolerance of its use in the public sphere. There are, however, few available resources to study this variation and its change over time and in more informal areas, e.g. on social media. In this paper, we propose a first step toward creating a corpus of dialectal variation of written Norwegian. We collect a small corpus of tweets and manually annotate them as Bokmål, Nynorsk, any dialect, or a mix. We further perform preliminary experiments with state-of-the-art models, as well as an analysis of the data to expand this corpus in the future. Finally, we make the annotations and models available for future work.

NorDiaChange: Diachronic Semantic Change Dataset for Norwegian

Kutuzov¹, Touileb², Mæhlum³ et al. 2022

Preprint

We describe NorDiaChange: the first diachronic semantic change dataset for Norwegian. NorDiaChange comprises two novel subsets, covering about 80 Norwegian nouns manually annotated with graded semantic change over time. Both subsets follow the same annotation procedure and can be used interchangeably as train and test splits for each other. NorDiaChange covers the time periods related to pre- and post-war events, oil and gas discovery in Norway, and technological developments. The annotation was done using the DURel framework and two large historical Norwegian corpora. NorDiaChange is published in full under a permissive licence, complete with raw annotation data and inferred diachronic word usage graphs (DWUGs).
