In this article, we present the Nordic Word Order Database (NWD), focusing on the rationale behind it, the methods used in data elicitation and data analysis, and the empirical scope of the database. NWD is an online database with a user-friendly search interface, hosted by The Text Laboratory at the University of Oslo and launched in April 2019 (https://tekstlab.uio.no/nwd). It contains elicited production data from speakers of all of the North Germanic languages, including several different dialects. So far, seven field trips have been conducted, and data from around 250 participants in total (age 16–60) have been collected (approx. 55,000 sentences). The data are elicited through a carefully controlled production experiment targeting core syntactic phenomena that are known to show variation within and/or between the North Germanic languages, e.g., subject placement, object placement, particle placement and verb placement. We present the motivations and research questions behind the database, and describe the experiment, the data collection procedure, and the structure of the database.
This article is an introduction to Leksikografisk bokmålskorpus (LBK). We begin with a historical overview of the dictionary work that has been carried out for Norwegian, and explain the background for why LBK was built the way it was. We then give an overview of the contents of the corpus, and finally show how to search the corpus using the corpus search tool Glossa.
The paper describes the improvement of the rule-based Constraint Grammar (CG) Oslo-Bergen Tagger (OBT) by the addition of a statistical module. It is in the nature of CG taggers to leave some words ambiguous between different readings, due to a lack of coverage by the linguistics-based rules. Such ambiguities are often a problem for applications that use the tagger, among them the Norwegian Newspaper Corpus. Our statistical module not only removes part-of-speech (PoS) and morphological ambiguities, but also disambiguates lemmas. We show how this new system, referred to as OBT+stat, combines the strengths of the linguistic knowledge-based CG approach with data-driven methods in a straightforward manner. The result is a high-performing, fully disambiguating PoS/morphological tagger and lemmatizer with very satisfactory evaluation results.
Language documentation, including the development and use of corpora, is frequently linked to revitalisation. This is also the case for the Kven language, a Finnic minoritised language, traditionally spoken in the two northernmost counties of Norway. Kven is a recognised minority language in Norway, protected by the European Charter for Regional or Minority Languages. This status led to increased efforts to document Kven, including the development of the Ruija Corpus, consisting of recordings of interviews in Kven. The corpus was an important tool for the standardisation of Kven. In this article, we describe how the corpus was developed and account for its search functions, including a discussion of the limitations of the corpus. We also discuss the role of corpora and other online tools for language revitalisation, with a particular focus on the standardisation of Kven, and conclude by reflecting on how expertise also resides with the speakers of an endangered language and that they have a right to be involved in efforts of language documentation and revitalisation.