2019
DOI: 10.3384/nejlt.2000-1533.19667

The SweLL Language Learner Corpus

Abstract: The article presents a new language learner corpus for Swedish, SweLL, and the methodology behind it, from collection and pseudonymisation (to protect learners' personal information) to annotation adapted to second language learning. The main aim is to deliver a well-annotated corpus of essays written by second language learners of Swedish and make it available for research through a browsable environment. To that end, a new annotation tool and a new project management tool have been implemented, both with the main pur…

Citations: cited by 22 publications (39 citation statements)
References: 35 publications
“…In this project, we use SweLL-gold (Volodina et al, 2019), a collection consisting of 502 learner texts, manually corrected and tagged according to 6 top error categories which, in turn, have their own sub-categories (Rudebeck and Sundberg, 2021). The top error types are: Orthographic, Lexical, Morphological, Punctuation, Syntactical and Other (the category Other contains comments and unintelligible strings).…”
Section: Error-labeled Learner Data (mentioning)
confidence: 99%
“…In this paper, we present a pilot study to generate artificial error data for Swedish by mimicking error patterns present in authentic error datasets, namely, in the SweLL learner corpus (Volodina et al, 2019) and its one-error-per-sentence DaLAJ derivative (Volodina et al, 2021). We create a corruption pipeline to insert artificial errors into the sentences from COCTAILL, a corpus of textbooks used for teaching Swedish (Volodina et al, 2014).…”
Section: Introduction (mentioning)
confidence: 99%
“…We use the error-annotated learner corpus SweLL (Volodina et al, 2019) as a source of "unacceptable" sentences and select sentences containing corrections of the type that is of relevance to the SwedishGlue benchmark 1 (Adesam et al, 2020).…”
Section: Dataset Description (mentioning)
confidence: 99%
“…The SweLL data (Volodina et al, 2019) has been collected over four years (2017-2020) from adult learners of Swedish from formal educational settings, such as courses and tests. [Footnote 1: SwedishGlue is a collection of datasets for training and/or evaluating language models for a range of Natural Language Understanding (NLU) tasks.]…”
Section: The Source Corpus (mentioning)
confidence: 99%