2021
DOI: 10.3390/info12050199

Exploring Reusability and Reproducibility for a Research Infrastructure for L1 and L2 Learner Corpora

Abstract: Up until today, research in various educational and linguistic domains such as learner corpus research, writing research, or second language acquisition has produced a substantial amount of research data in the form of L1 and L2 learner corpora. However, the multitude of individual solutions combined with domain-inherent obstacles in data sharing have so far hampered comparability, reusability and reproducibility of data and research results. In this article, we present work in creating a digital infrastructure…

citations
Cited by 6 publications
(5 citation statements)
references
References 38 publications
“…Following the 6th International Conference for Learner Corpus Research (LCR 2022), a public call for feedback on a new draft of the core metadata standard has been sent to several relevant mailing lists. Furthermore, at the same conference König et al. (2022) presented their approach to testing the core metadata standard on several corpora and expressing it using CMDI.…”
Section: Discussion (mentioning)
confidence: 99%
“…For example, CMDI is already omnipresent for all data published within CLARIN and can be modified to fit the data using profiles. As shown by König et al. (2022), it could form a starting point for a unified representation for learner corpora metadata. And because it is a standard format within a large infrastructure, existing tools can be used to create and modify the metadata for learner corpora.…”
Section: Discussion (mentioning)
confidence: 99%
“…Metadata for learner corpora is extremely important for pursuing different types of research and for the interoperability between corpora [19,20]. For example, age, gender and first languages are important for identification of learning problems for different demographic groups; task metadata, for studying the impact of the task on the type of language produced by learners in the essays.…”
Section: Ideal Infrastructure For Learner Language (mentioning)
confidence: 99%
“…Work on metadata standardization in LCR was initiated by Paquot and Granger in 2017 [21], was followed up by König et al. in 2022 [22] and is still ongoing [23]. Paquot et al. [23] identify eight groups of metadata (administrative, corpus design, learner, text, task, annotation, annotator and transcriber), with multiple subcategories divided into obligatory and optional.…”
Section: Ideal Infrastructure For Learner Language (mentioning)
confidence: 99%
“…Among other things, we have tested uploading other (bonus) learner corpora to the portal, and exporting them from the portal applying a unified set of metadata attributes and values (using 'N/A' as a value for absent attributes). This step has helped us make several Swedish learner corpora interoperable with each other, interoperability being a known challenge in CLARIN-related contexts (König et al., 2021; Stemle et al., 2019; Volodina et al., 2018).…”
Section: Introduction (mentioning)
confidence: 99%
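
The last citation statement above describes exporting corpora with a unified set of metadata attributes, using 'N/A' for attributes a corpus lacks. A minimal sketch of that idea follows; it is not the cited portal's implementation, and the function name `unify_metadata` and the sample attribute names (`age`, `l1`, `task`) are hypothetical, chosen only for illustration.

```python
from typing import Dict, List


def unify_metadata(
    records: List[Dict[str, str]], placeholder: str = "N/A"
) -> List[Dict[str, str]]:
    """Give every record the same attribute set, filling gaps with a placeholder.

    Hypothetical helper: the attribute set is taken as the union of all keys
    seen across the records, in first-seen order.
    """
    # Collect the union of attribute names, preserving first-seen order.
    attributes: List[str] = []
    for record in records:
        for key in record:
            if key not in attributes:
                attributes.append(key)

    # Rebuild each record over the full attribute set; absent values become
    # the placeholder.
    return [
        {attr: record.get(attr, placeholder) for attr in attributes}
        for record in records
    ]


# Illustrative records from two imaginary corpora with different metadata.
corpus_records = [
    {"age": "23", "l1": "Finnish"},
    {"l1": "Arabic", "task": "essay"},
]
unified = unify_metadata(corpus_records)
```

After this step every record carries the same keys, so records from different corpora can be exported into one table. A real workflow would more likely map corpus-specific attribute names onto an agreed schema first, rather than taking the union of whatever keys happen to occur.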