2023
DOI: 10.1177/02676583231176370

The CELI corpus: Design and linguistic annotation of a new online learner corpus

Stefania Spina,
Irene Fioravanti,
Luciana Forti
et al.

Abstract: This article introduces the CELI corpus, a new learner corpus of written Italian consisting of ca. 600,000 tokens, evenly distributed among CEFR (Common European Framework of Reference for Languages) proficiency levels B1, B2, C1 and C2. The collected texts derive from the language certification exams administered by the University for Foreigners of Perugia all around the world. The corpus contains rich metadata pertaining to text-related and learner-related variables. It expands the domain of learner corpora …

Cited by 5 publications (2 citation statements)
References 28 publications
“…First, this study examined complexity in four Indo-European languages which, despite their observed differences, share common features (e.g., the same writing script). This is because most of the CEFR rated learner corpora focus on L2 Indo-European languages [ 92 ], e.g., [ 93 ], and the presence of an objective measure of L2 proficiency was necessary to examine the L2 effect in a robust manner. Investigating the utility of this information-theoretic complexity measure for L2 production in other non-Indo-European languages such as Chinese, Korean, Japanese and Arabic remains worthwhile.…”
Section: Discussion
confidence: 99%
“…The CELI corpus (Spina et al., 2022, 2023) is a learner corpus of Italian. It consists of written texts collected from the Italian language certification exams known as CELI, Certificati di Lingua Italiana.…”
Section: The CELI Corpus
confidence: 99%