Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications 2020
DOI: 10.18653/v1/2020.bea-1.6

Becoming Linguistically Mature: Modeling English and German Children’s Writing Development Across School Grades

Abstract: In this paper we employ a novel approach to advancing our understanding of the development of writing in English and German children across school grades using classification tasks. The data used come from two recently compiled corpora: the English data come from the GiC corpus (983 school children in second-, sixth-, ninth- and eleventh-grade) and the German data are from the FD-LEX corpus (930 school children in fifth- and ninth-grade). The key to this paper is the combined use of what we refer to as 'compl…

citations
Cited by 13 publications
(12 citation statements)
references
References 30 publications
“…The results indicated that the top eight most discriminative CMs include two academic (tri- and four-)gram measures, cTTR, two CMs pertaining to lexical sophistication (as gauged by word frequency lists from the ANC [39] and MLWs, the word length measured in syllables), two syntactic CMs (DepC/T and MLC) and one word prevalence measure (Prevalence.USAWF; for more details on these measures, see [40]). These findings are consistent with those reported in the literature on automated approaches to L1 and L2 writing, which show a shift toward more advanced and sophisticated use of lexical items, including the increased use of academic vocabulary and multi-word sequences, across grade levels ([41], [42]). As reported above, the use of ASR to generate transcripts compared to human manual transcripts did not have a large impact on the calculation of scores for seven of these eight CMs.…”
Section: Impact On ATA Scores
Citation type: supporting
confidence: 90%
“…Both manually and ASR generated transcripts of speech recordings were automatically analyzed using CoCoGen (short for Complexity Contour Generator), a computational tool that implements a sliding window technique to calculate within-text distributions of scores for a given language measures (see e.g. [27,28], for recent applications of the tool in the area of language learning). The impetus for the implementation of the measures in the tool comes from a wealth of recent multidisciplinary research that adopts an integrated approach to language [29] and language learning [30] as well as an extensive body of literature on CAF framework reviewed in Section 1. Here in this paper we employ a selection of 34 complexity measures (CMs) that fall into four categories (see below for the selection procedure):…”
Section: Automatic Text Analysis Setup
Citation type: mentioning
confidence: 99%
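
The sliding window technique mentioned in the statement above can be illustrated with a minimal sketch. This is not CoCoGen's implementation: the sentence splitter, the window size, and the example measure (mean word length in characters) are all assumptions chosen for illustration, and the function names are hypothetical. The sketch only shows the general idea of scoring overlapping windows of sentences to obtain a within-text series of scores.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter (assumption): break after ., ! or ? plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def mean_word_length(sentences: List[str]) -> float:
    """Hypothetical example measure: average word length in characters."""
    words = [w for s in sentences for w in re.findall(r"[A-Za-z']+", s)]
    return sum(len(w) for w in words) / len(words) if words else 0.0


def contour(
    text: str,
    measure: Callable[[List[str]], float],
    window_size: int = 2,
) -> List[float]:
    """Score every window of `window_size` consecutive sentences.

    The window advances one sentence at a time, so adjacent windows overlap.
    The returned list is the within-text series of scores for `measure`.
    """
    sentences = split_sentences(text)
    if not sentences:
        return []
    if len(sentences) < window_size:
        # Text shorter than one window: score it as a single window.
        return [measure(sentences)]
    return [
        measure(sentences[i : i + window_size])
        for i in range(len(sentences) - window_size + 1)
    ]


if __name__ == "__main__":
    sample = "I am. We are here. They were there."
    print(contour(sample, mean_word_length, window_size=2))
```

Any function that maps a list of sentences to a number can be passed as `measure`, so the same windowing code would serve for a syntactic or lexical score under the same assumptions.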
“…The texts from both datasets (the Big Five Essay dataset and the MBTI Kaggle dataset) were automatically analyzed using an automated text analysis (ATA) system that employs a sliding window technique to compute sentence-level measurements. These measurements capture the within-text distributions of scores for a given psycholinguistic feature, referred to here as 'text contours' (for recent applications of the ATA system in the context of text classification, see Kerz et al, 2020; Qiao et al, 2021a,b). We extracted a set of 437 theory-based psycholinguistic features that can be binned into four groups: (1) features of morpho-syntactic complexity (N=19), (2) features of lexical richness, diversity and sophistication (N=77), (3) readability features (N=14), and (4) lexicon features designed to detect sentiment, emotion and/or affect (N=326).…”
Section: Measurement Of Text Contours Of Psycholinguistic Features
Citation type: mentioning
confidence: 99%
“…The texts from both datasets (GECO and PROVO) were automatically analyzed using CoCoGen (Ströbel et al, 2016), a computational tool that implements a sliding window technique to calculate sentence-level measurements that capture the within-text distributions of scores for a given language feature (for current applications of the tool in the context of text classification, see Kerz et al (2020, 2021)). We extract a total of 107 features that fall into five categories: (1) measures of syntactic complexity (N=16), (2) measures of lexical richness (N=14), (3) register-based n-gram frequency measures (N=25), (4) readability measures (N=14), and (5) psycholinguistic measures (N=38).…”
Section: Measurement Of Text Properties
Citation type: mentioning
confidence: 99%