This paper discusses machine learning techniques for predicting Common European Framework of Reference (CEFR) levels in a learner corpus. We summarise the CAp 2018 Machine Learning (ML) competition, a classification task over the six CEFR reference levels, which describe linguistic competence in a foreign language. The goal of this competition was to produce a machine learning system that predicts learners’ competence levels from written productions of between 20 and 300 words, together with a set of characteristics computed for each text; the texts were extracted from the French component of the EFCAMDAT data (Geertzen et al., 2013). Together with the description of the competition, we provide an analysis of the results and methods proposed by the participants and discuss the benefits of this kind of competition for the learner corpus research (LCR) community. The main findings concern the methods used and the lexical bias introduced by the task.
Access to justice could be significantly expanded if decision support systems were able to accurately interpret statements of fact by pro se (self-represented) litigants. Prior research has demonstrated that case decisions can often be predicted by machine-learning models trained on judges’ statements of facts, which suggests the hypothesis that the same learning algorithms could be effectively applied to pro se litigants’ fact statements. However, there has been a dearth of corpora on which to test this hypothesis. This paper describes an experiment testing the ability to predict the outcome of pro se litigants’ complaints on a corpus of 5,842 cases initiated by citizen complaints. The results of this experiment were strikingly negative: they suggest that fact statements by unguided pro se litigants are far less amenable to simple machine-learning techniques than judges’ texts, and they appear to disconfirm the hypothesis above.