Learner corpora, electronic collections of spoken or written data from foreign language learners, offer unparalleled access to many hitherto unexplored aspects of learner language, particularly in their error-tagged format. This article aims to demonstrate the role that learner corpora can play in CALL, particularly when used in conjunction with web-based interfaces that provide flexible access to error-tagged corpora that have been enhanced with simple NLP techniques such as POS-tagging or lemmatization and linked to a wide range of learner and task variables, such as mother tongue background or activity type. This new resource is of interest to three main types of users: teachers wishing to prepare pedagogical materials that target learners' attested difficulties; learners themselves, for editing or language-awareness purposes; and NLP researchers, for whom it serves as a benchmark for testing automatic error detection systems.
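As a rough illustration of how such a resource might be queried, here is a minimal sketch in Python. The data model and field names (mother_tongue, task_type, error_tag, and POS-tagged, lemmatized token spans) are hypothetical stand-ins invented for this example, not the interface described in the article.

```python
# Minimal sketch of querying an error-tagged learner corpus.
# All field names and error tags are hypothetical illustrations.
from dataclasses import dataclass

@dataclass
class ErrorEntry:
    mother_tongue: str   # learner variable, e.g. "French"
    task_type: str       # task variable, e.g. "essay"
    error_tag: str       # e.g. "GA" for a (hypothetical) article-error tag
    tokens: list         # (form, POS, lemma) triples for the flagged span

def find_errors(corpus, mother_tongue=None, error_tag=None):
    """Filter entries by learner and error variables."""
    return [e for e in corpus
            if (mother_tongue is None or e.mother_tongue == mother_tongue)
            and (error_tag is None or e.error_tag == error_tag)]

corpus = [
    ErrorEntry("French", "essay", "GA", [("informations", "NNS", "information")]),
    ErrorEntry("Spanish", "letter", "LP", [("make", "VB", "make"), ("a", "DT", "a"), ("party", "NN", "party")]),
]

# e.g. all article errors made by French-speaking learners
for entry in find_errors(corpus, mother_tongue="French", error_tag="GA"):
    print(entry.task_type, [lemma for _, _, lemma in entry.tokens])
```

A teacher could use such filters to pull all attested errors of one type for one learner population when preparing targeted exercises.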
This article focuses on the development of Natural Language Processing (NLP) tools for Computer-Assisted Language Learning (CALL). After identifying the inherent limitations of CALL tools that lack NLP components, we describe the general framework of MIRTO, an NLP-based authoring platform for pedagogical activities under development in our laboratory. The platform is organized into four distinct, successive layers: functions, scripts, activities, and scenarios. Through several examples, we explain how MIRTO's architecture makes it possible to implement state-of-the-art NLP functions within scripts, which in turn allow teachers without prior programming skills to design didactic activities, themselves optionally combined into more complex sequences, or scenarios.
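The abstract gives no code, but the four-layer organization can be sketched in miniature. In the following Python sketch, every function, script, and activity name is invented for illustration, and the NLP functions are trivial stubs standing in for real components.

```python
# Hypothetical sketch of a four-layer authoring architecture
# (functions -> scripts -> activities -> scenarios); names are invented.

# Layer 1: NLP functions (trivially stubbed here)
def tokenize(text):
    return text.split()

def lemmatize(token):
    return token.lower().rstrip("s")   # toy stand-in for a real lemmatizer

# Layer 2: scripts chain NLP functions behind a simple interface
def gap_fill_script(text, target_lemma):
    """Blank out every token whose lemma matches the target."""
    return " ".join("____" if lemmatize(t) == target_lemma else t
                    for t in tokenize(text))

# Layer 3: an activity binds a script to didactic content
def article_gap_activity(text):
    return {"instructions": "Fill in the missing articles.",
            "exercise": gap_fill_script(text, "the")}

# Layer 4: a scenario sequences several activities
def scenario(texts):
    return [article_gap_activity(t) for t in texts]

print(scenario(["The cat saw the dog"]))
```

The point of the layering is the division of labor: NLP specialists maintain the function layer, while an author only manipulates scripts and activities, never the underlying code.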
We propose a new approach for determining the appropriate sense of Arabic words. To this end, we propose an algorithm based on information retrieval measures that identifies the context of use closest to the sentence containing the word to be disambiguated. The contexts of use are sets of sentences that each indicate a particular sense of the ambiguous word; they are generated using the words that define the senses of the ambiguous word, an exact string-matching algorithm, and the corpus. We then use measures drawn from information retrieval, namely Harman, Croft, and Okapi, combined with the Lesk algorithm, to assign the correct sense from among those proposed.
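A simplified sketch of this kind of context-based disambiguation follows. It scores each candidate sense's context of use against the sentence containing the ambiguous word and picks the best-scoring sense; the scoring here is a plain Lesk-style word overlap with IDF weighting, a stand-in for the Harman, Croft, and Okapi measures rather than their exact formulas, and the example contexts are invented (and in English for readability).

```python
# Sketch: pick the sense whose context of use best matches the sentence.
# The IDF-weighted overlap below approximates, but does not reproduce,
# the Harman/Croft/Okapi measures named in the abstract.
import math
from collections import Counter

def idf_weights(contexts):
    """Inverse document frequency over the contexts of use."""
    n = len(contexts)
    df = Counter(w for ctx in contexts for w in set(ctx.split()))
    return {w: math.log(1 + n / df[w]) for w in df}

def score(sentence, context, idf):
    """Weighted word overlap between the sentence and a context of use."""
    overlap = set(sentence.split()) & set(context.split())
    return sum(idf.get(w, 0.0) for w in overlap)

def disambiguate(sentence, sense_contexts):
    """sense_contexts maps each candidate sense to one context of use."""
    idf = idf_weights(list(sense_contexts.values()))
    return max(sense_contexts,
               key=lambda s: score(sentence, sense_contexts[s], idf))

contexts = {"river": "water flows along the bank of the river",
            "finance": "the bank approved the loan and deposited the money in the account"}
print(disambiguate("she deposited money at the bank", contexts))  # -> "finance"
```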