In this article, we describe the latest version of Sibylle, an AAC system that permits persons who have severe physical disabilities to enter text with any computer application, as well as to compose messages to be read out through speech synthesis. The system consists of a virtual keyboard comprising a set of keypads that allow for the entering of characters or full words by a single-switch selection process. It also includes a sophisticated word prediction component which dynamically calculates the most appropriate words for a given context. This component is auto-adaptive, that is, it learns with every text the user enters. It thus adapts its predictions to the user's language and the current topic of communication as well. So far, the system works for French, German and English. Earlier versions of Sibylle have been used since 2001 in a rehabilitation center (Kerpape, France).
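The abstract does not say how Sibylle's prediction component is modelled, so the following is only a minimal sketch of the general idea of auto-adaptive word prediction, assuming a plain bigram count model. The class `AdaptivePredictor` and its methods `learn` and `predict` are hypothetical names introduced for illustration and are not part of Sibylle.

```python
from collections import Counter, defaultdict


class AdaptivePredictor:
    """Toy bigram predictor that updates its counts with every entered text.

    Assumption: a bigram count model stands in for the real (unspecified)
    prediction component.
    """

    def __init__(self):
        self.unigrams = Counter()            # word -> frequency
        self.bigrams = defaultdict(Counter)  # previous word -> next-word counts

    def learn(self, text):
        """Adapt to the user's language by counting words and word pairs."""
        words = text.lower().split()
        self.unigrams.update(words)
        for prev, nxt in zip(words, words[1:]):
            self.bigrams[prev][nxt] += 1

    def predict(self, previous_word, prefix="", k=3):
        """Return up to k candidate words given the previous word and typed prefix."""
        candidates = self.bigrams.get(previous_word.lower(), Counter())
        ranked = [w for w, _ in candidates.most_common() if w.startswith(prefix)]
        if len(ranked) < k:
            # Back off to overall word frequency when the context is too sparse.
            for w, _ in self.unigrams.most_common():
                if w.startswith(prefix) and w not in ranked:
                    ranked.append(w)
                if len(ranked) >= k:
                    break
        return ranked[:k]


predictor = AdaptivePredictor()
predictor.learn("the cat sat on the mat")
predictor.learn("the cat ate the fish")
print(predictor.predict("the", k=1))
print(predictor.predict("the", prefix="f"))
```

Each call to `learn` shifts later predictions toward the vocabulary and topic of the text just entered, which is the behaviour the abstract calls auto-adaptive.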
Data reliability is of primary importance in assessing the quality of manually annotated corpora. Although Cohen's κ is the prevailing reliability measure in NLP, alternative statistics have been proposed. This paper presents an experimental study of four measures (Cohen's κ, Scott's π, binary and weighted Krippendorff's α) on three tasks: emotion, opinion and coreference annotation. The reported studies investigate the factors (annotator bias, category prevalence, number of coders, number of categories) that are expected to affect reliability estimation. Results show that using a weighted measure limits this influence on ordinal annotations. They suggest that weighted α is the most reliable metric for such an annotation scheme.
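As an illustration of how two of these measures differ, here is a minimal sketch of Cohen's κ and Scott's π for two coders and nominal categories, using their standard textbook definitions. The function names and the toy labels are assumptions made for this example and are not taken from the paper; Krippendorff's α is not covered.

```python
from collections import Counter


def observed_agreement(a, b):
    """Proportion of items on which the two coders chose the same category."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: chance agreement uses each coder's own category distribution."""
    n = len(a)
    p_o = observed_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum((count_a[c] / n) * (count_b[c] / n) for c in set(a) | set(b))
    return (p_o - p_e) / (1 - p_e)  # undefined when p_e == 1


def scott_pi(a, b):
    """Scott's pi: chance agreement uses one distribution pooled over both coders."""
    n = len(a)
    p_o = observed_agreement(a, b)
    pooled = Counter(a) + Counter(b)
    p_e = sum((count / (2 * n)) ** 2 for count in pooled.values())
    return (p_o - p_e) / (1 - p_e)  # undefined when p_e == 1


# Toy annotations (hypothetical): coder_b labels "pos" more often than coder_a.
coder_a = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "pos", "neg", "pos"]
coder_b = ["pos", "pos", "neg", "pos", "pos", "pos", "pos", "pos", "neg", "pos"]

print(round(cohen_kappa(coder_a, coder_b), 4))
print(round(scott_pi(coder_a, coder_b), 4))
```

Both statistics share the same observed agreement and differ only in the expected-agreement term, which is why a difference in the coders' category distributions (annotator bias) moves κ and π apart.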
Automatic identification of multiword expressions (MWEs), like to cut corners 'to do an incomplete job', is a prerequisite for semantically-oriented downstream applications. This task is challenging because MWEs, especially verbal ones (VMWEs), exhibit surface variability. This paper deals with a subproblem of VMWE identification: the identification of occurrences of previously seen VMWEs. A simple language-independent system based on a combination of filters competes with the best systems from a recent shared task: it obtains the best averaged F-score over 11 languages (0.6653) and even the best score for both seen and unseen VMWEs due to the high proportion of seen VMWEs in texts. This highlights the fact that focusing on the identification of seen VMWEs could be a strategy to improve VMWE identification in general.
This work is licensed under a Creative Commons Attribution 4.0 International License. License details: http://creativecommons.org/licenses/by/4.0/.
1 Henceforth, the lexicalized components of an MWE, i.e. those always realized by the same lexemes, appear in bold.
2 Henceforth, literal and coincidental occurrences are highlighted with wavy underlining, following Savary et al. (2019b).
Language; corpus; genres; size
[...] (Hinrichs et al., 2005); news; 800,000
English; OntoNotes (Pradhan et al., 2007); news, spoken dialogue, telephone conversation, weblogs, radio or television broadcasts; 50,000
Chinese; OntoNotes (Pradhan et al., 2007); same genres as English; 400,000
Catalan; AnCora-Ca (Recasens & Marti, 2010); news; 400,000
Spanish; AnCora-Es (Recasens, 2010); news; 400,000
Japanese; NAIST Text (Iida et al., 2007); news; 970,000
Dutch; COREA (Hendrickx et al., 2008); news, speech, encyclopedia; 325,000
Czech; PDT (Nedoluzhko et al., 2009); newspapers; 800,000
Polish; PCC (Ogrodniczuk et al., 2013); many spoken and written genres; 514,000

This article aims to describe the resource and its query tool, and then to present a corpus study on gender and number agreement in coreferential anaphora. The study directly questions certain hypotheses that are accepted for written language but have never been examined for speech, while providing a first illustration of the analyses made possible by the corpus and its query tool.

Presentation of the ANCOR corpus

Content: source audio corpora

The ANCOR corpus covers the spoken modality only. Although it is not a balanced resource like the Polish PCC corpus, it aims to represent a genuine diversity of spoken discourse situations. It brings together the annotation of four corpora of spontaneous speech transcribed with Transcriber (Barras et al., 2001). These corpora are presented in Table 2. Two of them were extracted from the ESLO corpus, which collects sociolinguistic interviews with a low degree of interactivity (Baude and Dugua 2011, Eshkol-Taravella et al. 2012). In contrast, the other two corpora, OTG and Accueil_UBS (Nicolas et al., 2002), consist of interactive human-human dialogues. These two corpora differ in the medium used: OTG gathers face-to-face conversations in a tourist office, whereas Accueil_UBS was recorded at a telephone switchboard. In total, the corpus comprises 488,000 words and corresponds to 30.5 hours of recordings.

Table 2 - Content of the ANCOR corpus: source audio corpora

Annotation methodology

The annotation was carried out with the GLOZZ software (Mathet and Widlöcher, 2009) [...] Encoding Initiative). Annotations made in GLOZZ are kept separate from the source corpus, with which they are synchronized. Such stand-off annotation allows multi-level enrichment of the corpus, which is valuable for extensibility. To limit the cognitive load on the experts and to promote intra-annotator consistency, the annotation process was divided into four successive steps:
1. Characterization of mentions (annotators: Master's or doctoral students in linguistics)
2. Verification of step 1 by a supervisor
3. Characterization of coreference or anaphoric relations (same annotators)
4. Verification of step 3 by a supervisor.

Scheme ...
Abstract: Information technology plays a very important role in society. People with disabilities are often limited by slow text input speed despite the use of assistive devices. This study aimed to evaluate the effect of a dynamic on-screen keyboard (Custom Virtual Keyboard) and a word-prediction system (Sibylle) on text input speed in participants with functional tetraplegia. Ten participants tested four modes at home (static on-screen keyboard with and without word prediction and dynamic on-screen keyboard with and without word prediction) for 1 month before choosing one mode and then using it for another month. Initial mean text input speed was around 23 words per minute with the static keyboard and 12 words per minute with the dynamic keyboard. The results showed that the dynamic keyboard reduced text input speed by 37% compared with the standard keyboard and that the addition of word prediction had no effect on text input speed. We suggest that current forms of dynamic keyboards and word prediction may not be suitable for increasing text input speed, particularly for subjects who use pointing devices. Future studies should evaluate the optimal ergonomic design of dynamic keyboards and the number and position of words that should be predicted.