Abstract: The aim of this article is to devise a method of lemmatisation of strong verbs from a corpus of Old English that maximises the automatic search for inflectional forms and correspondingly minimises the manual revision of the verbs under analysis. The search algorithm, which consists of query strings and filters, is launched on the lemmatiser Norna, a component of the lexical database of Old English Nerthus. The conclusions of the article stress the limits of automatic lemmatisation and point to ways of refining the lemmatisation method in order to accommodate less predictable forms.
Keywords: lemmatisation, Old English, lexical database, morphology, orthography.
LEMMAS OF STRONG VERBS FROM A CORPUS OF OLD ENGLISH: ADVANCES AND PROBLEMS
Abstract: The aim of this article is to devise a method for lemmatising Old English strong verbs, with a view to maximising the automatic search for inflectional forms and correspondingly reducing the manual revision of the verbs under study. The search algorithm consists of query strings and filters, run on the lemmatiser Norna, a component of the lexical database of Old English Nerthus. The conclusions of the article stress the limits of automatic lemmatisation as well as the possibilities for refining the lemmatisation method to accommodate less predictable forms.
Keywords: lemmatisation, Old English, lexical database, morphology, orthography.
AIMS AND SCOPE
This article deals with the morphology of Old English and, more specifically, with the lemmatisation of strong verbs based on the textual forms in the Dictionary of Old English Corpus (henceforth DOEC).¹ It focuses on the analytical steps required by lemmatisation as well as on the implementation of such steps in the lemmatiser Norna, an integral part of the lexical database of Old English Nerthus (www.nerthusproject.com). Along with the compilation of the initial inventory of lemmas of strong verbs and the design of a lemmatisation method, this article aims at maximising the automatic search for the inflectional forms of the verbs under analysis, with the corresponding minimisation of manual revision. With these aims, this article contributes to the line of research into the linguistic analysis of Old English pursued, among others, by García García, Martín Arista (2012a, 2012b, fc-a, fc-b), Mateo Mendaza (2013, 2015a, 2015b, 2016), Novo Urraca (2015, 2016a, 2016b) and Vea Escarza (2012, 2016).
¹ This research has been funded through the project FFI2014-59110 (MINECO), which is gratefully acknowledged.
The relevance of the undertaking lies in the lack of a lemmatised corpus of Old English. The corpus of reference in the field of Old English studies, the DOEC, is annotated at text level (edition, author, prose/poetry/gloss) but does not offer word tagging, either by attested form or by lemma. Other remarkable corpora, like The York-Helsinki Parsed Corpus of Old English Poetry