This article deals with the regularization of non-standard spellings of the verbal forms extracted from a corpus. It addresses the question of what the limits of regularization are when lemmatizing
AIMS OF RESEARCH

The aim of this research is to propose criteria that limit the process of normalization necessary to regularize the lemmata of Old English weak verbs from the second class. In general, lemmatization based on the textual forms provided by a corpus is a necessary step in lexicological analysis or lexicographical work. In the specific area of Old English studies, there are several reasons why it is important to compile a list of verbal lemmata. To begin with, the standard dictionaries of Old English, including An Anglo-Saxon Dictionary, A Concise Anglo-Saxon Dictionary and The Student's Dictionary of Anglo-Saxon, are complete, but they are based not on an extensive corpus of the language but on the partial lists of sources given in their prefaces or introductions. Secondly, The Dictionary of Old English is based on the Dictionary of Old English Corpus, which contains all the surviving texts, with a total of six million words, but it is still in progress (the letter G was published in 2008). Thirdly, this work can be seen as a contribution to the research programme in the morphology and semantics of Old English as presented in Martín Arista (2008, 2010a, 2010b, 2011a, 2011b, 2011c, 2012a, 2012b, 2012c, 2013a, 2013b, 2014).

The outline of this article is as follows. Section 2 focuses on the relevant aspects of the morphology of the weak verbs of Old English. Section 3 discusses the diatopic and diachronic features of Old English that can be applied to the normalization of weak verbs. Section 4 presents the results of the analysis by inflectional form and lemma, and puts the focus on the criteria that both motivate and constrain the process of normalization. To round off, Section 5 draws the main conclusions of this research.
THE INFLECTIONAL MORPHOLOGY OF THE OLD ENGLISH VERB

This section deals with the characteristics of the three subclasses of weak verbs and their specific features. The first part offers some diachronic perspectives on this verbal class, while the second part of the section provides a purely synchronic description of the morphology of the weak verbal class.

The Old English verbal endings derive from a number of Proto-Germanic endings, as Smith (2009: 113) remarks. The present indicative plural ending comes from the 3rd person plural of the present indicative in Proto-Germanic (Gothic -and) whereas the preterite indicative plural ending can be traced back to the 3rd person plural of the