The present paper deals with morphographemic alternations in Czech derivation with regard to the build-up of a large-coverage lexical resource specialized in the derivational morphology of contemporary Czech (the DeriNet database). After a summary of the descriptions available in the Czech linguistic literature and in Natural Language Processing, the first part of the paper provides an extensive list of alternations with a focus on their manifestation in writing. Due to the significant frequency and limited predictability of alternations in Czech derivation, several bottom-up methods were used in order to model the alternations adequately in DeriNet. Suffix-substitution rules proved to be efficient for alternations in the final position of the stem, whereas a specialized approach of extracting alternations from inflectional paradigms was used for modelling alternations within the roots. Alternations connected with the derivation of verbs were handled as a separate task. DeriNet data are expected to be helpful in developing a tool for morphemic segmentation and, once the segmentation is available, to become a reliable resource for a data-based description of word formation, including alternations, in Czech.

1 In the paper, the term "root" refers to a morpheme that cannot be further analysed, while "stem" is used, less specifically, for the part of a word without inflectional affixes (Haspelmath and Sims, 2010; Aronoff, ). Roots and stems are not together referred to as "bases" (cf. Bauer, 1983, pp. 20f) since we reserve the term "base" for the opposition of a base word vs. a derived word (target word, or derivative). These pairs are also referred to as "pairs of base-target words" or "base-target pairs".

2 In the examples, the base word is written first, followed by the derivative; the derivational relation is represented by an arrow. The alternations that accompany the derivation are listed above the arrow. The grapheme in the base is written first, followed by ">" and the corresponding grapheme in the derivative. Boundaries between morphemes are indicated with hyphens (the morphemic structure is not marked in Sect. 4 since the data are not segmented in the DeriNet network). In examples of diminutive derivation, we use "dimin." (diminutive) and "double dimin." (double diminutive) instead of the full English translation (e.g. 'small hippo' and 'very small hippo', respectively).
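The idea of a suffix-substitution rule for stem-final alternations can be illustrated with a minimal Python sketch. It is not the DeriNet implementation: the names `SuffixRule`, `apply_rule` and `link_pairs`, the two rules and the toy lexicon are all hypothetical and chosen only for illustration. Each rule replaces a base-final string with a target-final string that already contains the alternated grapheme (k > č, h > ž), and a base-target pair is proposed only if both words are attested in the lexicon.

```python
from typing import Iterable, List, NamedTuple, Optional, Set, Tuple


class SuffixRule(NamedTuple):
    """Hypothetical rule: replace a base-final string with a target-final one."""
    base_suffix: str
    target_suffix: str


def apply_rule(base: str, rule: SuffixRule) -> Optional[str]:
    """Return the candidate derivative, or None if the rule does not match."""
    if not base.endswith(rule.base_suffix):
        return None
    stem = base[: len(base) - len(rule.base_suffix)]
    return stem + rule.target_suffix


def link_pairs(lexicon: Set[str], rules: Iterable[SuffixRule]) -> List[Tuple[str, str]]:
    """Propose base-target pairs whose candidate derivative is attested in the lexicon."""
    rules = list(rules)
    pairs = []
    for base in sorted(lexicon):
        for rule in rules:
            target = apply_rule(base, rule)
            if target is not None and target in lexicon:
                pairs.append((base, target))
    return pairs


# Illustrative rules only (assumed, not taken from DeriNet):
# the alternation k > č or h > ž is encoded inside the suffix strings.
RULES = [SuffixRule("ka", "čka"), SuffixRule("ha", "žka")]

# Toy lexicon standing in for a list of attested lemmas.
LEXICON = {"ruka", "ručka", "noha", "nožka", "kniha"}

if __name__ == "__main__":
    for base, target in link_pairs(LEXICON, RULES):
        print(f"{base} > {target}")
```

Checking candidates against the lexicon keeps the rules from overgenerating: for "kniha" the rule produces a candidate that is absent from the toy lexicon, so no pair is proposed. Cases of this kind, where the derivative also involves a change inside the root, are the ones for which a different method is needed.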
M. Ševčíková
Modelling Morphographemic Alternations in Czech