A system for the automatic segmentation of German words into morphs was developed. The main linguistic knowledge sources used by the system are a word syntax and a morph dictionary. The syntax is written in the formalism of right-linear regular grammars and comprises approximately 1,400 rules, which describe the sequences of morph classes that underlie syntactically well-formed words. The morph dictionary contains almost 11,000 morphs, each assigned to up to six morph classes. Statistical evaluations with 6,000 test words showed that more than 99% of the segmented words received a correct segmentation.
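The interplay of a morph dictionary and a right-linear word syntax can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tiny `LEXICON`, the two-state `GRAMMAR`, the class names `prefix`, `stem`, and `suffix`, and the function `segment` are hypothetical and are not taken from the system described above, which uses roughly 1,400 rules and almost 11,000 morphs.

```python
# Toy morph dictionary (hypothetical): morph -> morph classes it may belong to.
LEXICON = {
    "ver": ("prefix",),
    "un": ("prefix",),
    "kauf": ("stem",),
    "haus": ("stem",),
    "tür": ("stem",),
    "glück": ("stem",),
    "en": ("suffix",),
    "lich": ("suffix",),
}

# Toy right-linear grammar (hypothetical). Each rule has the form
#   A -> class B   (continue in nonterminal B)   or
#   A -> class     (word may end here; encoded as next state None).
GRAMMAR = {
    "Word": [("prefix", "Word"), ("stem", "Tail"), ("stem", None)],
    "Tail": [("stem", "Tail"), ("stem", None),
             ("suffix", "Tail"), ("suffix", None)],
}


def segment(word, state="Word"):
    """Return every segmentation of `word` licensed by LEXICON and GRAMMAR.

    Each segmentation is a list of (morph, morph_class) pairs.
    """
    results = []
    for end in range(1, len(word) + 1):
        morph, rest = word[:end], word[end:]
        for morph_class in LEXICON.get(morph, ()):
            for rule_class, next_state in GRAMMAR[state]:
                if rule_class != morph_class:
                    continue
                if next_state is None:
                    # Terminating rule: only valid if the word is used up.
                    if not rest:
                        results.append([(morph, morph_class)])
                elif rest:
                    # Continuing rule: segment the remainder from next_state.
                    for tail in segment(rest, next_state):
                        results.append([(morph, morph_class)] + tail)
    return results


if __name__ == "__main__":
    for w in ("verkaufen", "haustür", "unglücklich", "enkauf"):
        print(w, segment(w))
```

In this sketch, a string such as `enkauf` is rejected even though both of its parts are in the dictionary, because no rule allows a suffix at the start of a word. That is the role the word syntax plays: it filters dictionary matches down to sequences of morph classes that form a well-formed word.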
Morphologically based automatic phonetic transcription, by K. Wothke
A system is described that automatically generates phonetic transcriptions for German orthographic words. The generative process consists of two main steps. In the first step, the system segments the words into their morphs (prefixes, stems, and suffixes). This segmentation is very important for the transcription of German words, because the pronunciation of the letters also depends on their morphological environment. In the second step, the system transcribes the morphologically segmented words. Several transcriptions can be generated per word, which permits the system to take pronunciation variants into account. This feature results from the application area of the system, which is the provision of phonetic reference units for an automatic large-vocabulary speech recognition system. Statistical evaluations show that the transcription system has an excellent linguistic performance: more than 99 percent of the segmented words obtain a correct segmentation in the first step, and more than 98 percent of the words receive a correct phonetic transcription in the second step.
A large-vocabulary speech recognition system for German is being developed by the IBM Heidelberg Scientific Center. The methodological and technical basis of the German system is the English speech recognition system TANGORA, developed at the IBM Thomas J. Watson Research Center in Yorktown Heights, New York. The task of the project is the development and improvement of a voice-activated typewriter that creates the written equivalent of the utterances dictated by the user of the system.