In this study, we present Morpheus, a joint contextual lemmatizer and morphological tagger. Morpheus is based on a neural sequential architecture where the inputs are the characters of the surface words in a sentence and the outputs are the minimum edit operations between surface words and their lemmata as well as the morphological tags assigned to the words. The experiments on the datasets in nearly 100 languages provided by the SigMorphon 2019 Shared Task 2 organizers show that the performance of Morpheus is comparable to the state-of-the-art system in terms of lemmatization. In morphological tagging, on the other hand, Morpheus significantly outperforms the SigMorphon baseline. In our experiments, we also show that the neural encoder-decoder architecture trained to predict the minimum edit operations can produce considerably better results than the architecture trained to predict the characters in lemmata directly as in previous studies. According to the SigMorphon 2019 Shared Task 2 results, Morpheus placed 3rd in lemmatization and 9th in morphological tagging among all participant teams.
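To make the idea of "minimum edit operations between surface words and their lemmata" concrete, the following is a minimal sketch that derives one minimum-cost edit script with a standard Levenshtein dynamic program. The function names `min_edit_ops` and `apply_ops` and the label strings (`same`, `delete`, `replace_x`, `insert_x`) are illustrative assumptions, not necessarily the label set or procedure used in Morpheus.

```python
from typing import List


def min_edit_ops(source: str, target: str) -> List[str]:
    """Return one minimum-cost edit script that turns `source` into `target`.

    Labels are hypothetical: "same", "delete", "replace_<char>", "insert_<char>".
    """
    n, m = len(source), len(target)
    # dist[i][j] = Levenshtein distance between source[:i] and target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,        # delete a source character
                dist[i][j - 1] + 1,        # insert a target character
                dist[i - 1][j - 1] + sub,  # keep or replace
            )

    # Walk back from the bottom-right cell to recover the operations.
    ops: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and source[i - 1] == target[j - 1]
                and dist[i][j] == dist[i - 1][j - 1]):
            ops.append("same")
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            ops.append("replace_" + target[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append("delete")
            i -= 1
        else:
            ops.append("insert_" + target[j - 1])
            j -= 1
    ops.reverse()
    return ops


def apply_ops(source: str, ops: List[str]) -> str:
    """Apply an edit script produced by `min_edit_ops` to `source`."""
    out: List[str] = []
    i = 0  # position in source
    for op in ops:
        if op == "same":
            out.append(source[i])
            i += 1
        elif op == "delete":
            i += 1
        elif op.startswith("replace_"):
            out.append(op[len("replace_"):])
            i += 1
        elif op.startswith("insert_"):
            out.append(op[len("insert_"):])
        else:
            raise ValueError("unknown operation: " + op)
    return "".join(out)


ops = min_edit_ops("walking", "walk")
print(ops)
print(apply_ops("walking", ops))
```

Under this toy labeling, a model that predicts edit operations for "walking" only has to emit four `same` labels and three `delete` labels, instead of generating every character of the lemma.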
Agglutinative languages such as Turkish, Finnish and Hungarian require morphological disambiguation before further processing due to the complex morphology of words. A morphological disambiguator is used to select the correct morphological analysis of a word. Morphological disambiguation is important because it generally is one of the first steps of natural language processing and its performance affects subsequent analyses. In this paper, we propose a system that uses deep learning techniques for morphological disambiguation. Many of the state-of-the-art results in computer vision, speech recognition and natural language processing have been obtained through deep learning models. However, applying deep learning techniques to morphologically rich languages is not well studied. In this work, while we focus on Turkish morphological disambiguation, we also present results for French and German in order to show that the proposed architecture achieves high accuracy with no language-specific feature engineering or additional resource. In the experiments, we achieve 84.12, 88.35 and 93.78 morphological disambiguation accuracy among the ambiguous words for Turkish, German and French respectively.
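As a rough illustration of what accuracy "among the ambiguous words" can mean, the sketch below scores predictions only on tokens that have more than one candidate analysis. The function name `ambiguous_accuracy`, the data layout, and the toy analysis strings are assumptions made for this example and are not taken from the paper.

```python
from typing import Sequence


def ambiguous_accuracy(
    candidates: Sequence[Sequence[str]],
    gold: Sequence[str],
    predicted: Sequence[str],
) -> float:
    """Percentage of correctly disambiguated tokens, counting only tokens
    that have at least two candidate analyses."""
    correct = 0
    total = 0
    for cands, g, p in zip(candidates, gold, predicted):
        if len(cands) < 2:
            continue  # unambiguous token: nothing to disambiguate
        total += 1
        if p == g:
            correct += 1
    if total == 0:
        raise ValueError("no ambiguous tokens to evaluate")
    return 100.0 * correct / total


# Toy data with hypothetical analysis strings.
candidates = [
    ["A+Noun", "A+Verb"],    # ambiguous
    ["B+Noun"],              # unambiguous, excluded from the score
    ["C+Adj", "C+Noun"],     # ambiguous
]
gold = ["A+Noun", "B+Noun", "C+Noun"]
predicted = ["A+Noun", "B+Noun", "C+Adj"]

print(ambiguous_accuracy(candidates, gold, predicted))
```

Leaving out unambiguous tokens keeps trivially correct cases from inflating the score, so the figure reflects only the decisions the disambiguator actually had to make.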
A parallel corpus plays an important role in statistical machine translation (SMT) systems. In this study, our aim is to figure out the effects of parallel corpus
Machine translation (MT) quality is evaluated through comparisons between MT outputs and human translations (HT). Traditionally, this evaluation relies on form-related features (e.g. lexicon and syntax) and ignores the transfer of meaning reflected in HT outputs. Instead, we evaluate the quality of MT outputs through meaning-related features (e.g. polarity, subjectivity) with two experiments. In the first experiment, the meaning-related features are compared to human rankings individually. In the second experiment, combinations of meaning-related features and other quality metrics are used to predict the same human rankings. The results of our experiments confirm the benefit of these features in predicting human evaluation of translation quality in addition to traditional metrics, which focus mainly on form.