Part of speech (PoS)
Introduction

The dream of automatically translating documents between two languages is one of the oldest pursuits of artificial intelligence research. Now, armed with vast amounts of example translations and powerful computers, we can witness significant progress toward achieving that dream. Statistical analysis of bilingual parallel corpora allows machine translation systems to be constructed automatically. For some language pairs, statistical systems are already the best machine translation systems currently available.

Statistical machine translation (SMT) is corpus-based and therefore requires a parallel corpus from which to learn a model [1], [2]. Parallel corpora differ from ordinary text corpora in that they are not just collections of texts: they are bilingual or multilingual, and they are structured so that every sentence is linked to its translations.

Several studies have shown that translation quality can be improved by adding linguistic features such as lemma, part of speech (PoS), and gender. Koehn and Hoang [3] showed that adding a part-of-speech factor to an English-German translation system increased translation quality from 18.04% to 18.15%. They also showed that combining morphological and part-of-speech factors raised the quality of an English-Spanish system from 23.41% to 24.25%.

Youssef et al. [4] examined the effect of adding a part-of-speech factor to a statistical English-Arabic translation system. Their results showed that the part-of-speech factor improved translation quality from 0.6095% to 0.6394%.
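To make the idea of a PoS factor concrete, the sketch below shows one common way factored SMT toolkits such as Moses represent it in training data: each token carries its extra factors separated by a vertical bar (for example `house|NN`). The one-pair corpus, the hand-assigned Penn Treebank-style tags, and the helper `to_factored` are hypothetical illustrations, not the setup used in the cited studies; a real pipeline would obtain the tags from a PoS tagger.

```python
from typing import List, Tuple

# A sentence-aligned parallel corpus: each entry links a source sentence
# to its translation (hypothetical English-Indonesian example data).
ParallelPair = Tuple[str, str]


def to_factored(tokens: List[str], tags: List[str], sep: str = "|") -> str:
    """Attach each token's PoS tag as an additional factor (hypothetical helper)."""
    if len(tokens) != len(tags):
        raise ValueError("each token needs exactly one tag")
    return " ".join(f"{tok}{sep}{tag}" for tok, tag in zip(tokens, tags))


corpus: List[ParallelPair] = [
    ("the house is small", "rumah itu kecil"),
]

# Tags for the English side, assigned by hand for illustration (assumption).
source_tags = [["DT", "NN", "VBZ", "JJ"]]

# Build the factored source side; the target side is left unchanged here.
factored_source = [
    to_factored(src.split(), tags)
    for (src, _tgt), tags in zip(corpus, source_tags)
]
print(factored_source[0])  # the|DT house|NN is|VBZ small|JJ
```

Keeping the surface word as the first factor means the same corpus can still train a plain word-based model by ignoring everything after the separator.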
Razavian and Vogel [5] examined the effect of adding factors to statistical translation systems. For an English-Iraqi system, translation quality improved from 15.62% to 16.41%; for a Spanish-English system, from 32.53% to 32.84%; and for an Arabic-English system, from 41.70% to 42.74%.

For English-Indonesian, Sujaini et al. [6] studied the addition of PoS factors to a factored statistical translation system. Their results indicated that the PoS factor increased the quality of English-Indonesian translation by 2 percentage points, from 31.26% to 33.26%.

Grammatically, words can be divided into two categories: open class and closed class. An open class is a category whose membership keeps growing over time as new words enter the language, while a closed class is a category whose membership is fixed. These grammatically distinct categories of words are commonly called parts of speech [1].
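The open/closed distinction is easy to see in a concrete tag set. The sketch below assumes the Universal Dependencies UPOS tag set, whose guidelines group ADJ, ADV, INTJ, NOUN, PROPN and VERB as open-class tags and ADP, AUX, CCONJ, DET, NUM, PART, PRON and SCONJ as closed-class tags, with PUNCT, SYM and X outside both groups. The choice of tag set, the helper `word_class`, and the sample sentence are assumptions for illustration; the text above does not specify which tag set it uses.

```python
# Grouping of Universal Dependencies UPOS tags into word classes (assumed tag set).
OPEN_CLASS = {"ADJ", "ADV", "INTJ", "NOUN", "PROPN", "VERB"}
CLOSED_CLASS = {"ADP", "AUX", "CCONJ", "DET", "NUM", "PART", "PRON", "SCONJ"}


def word_class(upos: str) -> str:
    """Return 'open', 'closed', or 'other' (e.g. PUNCT, SYM, X) for a UPOS tag."""
    if upos in OPEN_CLASS:
        return "open"
    if upos in CLOSED_CLASS:
        return "closed"
    return "other"


# A hand-tagged sample sentence (hypothetical example data).
tagged = [("The", "DET"), ("translator", "NOUN"), ("improves", "VERB"),
          ("with", "ADP"), ("PoS", "PROPN"), ("factors", "NOUN"), (".", "PUNCT")]

# Print each token with its tag and word class.
for token, tag in tagged:
    print(f"{token:12} {tag:6} {word_class(tag)}")
```

New coinages such as technical terms almost always arrive as nouns, verbs, or adjectives, which is why those tags sit in the open class, while a language rarely gains new determiners or prepositions.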