This paper presents a solution to the problem of matching personal names in English to the same names represented in Arabic script. Standard string comparison measures perform poorly on this task due to varying transliteration conventions in both languages and the fact that Arabic script does not usually represent short vowels. Significant improvement is achieved by augmenting the classic Levenshtein edit-distance algorithm with character equivalency classes.
Errors in machine translations of English-Iraqi Arabic dialogues were analyzed at two different points in the systems" development using HTER methods to identify errors and human annotations to refine TER annotations. Although the frequencies of errors in the more mature systems were lower, the proportions of error types exhibited little change. Results include high frequencies of pronoun errors in translations to English, high frequencies of subject person inflection in translations to Iraqi Arabic, similar frequencies of word order errors in both translation directions, and very low frequencies of polarity errors. The problems with many errors can be generalized as the need to insert lexemes not present in the source or vice versa, which includes errors in multi-word expressions. Discourse context will be required to resolve some problems with deictic elements like pronouns.
Errors in machine translations of English-Iraqi Arabic dialogues were analyzed using the methods developed for the Human Translation Error Rate measure (HTER). Human annotations were used to refine the Translation Error Rate (TER) annotations. The analyses were performed on approximately 100 translations into each language from four translation systems. Results include high frequencies of pronoun errors and errors involving the copula in translations to English. High frequencies of errors in subject/person inflection and closed-word classes characterized translations to Iraqi Arabic. There were similar frequencies of word order errors in both translation directions and low frequencies of polarity errors. The problems associated with many errors can be predicted from structural differences between the two languages. Also problematic is the need to insert lexemes not present in the source or vice versa. Some problems associated with deictic elements like pronouns will require knowledge of the discourse context to resolve.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.