Proceedings of the Main Conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics (2006)
DOI: 10.3115/1220835.1220895

Cross linguistic name matching in English and Arabic

Abstract: This paper presents a solution to the problem of matching personal names in English to the same names represented in Arabic script. Standard string comparison measures perform poorly on this task due to varying transliteration conventions in both languages and the fact that Arabic script does not usually represent short vowels. Significant improvement is achieved by augmenting the classic Levenshtein edit-distance algorithm with character equivalency classes.
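The core idea in the abstract is that characters belonging to the same equivalency class may substitute for one another at no cost. This can be sketched as a small change to the standard dynamic-programming Levenshtein recurrence. The sketch below is a minimal illustration, not the authors' implementation. The class table `EQUIVALENCE_CLASSES` and the helpers `build_class_index`, `equivalent`, and `levenshtein` are hypothetical, and the classes themselves are invented for the example rather than taken from the paper.

```python
from typing import Dict, List, Optional, Set

# Hypothetical equivalency classes, invented for illustration only; they are
# NOT the classes defined by Freeman et al. (2006). Any two characters that
# share a class may substitute for each other at zero cost.
EQUIVALENCE_CLASSES: List[Set[str]] = [
    {"a", "e", "i", "o", "u"},   # Latin vowels, often spelled inconsistently
    {"h", "ح", "ه"},             # Latin h and two Arabic h-like letters
    {"k", "c", "q", "ك", "ق"},   # k-like sounds in both scripts
    {"m", "م"},
    {"d", "د"},
]


def build_class_index(classes: List[Set[str]]) -> Dict[str, Set[int]]:
    """Map each character to the ids of every class that contains it."""
    index: Dict[str, Set[int]] = {}
    for class_id, members in enumerate(classes):
        for ch in members:
            index.setdefault(ch, set()).add(class_id)
    return index


def equivalent(a: str, b: str, index: Dict[str, Set[int]]) -> bool:
    """True if a and b are identical or share at least one class."""
    if a == b:
        return True
    return bool(index.get(a, set()) & index.get(b, set()))


def levenshtein(s: str, t: str, index: Optional[Dict[str, Set[int]]] = None) -> int:
    """Edit distance with unit insertion and deletion costs.

    Substitution is free when the characters are equal or, if an index is
    given, when they fall in the same equivalency class; otherwise it costs 1.
    """
    s, t = s.lower(), t.lower()
    prev = list(range(len(t) + 1))  # distance from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i] + [0] * len(t)
        for j, ct in enumerate(t, start=1):
            match = cs == ct if index is None else equivalent(cs, ct, index)
            curr[j] = min(
                prev[j] + 1,                        # delete cs
                curr[j - 1] + 1,                    # insert ct
                prev[j - 1] + (0 if match else 1),  # substitute or match
            )
        prev = curr
    return prev[-1]


INDEX = build_class_index(EQUIVALENCE_CLASSES)

print(levenshtein("Mohammed", "Muhammad"))         # 2: o/u and e/a differ
print(levenshtein("Mohammed", "Muhammad", INDEX))  # 0: the vowels share a class
print(levenshtein("محمد", "Muhammad"))             # 8: no identical characters
print(levenshtein("محمد", "Muhammad", INDEX))      # 4: insert u, a, the second m, a
```

The last line shows the short-vowel problem the abstract describes: the Arabic spelling writes neither the short vowels nor the doubled m, so four insertions remain even when every written letter matches through its class. The sketch does nothing special for that case.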

Cited by 38 publications (18 citation statements); references 6 publications.

“…Al-Onaizan and Knight (2002) introduced an approach for machine transliteration of Arabic names. Freeman et al (2006) also introduced a system for name matching between English and Arabic, which Habash (2008) employed as part of generating English transliterations from Arabic words in the context of machine translation. This work is similar to ours in terms of text transliteration.…”
Section: Related Work (mentioning, confidence: 99%)
“…Thus, we find results that have low edit distance to our query. This approach is widely used in applications such as conventional spell correction, transliteration similarity and music information retrieval (Freeman et al, 2006; Toussaint and Oh, 2016).…”
Section: Levenshtein Distance (mentioning, confidence: 99%)
“…Then, input terms are segmented into the available n-grams, and all possible transliterations are produced and scored based on their joint probabilities. Habash (2009) used an ambiguous mapping that utilized the sounds-like indexing system Double Metaphones (Philips, 2000) combined with the direct mapping scores defined by Freeman et al (2006) to handle out-of-vocabulary words in the context of Arabic-English machine translation. Freeman et al (2006) extended Levenshtein Edit Distance to allow for improved matching of Arabic and English versions of the same proper names.…”
Section: Related Work (mentioning, confidence: 99%)
“…Chinese ( normalized to a similarity score as in (Freeman et al 2006), where the score ranges from 0 to 1, with 1 being a perfect match. This edit-distance score is shown in the LEV row.…”
Section: Roman (mentioning, confidence: 99%)
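The last statement refers to turning an edit distance into a similarity score between 0 and 1, where 1 is a perfect match. The exact normalization used by Freeman et al. (2006) is not given here. The sketch below assumes one common choice, 1 - d / max(|s|, |t|), and the helpers `levenshtein` and `similarity` are hypothetical names for illustration.

```python
def levenshtein(s: str, t: str) -> int:
    """Plain Levenshtein distance with unit costs (two-row dynamic programming)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            # deletion, insertion, substitution (free when characters match)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = curr
    return prev[-1]


def similarity(s: str, t: str) -> float:
    """Map edit distance to [0, 1], where 1.0 means identical strings.

    Assumed normalization: 1 - d / max(len(s), len(t)). The distance never
    exceeds the longer length, so the score never drops below 0.
    """
    longest = max(len(s), len(t))
    if longest == 0:
        return 1.0  # two empty strings count as a perfect match
    return 1.0 - levenshtein(s, t) / longest


print(similarity("Mohammed", "Muhammad"))  # 0.75
print(similarity("abc", "xyz"))            # 0.0
```

Dividing by the longer length keeps scores comparable across names of different lengths. A single edit in a short name then costs more similarity than the same edit in a long one.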