Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing - EMNLP '06 2006
DOI: 10.3115/1610075.1610112

Unsupervised named entity transliteration using temporal and phonetic correlation

Abstract: In this paper we investigate unsupervised name transliteration using comparable corpora, that is, corpora where texts in the two languages deal with some of the same topics (and therefore share references to named entities) but are not translations of each other. We present two distinct methods for transliteration: one uses an unsupervised phonetic transliteration method, and the other uses the temporal distribution of candidate pairs. Each of these approaches works quite well, but by combining the approaches …

Cited by 22 publications (22 citation statements)
References: 19 publications
“…Tao et al. (2006) show improvement in transliteration mining performance using phonetic feature vectors resembling the ones we have used. Jagarlamudi and Daumé III (2012) use phonemic representation based interlingual projection for multilingual transliteration mining.…”
Section: Related Work (mentioning)
confidence: 87%
“…Tao [15] combines E and temporal feature T. To compute E, Tao constructs a cost matrix using string-alignment and alignment-scoring techniques based on phonological features and phonetic similarity. T builds on the observation that entities that refer to the same entity often have correlated frequency patterns.…”
Section: Related Work (mentioning)
confidence: 99%
“…Corpus latent features. We extract three new latent features from the corpus: coreference, temporal distribution, and spatial distribution, as similarly explored in [15], [16]. First, we extract different entity names that are commonly referred to as, or "coreferenced," as the identical entity in documents (e.g., "Beckham" and "David Beckham").…”
Section: Introduction (mentioning)
confidence: 99%
“…The work we report is ongoing. We are investigating transliterations among several language pairs, and are extending these methods to Korean, Arabic, Russian and Hindi (see Tao et al., 2006).…”
Section: Discussion (mentioning)
confidence: 99%
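
One of the citation statements above describes the temporal feature as resting on the observation that names referring to the same entity often have correlated frequency patterns. A minimal sketch of that idea follows, assuming per-day mention counts for each name and using Pearson correlation as the similarity measure. The correlation measure, the function names `pearson` and `rank_candidates`, and the example counts are all illustrative assumptions; they are not taken from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length count series.

    Returns 0.0 when either series is constant, since the
    correlation is undefined in that case.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("series must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_candidates(source_counts, candidates):
    """Rank candidate target-language names by temporal correlation.

    source_counts: per-day mention counts of the source-language name.
    candidates: dict mapping each candidate name to its per-day counts
                over the same days (hypothetical data layout).
    Returns (name, score) pairs, highest correlation first.
    """
    scored = [(name, pearson(source_counts, counts))
              for name, counts in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical counts over six days, for illustration only.
source = [0, 5, 1, 0, 7, 0]
candidates = {
    "cand_a": [0, 4, 2, 0, 6, 1],  # peaks on the same days as the source
    "cand_b": [3, 0, 3, 3, 0, 3],  # peaks on the opposite days
}
ranking = rank_candidates(source, candidates)
```

In this sketch a candidate whose mentions rise and fall on the same days as the source name ranks above one whose mentions do not. A score of this kind could then be combined with a phonetic score, as the abstract suggests, though the combination rule is not specified in the text shown here.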