Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval 2014
DOI: 10.1145/2600428.2609622
Query expansion for mixed-script information retrieval

Abstract: For many languages that use non-Roman based indigenous scripts (e.g., Arabic, Greek and Indic languages) one can often find a large amount of user generated transliterated content on the Web in the Roman script. Such content creates a monolingual or multi-lingual space with more than one script which we refer to as the Mixed-Script space. IR in the mixed-script space is challenging because queries written in either the native or the Roman script need to be matched to the documents written in both the scripts. …

Cited by 67 publications (51 citation statements)
References 28 publications
“…While it is true that प्रणब मुखर्जी is the president of India and was born in पश्चिम बंगाल, it is more useful for those who do not read the Devanagari script to be presented with the information that Indian president Pranab Mukherjee was born in West Bengal. Hence, much work in transliteration is focused on translation and information retrieval (Knight and Graehl, 1998; Chen et al, 1998; Virga and Khudanpur, 2003; Haizhou et al, 2004; Gupta et al, 2014). Knight and Graehl (1998) took pronunciation as a mediating variable in mapping between the two writing systems, and explicitly modeled grapheme-to-phoneme and cross-lingual pronunciation mapping in their model.…”
Section: Romanization and Transliteration
Citation type: mentioning
confidence: 99%
“…The goal of query expansion in this regard is by increasing recall, precision can potentially increase (rather than decrease as mathematically equated), by including in the result set pages which are more relevant [7]. In the query expansion, related words are added to user's original query for the purpose of forming a longer and more precise query to express user's retrieval intentions.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Different approach has different sources for finding related words [8]. Typical sources of query expansion terms are pseudo relevant documents [4] or external static resources, such as click through data [9,10], Wikipedia [11,12] or ConceptNet [8,13], WordNet [14,15], HowNet [7]. For example, Dalton et al [5] did query expansion using entity names, aliases and categories with several methods of linking entities to the query.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…These models and systems ignore the positional relationships among index terms and therefore, will neglect any information derived from positions of index terms such as order or proximity [2,3]. However, this information could be important in judging relevance between queries and documents, or among documents.…”
Section: Introduction
Citation type: mentioning
confidence: 99%