Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2005
DOI: 10.1145/1076034.1076124

Bootstrapping dictionaries for cross-language information retrieval

Abstract: The bottleneck for dictionary-based cross-language information retrieval is the lack of comprehensive dictionaries, in particular for many different languages. We here introduce a methodology by which multilingual dictionaries (for Spanish and Swedish) emerge automatically from simple seed lexicons. These seed lexicons are automatically generated, by cognate mapping, from (previously manually constructed) Portuguese and German as well as English sources. Lexical and semantic hypotheses are then validated and n…

Citations: cited by 14 publications (12 citation statements)
References: 19 publications
“…A variety of distributional models have been used for this task, including Latent Semantic Analysis [Dumais et al 1996] and topic models [De Smet and Moens 2009]. Query translation has been shown to contribute to information retrieval in related languages (translations from Portuguese, German, Spanish and Swedish to English in Markó et al [2005]) as well as unrelated languages (from Japanese to English in Sadat et al [2003]). …”
Section: Related Work (mentioning)
confidence: 99%
“…Finally, from a more practical point of view, we believe it would be interesting to use n-gram based translation for supporting the generation process of multilingual thesaurus for technical domains, as in the case of MorphoSaurus (Schulz et al, 2006) in Medicine, and its application to CLIR tasks (Markó et al, 2005). Twitter and other microblogging services will deserve special attention since it is a very noisy multilingual environment, for which specialized linguistic resources are still very scarce, particularly for non-English languages.…”
Section: Discussion (mentioning)
confidence: 99%
“…The MorphoSaurus system [4,5,10,11] maps the content of domain-specific texts to a concept-like interlingua. This entails a simplification and standardization of document source and user queries in order to facilitate the retrieval of documents in multilingual collections.…”
Section: The Morphosaurus System 21 Subwords As Atomic Meaning Ident (mentioning)
confidence: 99%
“…Moreover, spelling and syntax rules are not always respected. Due to these peculiarities, current information retrieval approaches that are usually based on simple comparison of entire words are inappropriate because they produce results that are incomplete, inaccurate, or outside the desired scope [4].…”
Section: Introduction (mentioning)
confidence: 99%