The present research studies the impact of decompounding and two different word normalization methods, stemming and lemmatization, on monolingual and bilingual retrieval. The languages in the monolingual runs are English, Finnish, German and Swedish. The source language of the bilingual runs is English, and the target languages are Finnish, German and Swedish. In the monolingual runs, retrieval in a lemmatized compound index gives almost as good results as retrieval in a decompounded index, but in the bilingual runs differences are found: retrieval in a lemmatized decompounded index performs better than retrieval in a lemmatized compound index. The reason for the poorer performance of indexes without decompounding in bilingual retrieval is the difference between the source language and target languages: phrases are used in English, while compounds are used instead of phrases in Finnish, German and Swedish. No remarkable performance differences could be found between stemming and lemmatization.
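To illustrate why decompounding matters for the bilingual runs, the following is a minimal sketch of dictionary-based decompounding. It is not the method used in the study: the function name `decompound`, the `min_len` parameter, and the toy lexicons are assumptions made for illustration, and the sketch ignores linking elements such as the German or Swedish joining "s".

```python
def decompound(word, lexicon, min_len=3):
    """Split a compound into lexicon components, longest prefix first.

    Returns the list of components, or [word] when no full split exists.
    Hypothetical helper for illustration; linking elements are not handled.
    """
    word = word.lower()

    def split(rest):
        # An empty remainder means the whole word has been covered.
        if not rest:
            return []
        # Try the longest candidate prefix first, down to min_len characters.
        for end in range(len(rest), min_len - 1, -1):
            head = rest[:end]
            if head in lexicon:
                tail = split(rest[end:])
                if tail is not None:
                    return [head] + tail
        # No prefix leads to a complete split.
        return None

    parts = split(word)
    return parts if parts else [word]


# Toy lexicons (assumed, not taken from the study).
finnish = {"tieto", "kone"}
german = {"haus", "tür"}

print(decompound("tietokone", finnish))
print(decompound("Haustür", german))
print(decompound("tietokone", german))
```

With a decompounded index, each word of a translated English phrase can match a component of the target-language compound separately, which is the effect the abstract attributes to the difference between phrase and compound languages.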
Purpose: The aim of the current paper is to test whether query translation is beneficial in web retrieval.
Design/methodology/approach: The language pairs were Finnish-Swedish, English-German and Finnish-French. A total of 12-18 participants were recruited for each language pair. Each participant performed four retrieval tasks. The author's aim was to compare the performance of the translated queries with that of the target language queries. Thus, the author asked participants to formulate a source language query and a target language query for each task. The source language queries were translated into the target language using a dictionary-based system. For English-German, machine translation was also used. The author used Google as the search engine.
Findings: The results differed depending on the language pair. The author concluded that dictionary coverage had an effect on the results. On average, query translation performed better than it has in traditional laboratory tests.
Originality/value: This research shows that query translation on the web is beneficial, especially for users with moderate and non-active language skills. This is valuable information for developers of cross-language information retrieval systems.