Abstract. This study addresses zero-hit queries in keyword subject searches and attempts to increase recall in these cases by reformulating and then expanding the initial queries using an external source of knowledge, namely a thesaurus. The study has two objectives. First, we map query terms to thesaurus terms. Second, we use the matched terms to expand the user's initial query by exploiting the thesaurus relations and applying natural language processing (NLP) techniques. We report on the overall procedure and elaborate on key points and considerations of each step of the process.
Keywords: Query expansion, Thesaurus, Zero-hit queries, NLP techniques
Introduction
This study addresses zero-hit queries in keyword subject searches. We aim to increase recall by reformulating and then expanding the initial queries using an external source of knowledge, namely a thesaurus, together with natural language processing (NLP) techniques. A zero-hit query returns no results, so query expansion methods that rely on a set of retrieved results (implicit relevance feedback) cannot be applied. For this reason, we chose a manually constructed thesaurus and expand the initial queries through the relations defined in its structure, without involving the users in the process.
Before a query can be expanded, we must locate an entry point within the knowledge base, i.e. match the initial query to a term from the thesaurus. Exact string matching is unlikely to succeed in highly inflectional languages such as Greek, because of the various forms a word can take. Additionally, research has shown that typing errors are also responsible for zero-hit queries [1]. To overcome these obstacles we used natural language processing techniques, namely spelling correction, lemmatization, stop-word removal, and accent and case normalization. The database and the thesaurus underwent the same processing where needed. Finally, we derived candidate expansion terms from the related, parallel, narrower and broader terms of the entry point in the thesaurus, moving one level in each direction.
The remaining sections elaborate on the overall procedure and report on key points and considerations of each step of the process.
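The pipeline outlined in this introduction (normalize the query, locate an entry point in the thesaurus, collect terms one level away) can be illustrated with a minimal Python sketch. It is not the implementation used in the study. The thesaurus entries, the stop-word list and all function names are hypothetical, lemmatization is left as an identity placeholder because it depends on a Greek lemmatizer not named here, and fuzzy matching with the standard library's `difflib` stands in for spelling correction.

```python
import difflib
import unicodedata

# Toy thesaurus (hypothetical data): each term lists its one-level relations.
THESAURUS = {
    "ιστορία": {
        "related": ["αρχαιολογία"],
        "broader": ["ανθρωπιστικές επιστήμες"],
        "narrower": ["αρχαία ιστορία", "ιστορία τέχνης"],
    },
    "αρχαιολογία": {
        "related": ["ιστορία"],
        "broader": ["ανθρωπιστικές επιστήμες"],
        "narrower": [],
    },
}


def fold(text):
    """Case-fold the text and strip accents (combining marks)."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# Hypothetical stop-word list, folded the same way as the queries.
STOP_WORDS = {fold(w) for w in ("και", "του", "της", "the", "of")}


def lemmatize(token):
    """Placeholder: a real system would call a Greek lemmatizer here."""
    return token


def normalize(text):
    """Fold case and accents, drop stop words, lemmatize each token."""
    tokens = fold(text).split()
    return " ".join(lemmatize(t) for t in tokens if t not in STOP_WORDS)


# The thesaurus terms undergo the same processing as the queries.
INDEX = {normalize(term): term for term in THESAURUS}


def find_entry(query, cutoff=0.8):
    """Return the thesaurus term matching the query, or None."""
    key = normalize(query)
    if key in INDEX:
        return INDEX[key]
    # Fuzzy match as a simple stand-in for spelling correction.
    close = difflib.get_close_matches(key, list(INDEX), n=1, cutoff=cutoff)
    return INDEX[close[0]] if close else None


def expand(query):
    """Collect candidate expansion terms one level from the entry point."""
    entry = find_entry(query)
    if entry is None:
        return []
    candidates = []
    for relation in ("related", "parallel", "broader", "narrower"):
        for term in THESAURUS[entry].get(relation, []):
            if term not in candidates:
                candidates.append(term)
    return candidates
```

In this sketch `expand("ΙΣΤΟΡΙΑ")` and `expand("ιστορα")` both resolve to the same entry point, the first through case and accent normalization and the second through fuzzy matching. The similarity cutoff of 0.8 is an arbitrary illustrative value and would need tuning against real query logs.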