Abstract-In medical document retrieval, users' queries are often under-specified, which leads to undesired search results: results that lack the information sought, match domain knowledge inadequately, or come from unreliable sources. To overcome the limitations of under-specified queries, we use tags to enhance retrieval by expanding users' original queries with context-relevant information. We compute a set of significant tag neighbor candidates based on neighbor frequency and weight, and use the most frequent and most heavily weighted neighbors to expand any input query whose terms match tags. The proposed approach is evaluated on the MedWorm medical article collection using standard evaluation methods from the Text REtrieval Conference (TREC). Against a baseline Mean Average Precision (MAP) of 0.353, query expansion reaches a MAP of 0.491 (+39%). An in-depth analysis shows how the benefit of this strategy varies across different ranks of the retrieval results.
I. INTRODUCTION
In medical document retrieval, users' queries are often under-specified, which leads to undesired search results: results that lack the information sought, match domain knowledge inadequately, or come from unreliable sources. For instance, when a user wants to search the web for a recent outbreak of influenza, a search with the query "influenza" will return a list of documents containing the query term, ranked by a set of criteria defined by the search engine. In this case, at least three issues may affect the quality of the search result. First, a query with only one or two terms may be under-specified; that is, it may not contain enough terms for the search engine to retrieve the information the user wants. Second, the search engine's document repository may contain hundreds of thousands of articles matching the query. With this much information, it is impossible to locate the desired information simply by browsing through all of the returned results. Third, the search may require domain knowledge. Because conventional search engines focus on generic information search, domain-specific results are usually not taken into consideration during the search. Thus, a simple word-based search does not produce relevant search results in specific domains such as the medical domain [1]. As a consequence of these issues with query-based searches, only one fourth to one half of the relevant articles on a given topic are retrieved in searches performed in specific domains [2]. In other words, the sparse and incomplete query terms may