This work presents an information retrieval model developed to deal with hyperlinked environments. The model is based on belief networks and provides a framework for combining information extracted from the content of the documents with information derived from cross-references among the documents. The information extracted from the content of the documents is based on statistics regarding the keywords in the collection and is one of the basis for traditional information retrieval (IR) ranking algorithms. The information derived from crossreferences among the documents is based on link references in a hyperlinked environment and has received increased attention lately due to the success of the Web. We discuss a set of strategies for combining these two types of sources of evidential information and experiment with them using a reference collection extracted from the Web. The results show that this type of combination can improve the retrieval performance without requiring any extra information from the users at query time. In our experiments, the improvements reach up to 59% in terms of average precision figures.
Information derived from the cross-references among the documents in a hyperlinked environment, usually referred to as link information, is considered important since it can be used to effectively improve document retrieval. Depending on the retrieval strategy, link information can be local or global. Local link information is derived from the set of documents returned as answers to the current user query. Global link information is derived from all the documents in the collection. In this work, we investigate how the use of local link information compares to the use of global link information. For the comparison, we run a series of experiments using a large document collection extracted from the Web. For our reference collection, the results indicate that the use of local link information improves precision by 74%. When global link information is used, precision improves by 35%. However, when only the first 10 documents in the ranking are considered, the average gain in precision obtained with the use of global link information is higher than the gain obtained with the use of local link information. This is an interesting result since it provides insight and justification for the use of global link information in major Web search engines, where users are mostly interested in the first 10 answers. Furthermore, global information can be computed in the background, which allows speeding up query processing.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.