This paper proposes a query suggestion method that combines two ranked retrieval methods: TF-IDF and the Jaccard coefficient. Four performance criteria plus a user evaluation are adopted to evaluate the combined method in terms of ranking and relevance from different perspectives. Two experiments were conducted using eighty carefully designed test queries related to eight topics. The first evaluates the quality of the query suggestions generated by the proposed method. The second evaluates how much those suggestions improve the relevance of returned documents in interactive web search, and thereby the effectiveness of the method. The experimental results show that the proposed method is the best for query suggestion among the methods evaluated, significantly outperforming the most widely used method, TF-IDF. In addition, the query suggestions generated by the proposed method significantly improve the relevance of returned documents in interactive web search, increasing the precision or the number of highly relevant documents.
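The abstract does not say how the TF-IDF and Jaccard scores are combined, so the following is only a minimal sketch under stated assumptions: candidates are token lists, the two scores are mixed by a weighted sum, and the weight `alpha` and all function names are hypothetical rather than taken from the paper.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard coefficient between two token collections."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def tfidf_score(query, candidate, corpus):
    """Sum of TF-IDF weights of the query terms inside one candidate."""
    n = len(corpus)
    tf = Counter(candidate)
    score = 0.0
    for term in set(query):
        # Document frequency: number of candidates containing the term.
        df = sum(1 for doc in corpus if term in doc)
        if df == 0 or tf[term] == 0:
            continue
        score += (tf[term] / len(candidate)) * math.log(n / df)
    return score


def combined_score(query, candidate, corpus, alpha=0.5):
    """Assumed combination: weighted sum of the two scores."""
    return (alpha * tfidf_score(query, candidate, corpus)
            + (1 - alpha) * jaccard(query, candidate))


def suggest(query, corpus, k=3, alpha=0.5):
    """Return the k candidates with the highest combined score."""
    ranked = sorted(
        corpus,
        key=lambda c: combined_score(query, c, corpus, alpha),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    corpus = [
        ["cheap", "flights", "paris"],
        ["paris", "hotels"],
        ["python", "tutorial"],
    ]
    print(suggest(["paris", "flights"], corpus, k=2))
```

A weighted sum is the simplest way to mix two scores on different scales. A rank-based fusion would be an equally plausible reading of "combining".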
Document classification is usually more challenging than numerical data classification, because documents are much harder than numerical data to represent effectively for classification purposes. The vector space model (VSM) has been widely used for document representation in classification: a document is represented by a vector of feature values based on a bag of words. This paper proposes a new feature for document representation under the VSM framework, class-specific document frequency (CSDF), which leads to a novel term weighting scheme based on term frequency (TF), term presence (TP), and the newly proposed feature. The experimental results show that the proposed features, CSDF and TF-CSDF, effectively improve the performance of document classification in comparison with other widely used VSM document representations.
This paper investigates several state-of-the-art ranked retrieval methods, and adapts and combines them for query suggestion. Four performance criteria plus a user evaluation are adopted to evaluate these query suggestion methods in terms of ranking and relevance from different perspectives. Extensive experiments were conducted using eighty carefully designed test queries related to eight topics. The experimental results show that the method developed in this paper, which combines the TF-IDF and Jaccard coefficient methods, is the best for query suggestion among the six methods evaluated, outperforming the most widely used method, TF-IDF. Furthermore, re-ranking query suggestions using cosine similarity is shown to improve the performance of query suggestions.
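The re-ranking step mentioned above can be illustrated with a short sketch. It assumes raw term-count vectors and re-orders an existing list of suggestions by cosine similarity to the query. The abstract does not specify the vector representation, so the weighting and the function names here are assumptions.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two token lists using raw term counts."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank(query, suggestions):
    """Order suggestions by decreasing cosine similarity to the query."""
    return sorted(suggestions, key=lambda s: cosine(query, s), reverse=True)


if __name__ == "__main__":
    query = ["paris", "flights"]
    suggestions = [["paris", "hotels"], ["flights", "paris"], ["python"]]
    print(rerank(query, suggestions))
```

Python's `sorted` is stable, so suggestions with equal similarity keep the order given by the earlier ranking.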
Web document ranking is a very challenging issue for search engines because about 80% of search engine users are usually interested only in the top three returned search results. This paper proposes an effective method for re-ranking web documents/pages returned by Google search based on document classification. The method downgrades web documents/pages that have lower classification scores or that have been classified into categories irrelevant to the query. The experimental results show that re-ranking the web documents returned by Google search using document classification scores can significantly improve ranking performance in terms of an integrated evaluation result using three criteria: MAP, nDCG, and P@20. The results indicate that the proposed re-ranking method better meets the user's information need.
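For reference, the three criteria named above can be computed as in the sketch below, which uses their common textbook definitions. The abstract does not say which exact variants were used or how the three were integrated, so this is illustrative only. The average precision function assumes every relevant document appears somewhere in the ranked list.

```python
import math


def precision_at_k(rels, k):
    """P@k: fraction of the top-k results that are relevant (k >= 1).

    rels is a list of relevance labels in ranked order (0 = not relevant).
    """
    return sum(1 for r in rels[:k] if r > 0) / k


def average_precision(rels):
    """AP for one query; MAP is the mean of this value over all queries."""
    hits, total = 0, 0.0
    for i, r in enumerate(rels, start=1):
        if r > 0:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0


def dcg(gains):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))


def ndcg(gains):
    """DCG normalised by the DCG of the ideal (sorted) ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal else 0.0


if __name__ == "__main__":
    ranked_labels = [1, 0, 1, 0]
    print(precision_at_k(ranked_labels, 2))
    print(average_precision(ranked_labels))
    print(ndcg([3, 2, 1]))
```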