Organizing Web search results into a hierarchy of topics and subtopics facilitates browsing the collection and locating results of interest. In this paper, we propose a new hierarchical monothetic clustering algorithm that builds a topic hierarchy for the collection of search results retrieved in response to a query. At every level of the hierarchy, the algorithm progressively identifies topics in a way that maximizes coverage while maintaining the distinctiveness of the topics. We refer to the proposed algorithm as DisCover. Evaluating the quality of a topic hierarchy is a non-trivial task, the ultimate test being user judgment. We use several objective measures, such as coverage and reach time, in an empirical comparison of the proposed algorithm with two other monothetic clustering algorithms to demonstrate its superiority. Even though our algorithm is slightly more computationally intensive than one of these algorithms, it generates better hierarchies. Our user studies also show that the proposed algorithm is superior to the other algorithms as a summarizing and browsing tool.
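The abstract does not give DisCover's actual objective, so the following is only a minimal sketch of the general idea under our own assumptions: each candidate topic is a single term (which is what makes the clustering monothetic), its cluster is the set of results containing that term, and a hypothetical score rewards newly covered results while penalizing overlap with results already covered by chosen topics. The function `greedy_monothetic_topics` and its `overlap_weight` parameter are illustrative names, not part of DisCover. The sketch builds one level of the hierarchy; applying it again inside each chosen cluster would yield subtopics.

```python
from typing import Dict, List, Optional, Set


def greedy_monothetic_topics(
    docs: List[Set[str]], max_topics: int, overlap_weight: float = 1.0
) -> List[str]:
    """Greedily pick single-term topics for one level of a topic hierarchy.

    Hypothetical scoring rule (not DisCover's objective):
    score(t) = (#results newly covered by t) - overlap_weight * (#results of t already covered).
    """
    # Monothetic cluster for term t: indices of the results that contain t.
    clusters: Dict[str, Set[int]] = {}
    for i, terms in enumerate(docs):
        for t in terms:
            clusters.setdefault(t, set()).add(i)

    chosen: List[str] = []
    covered: Set[int] = set()
    while len(chosen) < max_topics:
        best_term: Optional[str] = None
        best_score = 0.0
        # Sorted iteration makes ties break alphabetically (deterministic output).
        for t, members in sorted(clusters.items()):
            if t in chosen:
                continue
            gain = len(members - covered)     # coverage: results not yet under any topic
            overlap = len(members & covered)  # distinctiveness: penalize shared results
            score = gain - overlap_weight * overlap
            if score > best_score:
                best_term, best_score = t, score
        if best_term is None:  # no remaining term improves the net score
            break
        chosen.append(best_term)
        covered |= clusters[best_term]
    return chosen
```

For example, with the query term already removed from each result's term set, `greedy_monothetic_topics([{"car", "speed"}, {"car", "price"}, {"cat", "habitat"}, {"cat", "speed"}, {"os", "mac"}], 4)` returns `["car", "cat", "mac"]`: "speed" is never chosen because both of its results already fall under other topics, and the search stops once every remaining term would only add overlap. The query term itself must be excluded beforehand, since it appears in every result and would otherwise be chosen first as a single all-covering topic.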
Dimensionality reduction is the process of mapping high-dimensional patterns to a lower-dimensional subspace. When done prior to classification, estimates obtained in the lower-dimensional subspace are more reliable. For some classifiers, performance also improves because the diluting effect of redundant information is removed. Most current approaches to dimensionality reduction are based on scatter matrices or other statistics of the data that do not directly correlate with classification accuracy. The optimality criterion of choice for classification is the Bayes error. Usually, however, the Bayes error is difficult to express analytically. We propose an optimality criterion based on an approximation of the Bayes error and use it to formulate a linear and a nonlinear method of dimensionality reduction. The nonlinear method we propose relies on a multilayer perceptron that produces the lower-dimensional representation as its output. It thus differs from the autoassociative multilayer perceptrons that have been proposed and used for dimensionality reduction. Our results show that the nonlinear method is, as anticipated, superior to the linear method in that it can unfold a nonlinear manifold. In addition, the proposed nonlinear method provides substantially better lower-dimensional representations (for classification purposes) than Fisher's linear discriminant (FLD) and two other commonly used nonlinear methods of dimensionality reduction.
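The abstract does not state which approximation of the Bayes error the proposed criterion uses. As a clearly labeled stand-in, the sketch below uses the Bhattacharyya bound, a standard upper bound on the two-class Bayes error when each class is modeled as Gaussian, and grid-searches a linear 2-D to 1-D projection that minimizes it. The function names and the grid-search strategy are hypothetical; the sketch illustrates only the idea of a linear reduction driven by a Bayes-error approximation, not the authors' criterion or their multilayer-perceptron method.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _mean_var(xs: Sequence[float]) -> Tuple[float, float]:
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    return m, max(v, 1e-12)  # guard against zero variance in the log term


def bhattacharyya_bound(a: Sequence[float], b: Sequence[float]) -> float:
    """Stand-in Bayes-error approximation: P_e <= sqrt(P1*P2) * exp(-D_B),
    assuming each projected class is a 1-D Gaussian."""
    m1, v1 = _mean_var(a)
    m2, v2 = _mean_var(b)
    p1 = len(a) / (len(a) + len(b))  # class priors estimated from sample counts
    p2 = 1.0 - p1
    # 1-D Bhattacharyya distance between N(m1, v1) and N(m2, v2).
    d_b = 0.25 * (m1 - m2) ** 2 / (v1 + v2) + 0.5 * math.log(
        (v1 + v2) / (2.0 * math.sqrt(v1 * v2))
    )
    return math.sqrt(p1 * p2) * math.exp(-d_b)


def best_projection(
    class_a: List[Point], class_b: List[Point], steps: int = 180
) -> Tuple[float, float]:
    """Hypothetical grid search over unit directions w = (cos t, sin t);
    returns (angle, bound) for the direction with the smallest bound."""
    best: Tuple[float, float] = (0.0, float("inf"))
    for k in range(steps):
        theta = math.pi * k / steps  # [0, pi) suffices: w and -w give the same bound
        w = (math.cos(theta), math.sin(theta))
        pa = [w[0] * x + w[1] * y for x, y in class_a]
        pb = [w[0] * x + w[1] * y for x, y in class_b]
        bound = bhattacharyya_bound(pa, pb)
        if bound < best[1]:
            best = (theta, bound)
    return best
```

For two classes that differ only along the x-axis but have large spread along the y-axis, the search picks the angle 0 (projection onto x), where the bound is close to zero. Unlike a scatter-matrix criterion, the quantity being minimized here is itself an estimate of classification error, which is the motivation the abstract gives; a nonlinear variant would replace the linear map with a trained network, which this sketch does not attempt.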