In this paper, we compare learning techniques based on statistical classification to traditional methods of relevance feedback for the document routing problem. We consider three classification techniques which have decision rules that are derived via explicit error minimization linear discriminant analysis, logistic regression, and neuraf networks. We demonstrate that the classifiers perform 10-15% better than relevance feedback via Rocchio expansion for the TREC-2 and TREC-3 routing tasks.Error minimization is difficult in high-dimensional feature spaces because the convergence process is slow and the models~e prone to overfitting.We use two different strategies, latent semantic indexing and optimaJ term selection, to reduce the number of features.Our results indicate that features based on latent semantic indexing are more effective for techniques such as linear discriminant anafysis and logistic regression, which have no way to protect against overfitting.Neural networks perform equally well with either set of features and can take advantage of the additional information available when both feature sets are used as input.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.