Supervised learning approaches to text classification are, in practice, often required to work with small and unsystematically collected training sets. The usual alternative to supervised learning is building classifiers by hand, using a domain expert's understanding of which features of the text are related to the class of interest. Building classifiers by hand is expensive, requires a degree of sophistication about linguistics and classification, and makes it difficult to use combinations of weak predictors. We propose instead combining domain knowledge with training examples in a Bayesian framework. Domain knowledge is used to specify a prior distribution for the parameters of a logistic regression model, and labeled training data are used to produce a posterior distribution, whose mode we take as the final classifier. We show on three text categorization data sets that this approach can rescue what would otherwise be disastrously bad training situations, producing much more effective classifiers.
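The idea of taking the posterior mode as the classifier can be illustrated with a minimal sketch. The abstract does not say which prior family or optimizer is used, so this example assumes independent Gaussian priors on each weight and finds the mode with Newton's method; the names `fit_map_logistic`, `prior_mean`, and `prior_var`, and the toy data, are hypothetical and chosen only for illustration.

```python
import numpy as np


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def neg_log_posterior_grad(w, X, y, prior_mean, prior_var):
    """Gradient of the negative log posterior (logistic likelihood + Gaussian prior)."""
    p = sigmoid(X @ w)
    return X.T @ (p - y) + (w - prior_mean) / prior_var


def fit_map_logistic(X, y, prior_mean, prior_var, n_iter=25):
    """Return the posterior mode of the weights under independent Gaussian priors.

    Assumption: Gaussian priors and Newton's method are illustrative choices,
    not taken from the abstract.
    """
    w = np.asarray(prior_mean, dtype=float).copy()
    for _ in range(n_iter):
        p = sigmoid(X @ w)
        grad = neg_log_posterior_grad(w, X, y, prior_mean, prior_var)
        # Hessian: data curvature plus prior precision (always positive definite).
        hess = X.T @ (X * (p * (1.0 - p))[:, None]) + np.diag(1.0 / prior_var)
        w = w - np.linalg.solve(hess, grad)
    return w


# Toy bag-of-words data: feature 0 never occurs in the tiny training set,
# but a domain expert believes it signals the positive class.
X = np.array([[0, 1, 0], [0, 1, 1], [0, 0, 1], [0, 0, 1]], dtype=float)
y = np.array([1, 1, 0, 0], dtype=float)
prior_var = np.ones(3)

w_uninformed = fit_map_logistic(X, y, np.zeros(3), prior_var)
w_informed = fit_map_logistic(X, y, np.array([2.0, 0.0, 0.0]), prior_var)

x_new = np.array([1.0, 0.0, 0.0])  # document containing only the expert's word
print("uninformed prior:", sigmoid(x_new @ w_uninformed))
print("informed prior:  ", sigmoid(x_new @ w_informed))
```

In this sketch the training data carry no evidence about feature 0, so its weight stays at whatever the prior says; the informed prior is the only thing that lets the classifier score the new document above chance.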
A new software system allows Web searchers to connect with others who have been there, done that. What if your navigation of the Web were assisted not only by various search engines and software systems, but also by many other people who had searched for similar information in the recent past? This article discusses a software system called "AntWorld" that integrates humans as part of the enabling technology to help other humans navigate the Web. As a resource the Web is amazing and bewildering, and, at times, infuriating. All of us have, at one time or another, followed a seemingly endless loop, hopefully clicking one more time in a quest for some specific information. Many of us were also not the first person ever to be frustrated searching for that particular information. But the Web does not (yet) learn from other people's mistakes. In that sense, we who use it are not even as clever as ants in the kitchen, who always leave chemical trails for their nestmates when they find something good to eat. Pursuing the ant metaphor, we imagine a user community operating in asynchronous collaboration mode, where information trails from user quests for information on the Internet are left behind for any community member to follow. The goal is to post and share communal knowledge: as community members engage in individual information quests, they make a small extra