We present a statistical model of feature occurrence over time and develop tests, based on classical hypothesis testing, for the significance of a term's appearance on a given date. A second application of hypothesis testing lets us combine these terms into "topics" as defined by the Topic Detection and Tracking study. The resulting term groupings can be used to automatically generate an interactive timeline displaying the major events and topics covered by the corpus. To test the validity of the technique, we extracted a large number of these topics from a test corpus and had human evaluators judge how well the selected features captured the gist of each topic and how well they overlapped with a set of known topics from the corpus. The extracted topics were rated highly by the evaluators.
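The abstract above describes testing each (term, date) pair for a significantly elevated rate of appearance. A minimal sketch of one common instantiation of such a test is below: a chi-square test on a 2x2 contingency table comparing a term's document frequency on one date against all other dates. The data layout, function name, and the p < 0.005 threshold are illustrative assumptions, not the authors' code or exact parameters.

```python
# Sketch: flag (term, date) pairs whose occurrence rate on that date is
# significantly higher than on other dates, via a 2x2 chi-square test.
# The alpha value and data layout are assumptions for illustration.
from collections import Counter, defaultdict
from scipy.stats import chi2_contingency

def significant_terms(docs_by_date, alpha=0.005):
    """docs_by_date: {date: [set_of_terms_for_each_document, ...]}"""
    n_docs = {d: len(docs) for d, docs in docs_by_date.items()}
    total_docs = sum(n_docs.values())
    term_date = defaultdict(Counter)   # term -> {date: doc count}
    for date, docs in docs_by_date.items():
        for terms in docs:
            for t in terms:
                term_date[t][date] += 1
    flagged = defaultdict(list)        # date -> terms significant that day
    for term, by_date in term_date.items():
        term_total = sum(by_date.values())
        for date, a in by_date.items():
            b = term_total - a                  # term docs on other dates
            c = n_docs[date] - a                # other docs on this date
            d = (total_docs - n_docs[date]) - b # other docs, other dates
            chi2, p, _, _ = chi2_contingency([[a, b], [c, d]])
            # Require both significance and an elevated rate on this date.
            if p < alpha and a / n_docs[date] > term_total / total_docs:
                flagged[date].append(term)
    return flagged
```

A term like "ferry" would be flagged on the dates of a sinking, when its document frequency spikes well above its background rate.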
We propose a simple statistical model for the frequency of occurrence of features in a stream of text. Adopting this model allows us to use classical significance tests to filter the stream for interesting events. We tested the model by building a system and running it on a news corpus. In a subjective evaluation the system worked remarkably well: almost all of the groups of identified tokens corresponded to news stories and were appropriately placed in time. A preliminary objective evaluation of system quality revealed both the strengths and the weaknesses of our approach.
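The "groups of identified tokens" mentioned above imply a second step that merges significant terms into stories. As a hedged sketch of one simple way to do this (a stand-in for the paper's own grouping test, whose exact form is not given here), terms significant on the same date can be merged when they co-occur in enough documents; the union-find structure and the co-occurrence threshold are assumptions.

```python
# Sketch: group terms that are significant on the same date into story
# clusters when they co-occur in documents. Union-find merging and the
# min_cooccur threshold are illustrative assumptions.
from itertools import combinations

def group_terms(terms, docs, min_cooccur=2):
    """terms: significant terms for one date; docs: list of term sets."""
    parent = {t: t for t in terms}
    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path compression
            t = parent[t]
        return t
    for t1, t2 in combinations(terms, 2):
        cooccur = sum(1 for d in docs if t1 in d and t2 in d)
        if cooccur >= min_cooccur:
            parent[find(t1)] = find(t2)    # merge the two clusters
    groups = {}
    for t in terms:
        groups.setdefault(find(t), []).append(t)
    return list(groups.values())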
We built two Information Retrieval systems targeted at the TREC-6 "aspect oriented" retrieval track. The systems were built to test the usefulness of different visualizations in an interactive IR setting, in particular an "aspect window" for the chosen task and a 3-D visualization of document inter-relationships. We studied 24 users of the systems in order to investigate: whether the systems were more effective than a control system, whether experienced users outperformed novices, whether spatial reasoning ability was a good predictor of effective use of 3-D, and whether the systems could be compared indirectly via a control system. Our results show that substantial differences in user performance are related to spatial reasoning ability and, to a lesser degree, other traits. We also obtained markedly different results from the direct and indirect comparisons.

Introduction

We are interested in building and evaluating high-quality information retrieval and organization tools. We believe that effective use of such tools may require talented users or significant amounts of training. There are many settings where experts in the field are required to spend time learning a tool (e.g., CAD/CAM applications, statistical analysis packages), and the gains from learning the system more than outweigh the time spent learning it. Novice users may find such systems puzzling, but we do not feel that diminishes the value of a targeted system. Further, other researchers are investigating the usefulness of systems for users with little to no searching experience [23, 3].

On the other hand, we have no interest in building systems that are inherently difficult to use. Indeed, the better and easier to use a system's underlying design is, the more complexity we can introduce without overburdening the user [21]. For that reason, we are interested in basic issues in interactive computing, among them: how effective are simple system features, how can we compare our various systems, and are there any measures we can use to predict user success? We pursued these questions in the TREC-6 Interactive Track, an evaluation of "aspect oriented" information retrieval, wherein users are tasked with identifying as many "aspects" of relevance to a query as they can. For example, in a query about ferry sinkings in the news, the task was to find a list of all ferries that sank, not to find all documents about ferry sinkings. The structure of our experiments was determined to a large extent by the TREC-6 guidelines; they are explained in more detail below. Because of our interest in targeted systems, we chose to build and evaluate a system designed specifically to aid a user with aspect retrieval. The alternative would have been to use a vanilla search engine, perhaps slightly enhanced to support some specific search technique, for the task; we felt that approach would not sufficiently address our interests. At the same time, we have been investigating 3-D visualizations of document relatedness (clustering), so we chose to create a slightly enhanced version of our system that included a 3-D visualization. The questions...
We are interested in how ideas from document clustering can be used to improve the retrieval accuracy of ranked lists in interactive systems. In particular, we are interested in ways to evaluate the effectiveness of such systems to decide how they might best be constructed. In this study, we construct and evaluate systems that present the user with ranked lists and a visualization of inter-document similarities. We first carry out a user study to evaluate the clustering/ranked-list combination on instance-oriented retrieval, the task of the TREC-6 Interactive Track. We find that although users generally prefer the combination, they are not able to use it to improve effectiveness. In the second half of this study, we develop and evaluate an approach that more directly combines the ranked list with information from inter-document similarities. Using the TREC collections and relevance judgments, we show that it is possible to realize substantial improvements in effectiveness by doing so, and that although users can use the combined information effectively, the system can provide hints that substantially improve on the user's solo effort. The resulting approach shares much in common with an interactive application of incremental relevance feedback. Throughout this study, we illustrate our work using two prototype systems constructed for these evaluations. The first, AspInQuery, is a classic information retrieval system augmented with a specialized tool for recording information about instances of relevance. The other system, Lighthouse, is a Web-based application that combines a ranked list with a portrayal of inter-document similarity. Lighthouse can work with collections such as TREC, as well as the results of Web search engines.
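Since the abstract likens the combined approach to incremental relevance feedback, a minimal sketch of that idea is given below: rerank the list by mixing each document's original retrieval score with its similarity to the documents the user has already judged relevant. The 0.5 mixing weight, the centroid-based similarity, and the function names are illustrative assumptions, not the design of AspInQuery or Lighthouse.

```python
# Sketch: rerank a retrieved list by blending the original score with
# cosine similarity to documents already judged relevant, in the spirit
# of incremental relevance feedback. Weighting scheme is an assumption.
import numpy as np

def rerank(scores, vectors, relevant_ids, weight=0.5):
    """scores: {doc_id: retrieval score}; vectors: {doc_id: unit tf-idf vector}."""
    if not relevant_ids:
        return sorted(scores, key=scores.get, reverse=True)
    # Centroid of the judged-relevant documents, renormalized to unit length.
    centroid = np.mean([vectors[d] for d in relevant_ids], axis=0)
    centroid = centroid / (np.linalg.norm(centroid) or 1.0)
    combined = {
        d: (1 - weight) * scores[d] + weight * float(vectors[d] @ centroid)
        for d in scores
    }
    return sorted(combined, key=combined.get, reverse=True)
```

Each time the user marks another instance of relevance, the centroid shifts and the remaining list is reordered, which is the interactive loop the abstract alludes to.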