In many real-world learning tasks it is expensive to acquire a sufficient number of labeled examples for training. This paper investigates methods for reducing annotation cost by sample selection. In this approach, during training the learning program examines many unlabeled examples and selects for labeling only those that are most informative at each stage. This avoids redundantly labeling examples that contribute little new information. Our work follows on previous research on Query By Committee, and extends the committee-based paradigm to the context of probabilistic classification. We describe a family of empirical methods for committee-based sample selection in probabilistic classification models, which evaluate the informativeness of an example by measuring the degree of disagreement between several model variants. These variants (the committee) are drawn randomly from a probability distribution conditioned by the training set labeled so far. The method was applied to the real-world natural language processing task of stochastic part-of-speech tagging. We find that all variants of the method achieve a significant reduction in annotation cost, although their computational efficiency differs. In particular, the simplest variant, a two-member committee with no parameters to tune, gives excellent results. We also show that sample selection yields a significant reduction in the size of the model used by the tagger.
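The committee idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's implementation: the Dirichlet posterior over tag counts, the vote-entropy disagreement measure, and all function names (`draw_member`, `vote_entropy`, `committee_disagreement`) are assumptions chosen for illustration. Each committee member is a model variant sampled from a distribution conditioned on the labeled counts seen so far, and an example is considered informative when the members' predictions disagree.

```python
import math
import random
from collections import Counter


def draw_member(tag_counts, rng, prior=1.0):
    """Sample one hypothetical model variant: tag probabilities drawn from a
    Dirichlet posterior over the observed tag counts (an assumed model)."""
    weights = {tag: rng.gammavariate(count + prior, 1.0)
               for tag, count in tag_counts.items()}
    total = sum(weights.values())
    return {tag: w / total for tag, w in weights.items()}


def vote_entropy(votes):
    """Entropy of the committee's votes, normalized by log(committee size).
    Returns 0.0 when all members agree."""
    k = len(votes)
    if k < 2:
        return 0.0
    counts = Counter(votes)
    entropy = sum((c / k) * math.log(k / c) for c in counts.values())
    return entropy / math.log(k)


def committee_disagreement(tag_counts, committee_size=2, seed=0):
    """Draw a committee, let each member vote for its most probable tag,
    and return the disagreement among the votes."""
    rng = random.Random(seed)
    votes = []
    for _ in range(committee_size):
        member = draw_member(tag_counts, rng)
        votes.append(max(member, key=member.get))
    return vote_entropy(votes)
```

Under these assumptions, a selection loop would request a label only for examples whose disagreement exceeds some threshold; with a two-member committee this reduces to labeling an example whenever the two members vote differently.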
In this paper, we address the tradeoff between exploration and exploitation for agents which need to learn more about the structure of their environment in order to perform more effectively. For example, a software agent operating on the World Wide Web may need to learn which sites on the net are most useful, and the most efficient routes to those sites. We compare exploration strategies for a repeated task, where the agent is given some particular task to perform some number of times. Tasks are modeled as navigation on a partially known (deterministic) graph. This paper describes a new utility-based exploration algorithm for repeated tasks which interleaves exploration with task performance. The method takes into account both the costs and the potential benefits (for future task repetitions) of different exploratory actions. Exploration is performed in a greedy fashion, with the locally optimal exploratory action performed during repetition of each task. We experimentally evaluated our utility-based interleaved exploration algorithm against a heuristic search algorithm for exploration before task performance (a priori exploration) as well as a randomized interleaved exploration algorithm. We found that for a single repeated task, utility-based interleaved exploration consistently outperforms the alternatives, unless the number of task repetitions is very high. In addition, we extended the algorithms for the case of multiple repeated tasks, where the agent has a different, randomly chosen task (from a known subset of possible tasks) to perform each time. Here too, we found that utility-based interleaved exploration is clearly superior in most cases. Int. J. Patt. Recogn. Artif. Intell. 1999.13:963-986.
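The cost-versus-benefit reasoning in the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's algorithm: the utility formula (expected per-repetition saving times remaining repetitions, minus exploration cost), the tuple layout of `candidates`, and the name `choose_exploration` are all hypothetical.

```python
def choose_exploration(candidates, remaining_repetitions):
    """Greedily pick the exploratory action with the highest positive utility.

    candidates: iterable of (name, cost, expected_saving_per_repetition)
    tuples (an assumed representation). Returns the chosen action's name,
    or None when no exploration is expected to pay off.
    """
    best_name, best_utility = None, 0.0
    for name, cost, expected_saving in candidates:
        # Assumed utility: future benefit over remaining repetitions minus cost.
        utility = expected_saving * remaining_repetitions - cost
        if utility > best_utility:
            best_name, best_utility = name, utility
    return best_name
```

In this simplified model, an exploratory action is worthwhile only while enough task repetitions remain to recoup its cost, which is why the choice is re-evaluated at each repetition.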
Recognizing shallow linguistic patterns, such as basic syntactic relationships between words, is a common task in applied natural language and text processing. The common practice for approaching this task is by tedious manual definition of possible pattern structures, often in the form of regular expressions or finite automata. This paper presents a novel memory-based learning method that recognizes shallow patterns in new text based on a bracketed training corpus. The examples are stored as-is, in efficient data structures. Generalization is performed on-line at recognition time by comparing subsequences of the new text to positive and negative evidence in the corpus. This way, no information in the training is lost, as can happen in other learning systems that construct a single generalized model at the time of training. The paper presents experimental results for recognizing noun phrase, subject-verb, and verb-object patterns in English.