We describe an LSTM-based model which we call Byte-to-Span (BTS) that reads text as bytes and outputs span annotations of the form [start, length, label], where start positions, lengths, and labels are separate entries in our vocabulary. Because we operate directly on Unicode bytes rather than language-specific words or characters, we can analyze text in many languages with a single model. Due to the small vocabulary size, these multilingual models are very compact, yet they produce results similar to or better than the state of the art in Part-of-Speech tagging and Named Entity Recognition among systems that use only the provided training datasets (no external data sources). Our models learn "from scratch" in that they do not rely on any elements of the standard Natural Language Processing pipeline (including tokenization), and thus can run in standalone fashion on raw text.
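As a rough illustration of the byte-level input and the [start, length, label] span encoding described above, the short Python sketch below converts a sentence to UTF-8 bytes and pairs it with one annotation over byte offsets; the function name, the label string, and the example sentence are hypothetical and not taken from the paper.

# Illustrative sketch only: byte-level input plus [start, length, label] spans.
# encode_example, the "LOC" label, and the sentence are assumptions for illustration.
def encode_example(text, spans):
    """Convert text to UTF-8 bytes and pair it with (start, length, label) spans."""
    byte_seq = list(text.encode("utf-8"))   # model input: raw bytes, no tokenization
    return byte_seq, list(spans)            # spans are triples over byte offsets

text = "I live in Zürich."
start = len("I live in ".encode("utf-8"))   # byte offset where the entity begins
length = len("Zürich".encode("utf-8"))      # 7 bytes: "ü" occupies two bytes in UTF-8
byte_seq, annotations = encode_example(text, [(start, length, "LOC")])
print(byte_seq[:5], annotations)            # [73, 32, 108, 105, 118] [(10, 7, 'LOC')]

The point of the example is only that byte offsets, not word or character positions, index the spans, which is what lets a single small vocabulary cover many languages.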
Abstract. We describe an experimental study of pruning methods for decision tree classifiers when the goal is minimizing loss rather than error. In addition to two common methods for error minimization, CART's cost-complexity pruning and C4.5's error-based pruning, we study the extension of cost-complexity pruning to loss and one pruning variant based on the Laplace correction. We perform an empirical comparison of these methods and evaluate them with respect to loss. We found that applying the Laplace correction to estimate the probability distributions at the leaves was beneficial to all pruning methods. Unlike in error minimization, and somewhat surprisingly, performing no pruning led to results that were on par with other methods in terms of the evaluation criteria. The main advantage of pruning was in the reduction of the decision tree size, sometimes by a factor of ten. While no method dominated others on all datasets, even for the same domain different pruning mechanisms are better for different loss matrices.

1 Pruning Decision Trees

Decision trees are a widely used symbolic modeling technique for classification tasks in machine learning. The most common approach to constructing decision tree classifiers is to grow a full tree and prune it back. Pruning is desirable because the tree that is grown may overfit the data by inferring more structure than is justified by the training set. Specifically, if there are no conflicting instances, the training-set error of a fully built tree is zero, while the true error is likely to be larger. To combat this overfitting problem, the tree is pruned back with the goal of identifying the tree with the lowest error rate on previously unobserved instances, breaking ties in favor of smaller trees (Breiman, Friedman, Olshen & Stone 1984; Quinlan 1993). Several pruning methods have been introduced in the literature, including cost-complexity pruning, reduced error pruning, pessimistic pruning, error-based pruning, penalty pruning, and MDL pruning. Historically, most pruning algorithms have been developed to minimize the expected error rate of the decision tree, assuming that classification errors have the same unit cost.
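To make the loss-oriented evaluation concrete, here is a minimal Python sketch of the two ingredients the abstract names: Laplace-corrected class probabilities at a leaf and choosing the prediction with minimum expected loss under a loss matrix. The function names, the example counts, and the loss matrix are illustrative assumptions, not the paper's code.

# Minimal sketch, assuming a leaf stores per-class counts and a loss matrix
# with rows indexed by true class and columns by predicted class.
def laplace_probs(counts):
    """Laplace correction: P(c) = (n_c + 1) / (N + k), smoothing zero counts."""
    total, k = sum(counts), len(counts)
    return [(c + 1) / (total + k) for c in counts]

def min_loss_label(counts, loss_matrix):
    """Pick the predicted class with the lowest expected loss at this leaf."""
    probs = laplace_probs(counts)
    expected = [sum(probs[t] * loss_matrix[t][p] for t in range(len(probs)))
                for p in range(len(probs))]
    return min(range(len(expected)), key=expected.__getitem__)

counts = [8, 2]                      # leaf with 8 instances of class 0, 2 of class 1
loss = [[0, 1],                      # misclassifying a true class 0 instance costs 1
        [5, 0]]                      # misclassifying a true class 1 instance costs 5
print(laplace_probs(counts))         # [0.75, 0.25]
print(min_loss_label(counts, loss))  # 1: the minority class wins once costs are unequal

Under plain error minimization the leaf would predict class 0; with the asymmetric loss matrix the expected loss of predicting class 1 (0.75) is lower than that of predicting class 0 (1.25), which is why loss-sensitive probability estimation at the leaves matters.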
The invention of the movable type printing press launched the information age by making the mass distribution of information both feasible and economical. Newspapers, magazines, shopping catalogs, restaurant guides, and classified advertisements can trace their origins to the printing process. Five and a half centuries of technological progress in communications networks, protocols, computers, and user interface design led to the Web, online publishing, and e-commerce. Consumers and businesses have access to vast stores of information. All this information, however, used to be accessible only while users were tethered to a computer at home or in an office. Wireless data and voice access to this vast store allows unprecedented access to information from any location at any time.
We discuss the types of noise that may occur in relational learning systems and describe two approaches to addressing noise in a relational concept learning algorithm. We then evaluate each approach experimentally.