Abstract. This paper presents empirical results for several versions of the multinomial naive Bayes classifier on four text categorization problems, and a way of improving it using locally weighted learning. More specifically, it compares standard multinomial naive Bayes to the recently proposed transformed weight-normalized complement naive Bayes classifier (TWCNB) [1], and shows that some of the modifications included in TWCNB may not be necessary to achieve optimum performance on some datasets. However, it does show that TFIDF conversion and document length normalization are important. It also shows that support vector machines can sometimes outperform both methods by a significant margin. Finally, it shows how the performance of multinomial naive Bayes can be improved using locally weighted learning. The overall conclusion, however, is that support vector machines remain the method of choice if the aim is to maximize accuracy.
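To make the TFIDF conversion and document length normalization mentioned above concrete, the sketch below applies a log term-frequency transform, smoothed IDF weighting, and L2 length normalization to a small document-term count matrix. It is an illustrative approximation only, not the exact TWCNB transforms from [1]; the function name and the toy data are invented for this example.

```python
import numpy as np

def tfidf_length_normalize(counts):
    """Illustrative TF transform, IDF weighting and L2 length normalization
    for a document-term count matrix (rows = documents, columns = terms).
    A sketch of the kind of preprocessing discussed above, not the exact
    TWCNB formulation."""
    tf = np.log1p(counts)                              # dampen raw term frequencies
    n_docs = counts.shape[0]
    df = (counts > 0).sum(axis=0)                      # document frequency per term
    idf = np.log((n_docs + 1.0) / (df + 1.0))          # smoothed inverse document frequency
    x = tf * idf
    norms = np.linalg.norm(x, axis=1, keepdims=True)   # document length normalization
    return x / np.maximum(norms, 1e-12)

# Toy example: three documents over a four-word vocabulary
counts = np.array([[3, 0, 1, 0],
                   [0, 2, 0, 1],
                   [1, 1, 0, 0]], dtype=float)
print(tfidf_length_normalize(counts))
```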
Abstract. Nearest neighbour search (NNS) is an old problem that is of practical importance in a number of fields. It involves finding, for a given point q, called the query, one or more points from a given set of points that are nearest to q. Since the inception of the problem, a great number of algorithms and techniques have been proposed for its solution. However, many of the proposed algorithms have still not been compared against each other on a wide variety of datasets. This research attempts to fill this gap to some extent by presenting a detailed empirical comparison of three prominent data structures for exact NNS: KD-Trees, Metric Trees, and Cover Trees. Our results suggest that there is generally little gain in using Metric Trees or Cover Trees instead of KD-Trees for the standard NNS problem.
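To illustrate the task being benchmarked, here is a minimal sketch of building a KD-tree over a point set and answering an exact nearest-neighbour query with SciPy. This is only an illustration of the problem statement; it is not the KD-Tree, Metric Tree, or Cover Tree implementations used in the comparison, and the data is random.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
points = rng.random((10_000, 3))      # the point set to be searched
q = rng.random(3)                     # the query point q

tree = cKDTree(points)                # build the KD-tree once
dist, idx = tree.query(q, k=1)        # exact nearest neighbour of q
print(f"nearest point index {idx}, distance {dist:.4f}")
```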
Challenges
• Deciding whether a pattern is subgraph-isomorphic to G is itself NP-complete; the naïve solution is O(n^k) (a brute-force version is sketched after this list).
• How should frequency be measured?
• The main focus was to tackle the computational complexity of the problem.

Results
• Recent advances in parameterized complexity theory give us (randomized) algebraic methods that, for tree patterns, reduce subgraph isomorphism from O(n^k) …
• Accuracy can be boosted arbitrarily by repeating the subgraph isomorphism test multiple times.
• Exponential factors are relatively small, allowing for practical applications.

Method Overview
Evaluate over …
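The brute-force check referred to in the first bullet can be written down directly: enumerate every injective mapping of the k pattern vertices onto vertices of G and test whether all pattern edges are preserved, which is what makes the naïve approach O(n^k). The sketch below is purely illustrative and hypothetical; it is not the algebraic method the results describe, and the helper name and toy graph are invented for the example.

```python
from itertools import permutations

def naive_subgraph_isomorphic(pattern_edges, k, adj):
    """Brute-force O(n^k) test: try every injective mapping of the k pattern
    vertices onto vertices of the (undirected) host graph and check that
    every pattern edge is preserved."""
    for mapping in permutations(adj, k):            # all injective vertex assignments
        if all(mapping[v] in adj[mapping[u]] for u, v in pattern_edges):
            return True
    return False

# Toy example: a 3-vertex path (edges 0-1, 1-2) inside a triangle
G = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
print(naive_subgraph_isomorphic([(0, 1), (1, 2)], 3, G))   # True
```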