Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR 2003
DOI: 10.1145/860454.860455
A scalability analysis of classifiers in text categorization

Abstract: Real-world applications of text categorization often require a system to deal with tens of thousands of categories defined over a large taxonomy. This paper addresses the problem with respect to a set of popular algorithms in text categorization, including Support Vector Machines, k-nearest neighbor, ridge regression, linear least square fit and logistic regression. By providing a formal analysis of the computational complexity of each classification method, followed by an investigation on the usage of differe…

Cited by 56 publications (80 citation statements)
References 5 publications
“…- Very good effectiveness, as shown in several text classification experiments [6][7][8][9]; this effectiveness is often due to their natural ability to deal with non-linearly separable classes; - The fact that they scale extremely well (better than SVMs) to very high numbers of classes [9]. In fact, computing the |Tr| distance scores and sorting them in descending order (as from Step 1) needs to be performed only once, irrespective of the number m of classes involved; this means that distance-weighted k-NN scales (wildly) sublinearly with the number of classes involved, while learning methods that generate linear classifiers scale linearly, since none of the computation needed for generating a single classifier Φ′ can be reused for the generation of another classifier Φ′′, even if the same training set Tr is involved.…”
Section: (Similarly to Equation 1) Identify the Set
confidence: 89%
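The scaling argument quoted above can be made concrete with a small sketch. The Python fragment below is illustrative only, not code from the cited papers; the dense NumPy vectors and the function name score_all_classes are assumptions. It shows why distance-weighted k-NN is cheap in the number of classes m: the |Tr| distances are computed and sorted once, and every class score is then read off the same k neighbours.

```python
# Minimal sketch (assumed names and data layout) of distance-weighted k-NN
# scoring over m classes: the expensive work depends only on |Tr|, not on m.
import numpy as np

def score_all_classes(test_vec, train_matrix, train_labels, num_classes, k=30):
    """Return one score per class for a single test document.

    train_matrix : (|Tr|, d) array of training-document vectors
    train_labels : list of sets; train_labels[i] holds the class ids of doc i
    """
    # Step 1: |Tr| distance scores, computed and sorted ONCE,
    # independently of the number m of classes.
    dists = np.linalg.norm(train_matrix - test_vec, axis=1)
    nearest = np.argsort(dists)[:k]

    # Step 2: per-class aggregation reuses the same k neighbours,
    # so the extra cost per class is only O(k).
    scores = np.zeros(num_classes)
    for i in nearest:
        w = 1.0 / (dists[i] + 1e-9)          # distance weighting
        for c in train_labels[i]:
            scores[c] += w
    return scores
```

By contrast, a one-vs-rest linear learner must run its full training procedure once per class, and none of that work can be reused across classes, which is the linear-in-m behaviour the statement refers to.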
“…In contrast to the datasets typically utilized in research, multilabel corpora in the real world can contain thousands or tens of thousands of labels, and the label frequencies in these datasets tend to have highly skewed frequency distributions with power-law statistics (Yang et al. 2003; Liu et al. 2005; Dekel and Shamir 2010). Figure 1 illustrates this point for three large real-world corpora, each containing thousands of unique labels, by plotting the number of labels within each corpus as a function of label frequency.…”
Section: Background and Motivation
confidence: 99%
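The skewed label-frequency behaviour described in this statement is easy to inspect on any multilabel corpus. Below is a minimal sketch, with an invented toy corpus and illustrative names, that counts how often each label appears and tabulates the frequency distribution; on real data one typically sees a few very frequent head labels and a long tail of rare ones.

```python
# Minimal sketch: measure label-frequency skew in a multilabel corpus.
# The corpus here is a toy example, not data from the cited papers.
from collections import Counter

def label_frequency_histogram(doc_labels):
    """doc_labels: iterable of label collections, one per document."""
    freq = Counter(lbl for labels in doc_labels for lbl in labels)
    # histogram over frequencies: how many labels occur exactly f times
    return Counter(freq.values()), freq

hist, freq = label_frequency_histogram([
    {"sports", "news"}, {"news"}, {"news", "politics"}, {"finance"},
])
print(freq.most_common(3))   # a handful of head labels dominate
print(sorted(hist.items()))  # many labels with tiny counts -> long tail
```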
“…For instance, the “shrinkage” method presented in McCallum et al. (1998) is aimed at improving parameter estimation for data-sparse leaf categories in a 1-of-n HTC system based on a naïve Bayesian method; the underlying intuitions are specific to naïve Bayesian methods, and do not easily carry over to other contexts. Incidentally, the naïve Bayesian approach seems to have been the most popular among HTC researchers, since several other HTC models are hierarchical variations of naïve Bayesian learning algorithms (Chakrabarti et al. 1998; Gaussier et al. 2002; Toutanova et al. 2001; Vinokourov and Girolami 2002); SVMs have also recently gained popularity in this respect (Cai and Hofmann 2004; Dumais and Chen 2000; Liu et al. 2005; Yang et al. 2003).…”
Section: Related Work
confidence: 99%
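As a rough illustration of the shrinkage idea summarised in this statement, the sketch below interpolates a sparse leaf category's naïve Bayes word-probability estimates with those of its ancestors and a uniform distribution. The fixed mixture weights are a simplification for brevity, since McCallum et al. (1998) estimate them with EM, and the function and variable names here are invented.

```python
# Minimal sketch of shrinkage for a data-sparse leaf category: smooth the
# leaf's word-probability MLE with ancestor MLEs plus a uniform fallback.
# Fixed weights are an assumption; the original method learns them with EM.
import numpy as np

def shrunk_leaf_estimate(path_counts, vocab_size, weights):
    """path_counts: word-count vectors for [leaf, parent, ..., root];
    weights: one mixture weight per node plus one for the uniform component."""
    mles = [c / c.sum() for c in path_counts]           # per-node MLEs
    mles.append(np.full(vocab_size, 1.0 / vocab_size))  # uniform fallback
    return sum(w * p for w, p in zip(weights, mles))    # interpolated estimate

leaf   = np.array([5.0, 0.0, 1.0, 0.0])    # sparse leaf counts
parent = np.array([40.0, 10.0, 30.0, 20.0])
theta  = shrunk_leaf_estimate([leaf, parent], 4, [0.5, 0.4, 0.1])
print(theta, theta.sum())                   # still a proper distribution
```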
“…Many of these intuitions have been used in close association with a specific learning algorithm; the most popular choices in this respect have been naïve Bayesian methods (Chakrabarti et al. 1998; Gaussier et al. 2002; Koller and Sahami 1997; McCallum et al. 1998; Toutanova et al. 2001; Vinokourov and Girolami 2002), neural networks (Ruiz and Srinivasan 2002; Weigend et al. 1999; Wiener et al. 1995), support vector machines (Cai and Hofmann 2004; Dumais and Chen 2000; Liu et al. 2005; Yang et al. 2003), and example-based classifiers (Yang et al. 2003).…”
confidence: 99%