2005
DOI: 10.1145/1089815.1089821

Support vector machines classification with a very large-scale taxonomy

Abstract: Very large-scale classification taxonomies typically have hundreds of thousands of categories, deep hierarchies, and skewed category distribution over documents. However, it is still an open question whether the state-of-the-art technologies in automated text categorization can scale to (and perform well on) such large taxonomies. In this paper, we report the first evaluation of Support Vector Machines (SVMs) in web-page classification over the full taxonomy of the Yahoo! categories. Our accomplishments includ…

Cited by 183 publications (182 citation statements)
References: 16 publications
“…The true Yahoo! directory structure contains thousands of labels and is a very difficult classification problem that traditional classification methods fail to adequately handle (Liu et al 2005). However, the majority of multi-label research conducted using the Yahoo!…”
Section: Background and Motivation (mentioning)
confidence: 99%
“…In contrast to the datasets typically utilized in research, multilabel corpora in the real world can contain thousands or tens of thousands of labels, and the label frequencies in these datasets tend to have highly skewed frequency-distributions with power-law statistics (Yang et al 2003; Liu et al 2005; Dekel and Shamir 2010). Figure 1 illustrates this point for three large real-world corpora - each containing thousands of unique labels - by plotting the number of labels within each corpus as a function of label-frequency.…”
Section: Background and Motivation (mentioning)
confidence: 99%
“…However, the work by [6] is among the pioneering in hierarchical classification towards addressing Web-scale directories such as Yahoo! directory consisting of over 100,000 target classes.…”
Section: Other Related Work (mentioning)
confidence: 99%
“…However, these approaches lead to multiple folds increase in training time as shown in [9]. Prediction speed also suffers by employing excessive flattening as studied in the work by [6] showing that the space complexity of a flat classifier is much higher than a hierarchical model. Moreover, for predicting an unseen test instance in a K class problem, one needs to evaluate O(K) classifiers in flat classification as against O(log K) classifiers in a top-down manner.…”
Section: Problem Setup (mentioning)
confidence: 99%
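
The O(K) versus O(log K) comparison in the statement above can be illustrated with a small counting sketch. It assumes a perfectly balanced category tree with a fixed branching factor, which is a simplification: the cited abstract describes the real taxonomy as deep and skewed. The function names and the `branching` parameter are hypothetical and are not taken from any of the cited papers.

```python
import math


def flat_evaluations(num_classes: int) -> int:
    """Flat one-vs-rest: every leaf classifier is scored once per test instance."""
    return num_classes


def top_down_evaluations(num_classes: int, branching: int) -> int:
    """Top-down over an (assumed) balanced tree.

    At each level, one classifier per child of the current node is scored,
    and only the best-scoring child is expanded further.
    """
    evaluations = 0
    remaining = num_classes  # leaf classes under the current node
    while remaining > 1:
        children = min(branching, remaining)
        evaluations += children
        remaining = math.ceil(remaining / children)
    return evaluations


if __name__ == "__main__":
    k = 100_000  # order of magnitude quoted above for the Yahoo! directory
    print("flat:", flat_evaluations(k))
    print("top-down, branching 10:", top_down_evaluations(k, 10))
```

With a constant branching factor the top-down count grows as the branching factor times the tree depth, which is the O(log K) behaviour the statement refers to. The sketch counts classifier evaluations only and says nothing about accuracy, where errors made near the root propagate down the tree.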
“…The current best practice on link suggestion is prefix matching over titles of Wikipedia articles, and existing document classification approaches are not proper for the category suggestion task due to their poor effectiveness and efficiency when dealing with large-scale category systems [27].…”
Section: Introduction (mentioning)
confidence: 99%