2011
DOI: 10.1145/1993053.1993057

A Comprehensive Study of Features and Algorithms for URL-Based Topic Classification

Abstract: Given only the URL of a Web page, can we identify its topic? We study this problem in detail by exploring a large number of different feature sets and algorithms on several datasets. We also show that the inherent overlap between topics and the sparsity of the information in URLs make this a very challenging problem. Web page classification without a page’s content is desirable when the content is not available at all, when a classification is needed before obtaining the content, or wh…

Cited by 59 publications (52 citation statements)
References 27 publications
“…URL classification problem is studied by many researchers (Kan, 2004; Kan and Thi, 2005; Baykan et al, 2011; Rajalakshmi and Aravindan, 2011; Singh et al, 2012) and various URL features are suggested in the literature. Kan and Thi (2005) suggested segmentation techniques for extracting features from URLs.…”
Section: Introduction (mentioning)
confidence: 99%
“…Kan and Thi (2005) suggested segmentation techniques for extracting features from URLs. Token based features are suggested in Baykan et al (2011); Rajalakshmi and Aravindan (2011). The n-gram based approach for URL classification is discussed in (Jianping et al, 2006; Baykan et al, 2011; Rajalakshmi and Aravindan, 2013).…”
Section: Introduction (mentioning)
confidence: 99%
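
The token-based and n-gram-based URL features mentioned in the statement above can be illustrated with a minimal sketch. It is not the cited authors' exact procedure: the helper names (`url_tokens`, `char_ngrams`, `url_features`), the minimum token length of 2, and the default n-gram length of 4 are all assumptions chosen for illustration.

```python
import re
from collections import Counter


def url_tokens(url):
    """Lowercase the URL and split it on every run of non-letter characters.

    Tokens shorter than 2 letters are dropped (an illustrative choice).
    """
    return [t for t in re.split(r"[^a-z]+", url.lower()) if len(t) >= 2]


def char_ngrams(tokens, n=4):
    """Return character n-grams taken from within each token.

    A token no longer than n is kept whole as a single feature.
    """
    grams = []
    for tok in tokens:
        if len(tok) <= n:
            grams.append(tok)
        else:
            grams.extend(tok[i:i + n] for i in range(len(tok) - n + 1))
    return grams


def url_features(url, n=4):
    """Build two bag-of-features counters for one URL: tokens and n-grams."""
    tokens = url_tokens(url)
    return Counter(tokens), Counter(char_ngrams(tokens, n))


if __name__ == "__main__":
    # Hypothetical example URL, used only to show the shape of the output.
    token_counts, ngram_counts = url_features(
        "http://www.example.com/sports/football-news.html"
    )
    print(sorted(token_counts))
    print(sorted(ngram_counts))
```

Counters of this kind could then be fed to any standard text classifier. Which classifier and which feature weighting work best is the question the paper itself studies, and this sketch does not attempt to answer it.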