2014 25th International Workshop on Database and Expert Systems Applications 2014
DOI: 10.1109/dexa.2014.56

A Pure URL-Based Genre Classification of Web Pages

Abstract: In this paper, we propose a new approach for multi-label genre classification of web pages that exploits character n-grams extracted from the URL of the web page rather than its content. Using only the URL reduces the time needed for feature extraction since it does not need to download the content of the web page. Our approach deals with the complexity of web pages because it uses a multi-label classification where each web page can be assigned to more than one genre. Moreover, our approach implements a new w…
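The abstract's core idea, character n-grams taken from the URL string alone, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `url_char_ngrams`, the lowercasing step, and the default `n = 3` are assumptions made for illustration, and the truncated abstract does not give the paper's n-gram lengths or weighting.

```python
from collections import Counter


def url_char_ngrams(url: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in a URL string.

    Hypothetical helper: lowercasing and the default n are assumptions,
    not details taken from the paper.
    """
    text = url.lower()
    # Slide a window of width n over the string; a URL shorter than n
    # yields an empty Counter because the range is empty.
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


# Example usage: no page content is downloaded, only the URL string is read.
grams = url_char_ngrams("http://example.org/blog/2014/post", n=3)
```

The resulting counts could serve as a feature vector for a multi-label classifier; which classifier and weighting to use is left open here.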

Cited by 8 publications (7 citation statements)
References: 19 publications
“…The URL attribute refers to some simple properties of the URL on the structure and content. It mainly contains three key parts below [6]. Firstly, the URL structure attribute is formatted by the structure "protocol://host:port/path?parameter#infomation-fragment".…”
Section: Url Feature Selection (mentioning)
confidence: 99%
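The URL structure named in the statement above (protocol, host, port, path, parameter, fragment) can be split out with Python's standard library. This is a small sketch only: the helper name `url_parts` and the dictionary keys are chosen here to mirror the quoted wording and are not taken from the citing paper.

```python
from urllib.parse import urlsplit


def url_parts(url: str) -> dict:
    """Split a URL into the structural components named in the quote.

    Hypothetical helper; key names follow the quoted structure.
    """
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,      # lowercased host without the port
        "port": parts.port,          # int, or None when no port is given
        "path": parts.path,
        "parameter": parts.query,    # raw query string after '?'
        "fragment": parts.fragment,  # text after '#'
    }


components = url_parts("http://example.org:8080/a/b?x=1#top")
```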
“…Most previous studies in WGI consider the simple case where all web pages should belong to a predefined taxonomy of genres (Lim, 2005;Santini, 2007;Kanaris and Stamatatos, 2009;Jebari, 2014). This is known as closed-set classification.…”
Section: Closed-set Vs Open-set Classification (mentioning)
confidence: 99%
“…In some cases, it has been reported that the web-pages's URL alone is sufficient for predicting its genre (Abramson and Aha, 2012;Jebari, 2014;Priyatam et al, 2013;Zhu, Zhou, and Fung, 2011). Concerning available hyperlinks in web-pages there are two parts than can provide useful information: the URL of the hyperlink itself handled as a string of characters and its anchor text.…”
Section: Representation Of Web-pages (mentioning)
confidence: 99%