Proceedings of the International Conference on Web Intelligence 2017
DOI: 10.1145/3106426.3106437

Navigation objects extraction for better content structure understanding

Abstract: Existing works for extracting navigation objects from webpages focus on navigation menus, so as to reveal the information architecture of the site. However, web 2.0 sites such as social networks, e-commerce portals etc. are making the understanding of the content structure in a web site increasingly difficult. Dynamic and personalized elements such as top stories, recommended list in a webpage are vital to the understanding of the dynamic nature of web 2.0 sites. To better understand the content structure in web …

citations
Cited by 3 publications
(3 citation statements)
references
References 28 publications
“…Results of their prototype evaluation displayed a 78% error free navigation element detection and a 44% error free hierarchy detection. While both [31] and [32] structure extraction methods focus on static website, authors of [33] introduce a method that captures both the static and dynamic structure of a website.…”
Section: The Prevalent Category: Websites (mentioning)
confidence: 99%
“…Dragnet [12] combines several feature sets into a single library which has demonstrated good results on several early forum extraction datasets. Another recent approach has focused on navigating the hierarchy of objects extracted from web pages (e.g., hyperlink blocks extracted from DOM trees) [13].…”
Section: Related Work (mentioning)
confidence: 99%
“…Dragnet in turn, provided the highest precision, and depending on the metric (e.g., cleaneval, euclidean or cosine distances, etc) either Dragnet or News-Please yielded the best F1 score. For our experiments, we have selected several of the systems used in Barberesi's evaluation [24] including Inscriptis, BoilerPy3, jusText and Dragnet. Several other tools have been initially targeted but were not included since their source code was not available online at the date the experiments were performed (e.g., Sido's forum extraction tool [25]) or due to various errors (e.g., the News-Please tool).…”
Section: B Evaluation System (mentioning)
confidence: 99%