Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2020
DOI: 10.1145/3340531.3412782

Web Page Segmentation Revisited

Abstract: Each web page can be segmented into semantically coherent units that fulfill specific purposes. Though the task of automatic web page segmentation was introduced two decades ago, along with several applications in web content analysis, its foundations are still lacking. Specifically, the developed evaluation methods and datasets presume a certain downstream task, which led to a variety of incompatible datasets and evaluation methods. To address this shortcoming, we contribute two resources: (1) An evaluation f…

Cited by 13 publications (11 citation statements); references 33 publications.

“…We conduct a comprehensive assessment of our method including both quantitative and qualitative analyses. In the quantitative aspect, we evaluate our approach using a dataset comprising 1,969 web pages sourced from the Webis-WebSeg-20 dataset [16]. Our results demonstrate a notable performance, with our method achieving a 64% F_B³ score, surpassing VIPS and MMDetection by 19% and 10%, respectively, on the same dataset.…”
Section: Introduction (mentioning), confidence: 92%

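The F_B³ score mentioned in the statement above is a BCubed F-score, which treats a segmentation as a clustering of a page's atomic elements and compares it with a ground-truth clustering. The following is a minimal Python sketch under simplifying assumptions: every element (hypothetically a DOM node id or a pixel coordinate) belongs to exactly one segment in both the predicted and the ground-truth segmentation, and all elements are weighted equally. The evaluation framework accompanying Webis-WebSeg-20 may handle cases this sketch ignores, such as elements outside any segment or nested segments; `bcubed_scores` is a hypothetical helper name, not part of any published tool.

```python
from __future__ import annotations

from collections import Counter
from typing import Hashable, Mapping


def bcubed_scores(
    predicted: Mapping[Hashable, Hashable],
    truth: Mapping[Hashable, Hashable],
) -> tuple[float, float, float]:
    """Return (precision, recall, F-score) under BCubed for two hard segmentations.

    Assumption: each mapping assigns every atomic element to exactly one segment
    label, and both mappings cover the same set of elements.
    """
    if predicted.keys() != truth.keys():
        raise ValueError("both segmentations must cover the same elements")
    if not predicted:
        raise ValueError("cannot score an empty segmentation")

    # For each (predicted label, true label) pair, count the elements carrying both.
    joint = Counter((predicted[e], truth[e]) for e in predicted)
    pred_sizes = Counter(predicted.values())
    true_sizes = Counter(truth.values())

    n = len(predicted)
    precision = recall = 0.0
    for e in predicted:
        # Elements sharing e's predicted AND true segment; always >= 1 (e itself).
        overlap = joint[(predicted[e], truth[e])]
        precision += overlap / pred_sizes[predicted[e]]
        recall += overlap / true_sizes[truth[e]]
    # Average the per-element precision and recall over all elements.
    precision /= n
    recall /= n

    # Harmonic mean; the denominator is positive because overlap >= 1 per element.
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score
```

For instance, predicting a single segment for a page whose ground truth has two equally sized segments yields precision 0.5, recall 1.0 and an F-score of 2/3: the prediction never separates elements that belong together, but it merges every element with unrelated ones.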
“…The optimized segmentation layout is more comprehensible and reduces redundant zones in each section, as shown in figure 7b. We conduct our evaluation on the Webis-WebSeg-20 [16] dataset, which is the largest publicly accessible set of data in terms of web segmentation. The dataset contains more than 8,000 web pages from more than 5,500 different domains.…”
Section: Visual Optimization (mentioning), confidence: 99%

“…Efficiently representing real-world websites is a longstanding challenge in web understanding (Wu et al, 2023), including subtasks like web information extraction (Chang et al, 2006) and web segmentation (Kiesel et al, 2020).…”
Section: Website Representations (mentioning), confidence: 99%