2022
DOI: 10.1145/3480966

Multimodal Web Page Segmentation Using Self-organized Multi-objective Clustering

Abstract: Web page segmentation (WPS) aims to break a web page into different segments with coherent intra- and inter-semantics. By evidencing the morpho-dispositional semantics of a web page, WPS has traditionally been used to demarcate informative from non-informative content, but it has also evidenced its key role within the context of non-linear access to web information for visually impaired people. For that purpose, a great deal of ad hoc solutions have been proposed that rely on visual, logical, and/or text cues.…

citations
Cited by 4 publications
(3 citation statements)
references
References 67 publications
“…In addition, the HEPS method [20], mentioned in the Webis-WebSeg-20 dataset [16] for comparison, utilizes text nodes and images to identify potential headings, corresponding blocks, and create a hierarchical segmentation. The DOM structure is also a vital component in other segmentation models [14,15], where additional factors like textual and visual cues are integrated to enhance performance.…”
Section: WPS Approaches
mentioning
confidence: 99%
“…Over time, many solutions have been proposed to address the segmentation problem using different approaches and learning strategies. The most commonly used techniques fall into several categories: ad-hoc approaches [7,29,6,18,25] (which rely on manually-tuned heuristics and parameter-dependent methods), theoretically-founded approaches [9,1] (based on graph-theoretic and classical clustering algorithms), computer vision approaches [13,11], and others (as mentioned in [14]). In general, these approaches share three key elements: visual, textual, and structural cues found on web pages.…”
Section: Introduction
mentioning
confidence: 99%
“…In comparison to Andrew Judith et al solution [37], our method defines as many content blocks as there are on the page, not limiting the number of blocks. In comparison to other segment number not fixed solutions [38], this method is faster, as it does not require two stages (to identify the number of clusters and then to divide the web page into this number of blocks) and extracts all possible content blocks from the web page. The blocks are not limited to text containing structured blocks only [39] and extract all, not only structured blocks [40].…”
mentioning
confidence: 99%