2020
DOI: 10.1007/978-981-15-6168-9_17

Model-Driven Web Page Segmentation for Non Visual Access

Abstract: Web page segmentation aims to break a large page into smaller blocks, in which contents with coherent semantics are kept together. Within this context, a great deal of approaches have been proposed without any specific end task in mind. In this paper, we study different segmentation strategies for the task of non visual skimming. For that purpose, we propose to segment web pages into visually coherent zones so that each zone can be represented by a set of relevant keywords that can be further synthesized into …

citations
Cited by 5 publications
(4 citation statements)
references
References 15 publications
“…The content can be divided horizontally with <HR> tag. These features were used to distinguish areas on the web page rather than extracting the main content directly [3], [18], [37]. Considering some tags give implicit information about the content, recent methods use these features to preprocess the web page.…”
Section: ) Features (citation type: mentioning)
confidence: 99%
“…That is why often visual representation is used for block identification. One of the most popular algorithms for this is VIPS [18][19][20]. Akpinar et al. took the VIPS algorithm as a base and further improved it by adding HTML5 tags under original tag sets, improved handling of invisible nodes, and ported the algorithm to Java [21].…”
Section: Approaches and Problems For Automated Website Content Block Identification (citation type: mentioning)
confidence: 99%
“…Inspired by their approach, we integrate our version with Mechanical Turk to enable manual segmentation at scale via crowdsourcing. 3 Evaluation of Web Page Segmentation. Previous attempts to evaluate web page segmentations fall short in some respects or others.…”
Section: Related Work (citation type: mentioning)
confidence: 99%