2019
DOI: 10.1007/978-3-030-14799-0_7

Robust Web Data Extraction Based on Unsupervised Visual Validation

citations
Cited by 6 publications
(3 citation statements)
references
References 20 publications
“…The idea here is how different types of content are formatted 'relative to each other' on a product listing web page can be consistent across many different domains and websites. This has been validated in different contexts such as [72]. The abundance of already annotated product listing pages in the form of semantic markup data can create an opportunity to train such taggers in a self-supervised way.…”
Section: Discussion (mentioning)
confidence: 99%
“…The idea here is how different types of content are formatted 'relative to each other' on a product listing web page can be consistent across many different domains and websites. This has been validated in different contexts such as Potvin and Villemaire (2019). The abundance of already annotated product listing pages in the form of semantic markup data can create an opportunity to train such taggers in a self-supervised way.…”
Section: Discussion (mentioning)
confidence: 99%
“…To overcome the challenges associated with web data extraction, an extractor is presented based on supervised and unsupervised methods to extract the specific product description from the websites [39]. An unsupervised technique called visual validation was combined with a supervised classifier to produce a versatile extractor that works on a variety of websites.…”
Section: Related Work (mentioning)
confidence: 99%