2021
DOI: 10.48550/arxiv.2101.02415
Preprint

Simplified DOM Trees for Transferable Attribute Extraction from the Web

Yichao Zhou, Ying Sheng, Nguyen Vo, et al.

Abstract: There has been a steady need to precisely extract structured knowledge from the web (i.e. HTML documents). Given a web page, extracting a structured object along with various attributes of interest (e.g. price, publisher, author, and genre for a book) can facilitate a variety of downstream applications such as large-scale knowledge base construction, e-commerce product search, and personalized recommendation. Considering each web page is rendered from an HTML DOM tree, existing approaches formulate the problem…

Cited by 6 publications (16 citation statements)
References 30 publications
“…We use their code released on GitHub and experiment with our settings. Also note that for all models, we do not apply extra post-processing such as site-level voting in original SimpDOM [29].…”
Section: Attribute Extraction (mentioning)
confidence: 99%
“…Unlike the page-level F1 scores used in previous work [17,29] that scores the extraction on a page as success as long as one of the predictions is correct, attribute value-level F1 considers the case where there are multiple values for the same attribute in a webpage, and penalizes the false positive predictions made by the model.…”
Section: Attribute Extraction (mentioning)
confidence: 99%
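
The contrast drawn in the statement above can be illustrated with a small Python sketch. It is not taken from the cited papers: the function names, the representation of each page as a set of values for one attribute, and the exact page-level precision and recall definitions (hits divided by pages with predictions, and by pages with gold values) are assumptions made for illustration only.

```python
from typing import List, Set


def _f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def page_level_f1(pred: List[Set[str]], gold: List[Set[str]]) -> float:
    """Assumed page-level scoring: a page is a hit if ANY prediction is correct."""
    hits = sum(1 for p, g in zip(pred, gold) if p & g)
    pages_with_pred = sum(1 for p in pred if p)
    pages_with_gold = sum(1 for g in gold if g)
    precision = hits / pages_with_pred if pages_with_pred else 0.0
    recall = hits / pages_with_gold if pages_with_gold else 0.0
    return _f1(precision, recall)


def value_level_f1(pred: List[Set[str]], gold: List[Set[str]]) -> float:
    """Assumed value-level scoring: every predicted value is counted on its own."""
    tp = sum(len(p & g) for p, g in zip(pred, gold))  # correct values
    fp = sum(len(p - g) for p, g in zip(pred, gold))  # spurious values
    fn = sum(len(g - p) for p, g in zip(pred, gold))  # missed values
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return _f1(precision, recall)


# Hypothetical predictions for an "author" attribute on two pages.
pred = [{"Jane Doe", "John Roe", "Acme Press"}, {"Sam Lee"}]
gold = [{"Jane Doe", "Max Poe"}, {"Kim Ray"}]

print(round(page_level_f1(pred, gold), 3))   # 0.5
print(round(value_level_f1(pred, gold), 3))  # 0.286
```

On the first page, one correct value out of three predictions is enough for a page-level hit, while the value-level score also counts the two spurious predictions and the missed gold value, which is why it comes out lower.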