2021
DOI: 10.48550/arxiv.2101.09465
Preprint

WebSRC: A Dataset for Web-Based Structural Reading Comprehension

Abstract: Web search is an essential way for humans to obtain information, but it is still a great challenge for machines to understand the contents of web pages. In this paper, we introduce the task of web-based structural reading comprehension. Given a web page and a question about it, the task is to find an answer from the web page. This task requires a system to understand not only the semantics of the text but also the structure of the web page. Moreover, we proposed WebSRC, a novel Web-based Structural Reading Comprehe…

Citations: cited by 2 publications (15 citation statements)
References: 47 publications
“…Recent advances in PLMs have greatly improved the performance of QA models on benchmark datasets with only plain text passages. Recently, Chen et al [2] introduced the WebSRC dataset, where the answer resides in a given webpage. The dataset poses new challenges for the model to capture the structural information in the webpage, as the text extracted from the raw HTML documents are just short phrases without context.…”
Section: Related Work (mentioning)
confidence: 99%
“…The dataset poses new challenges for the model to capture the structural information in the webpage, as the text extracted from the raw HTML documents are just short phrases without context. Chen et al [2] directly used PLMs pretrained for unstructured text to encode the tags and tokens from the HTML document, and relied on visual representations learned from the rendered webpage to capture the structural information.…”
Section: Related Work (mentioning)
confidence: 99%