2021
DOI: 10.14778/3476249.3476293

The smallest extraction problem

Abstract: We introduce landmark grammars, a new family of context-free grammars aimed at describing the HTML source code of pages published by large and templated websites and therefore at effectively tackling Web data extraction problems. Indeed, they address the inherent ambiguity of HTML, one of the main challenges of Web data extraction, which, despite over twenty years of research, has been largely neglected by the approaches presented in the literature. …

Cited by 2 publications
References 57 publications