Proceedings of the 8th Conference of the European Society for Fuzzy Logic and Technology 2013
DOI: 10.2991/eusflat.2013.27

Sample-based XPath Ranking for Web Information Extraction

Abstract: Web information extraction typically relies on a wrapper, i.e., program code or a configuration that specifies how to extract some information from web pages at a specific website. Manually creating and maintaining wrappers is a cumbersome and error-prone task. It may even be prohibitive as some applications require information extraction from previously unseen websites. This paper targets automatic on-the-fly wrapper creation for websites that provide attribute data for objects in a 'search -search result pag…

Cited by 4 publications
(2 citation statements)
References 10 publications
“…A probabilistic approach may lead to more robust navigation. Subtasks like finding search results (Trieschnigg et al 2012) or finding target fields (Jundt and van Keulen 2013) are typically based on ranking "possible actions". By executing not only one but a top-k of possible actions and representing resulting data probabilistically, consequences of imperfect ranking are reduced.…”
Section: Example Applications (mentioning)
confidence: 99%
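The top-k idea in the statement above can be illustrated with a minimal Python sketch. It is not taken from the cited papers: the function name `top_k_extract`, the example XPath strings, the scores, and the choice to normalize ranking scores into probabilities are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple


def top_k_extract(
    candidates: List[Tuple[str, float]],
    k: int,
    execute: Callable[[str], Optional[str]],
) -> Dict[str, float]:
    """Run the k best-ranked actions and return value -> probability.

    Assumption: ranking scores are non-negative and are simply
    normalized over the top-k to act as probabilities.
    """
    # Keep only the k highest-scoring candidate actions.
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    total = sum(score for _, score in ranked)
    if total <= 0:
        return {}

    result: Dict[str, float] = {}
    for action, score in ranked:
        value = execute(action)
        if value is None:
            continue  # this action extracted nothing
        # Actions that agree on a value pool their probability mass.
        result[value] = result.get(value, 0.0) + score / total
    return result


# Hypothetical ranked XPaths for a "price" field, with made-up scores.
candidates = [
    ("//span[@class='price']", 0.6),
    ("//div[@id='cost']", 0.3),
    ("//td[3]", 0.1),
]

# Stand-in for evaluating an XPath on a page: a fixed lookup table.
fake_page = {
    "//span[@class='price']": "12.50",
    "//div[@id='cost']": "9.99",
    "//td[3]": "n/a",
}

distribution = top_k_extract(candidates, k=2, execute=fake_page.get)
```

With `k=2`, the lowest-ranked candidate is never executed, and the two remaining values share the probability mass in proportion to their scores, so a wrong top-1 choice does not discard the runner-up.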
“…Alternative model mapping scrapelets could be introduced to support also websites that make less use of HTML attributes. Jundt and Van Keulen for example used a manually annotated training set to detect the respective XPaths in [51]. We used NeoGeo for a POI scraping task of 603 POI categories for the city of Enschede.…”
Section: POI Scraping (mentioning)
confidence: 99%