2006
DOI: 10.1007/11788911_1

The Lixto Project: Exploring New Frontiers of Web Data Extraction

Cited by 11 publications
(7 citation statements)
References 11 publications
“…Some approaches employed ML techniques like HMMs and classifiers, but the general rationale has been to arrive at a set of good extraction rules that could be applied in a deterministic manner. Early prototype systems of this kind included Rapier [28] and the W4F toolkit [112]; more recent systems that are being pursued further include Lixto [14,30,70], RoadRunner [43,44], and SEAL [135].…”
Section: Rule/query-based Methods (mentioning)
confidence: 99%
“…That work is described in [6]-[8]. Currently, the scope of the Lixto project has been extended to additional research in the field of fully automated and unsupervised Web data extraction [9]. The supervised work of the Lixto project focuses on data extraction from deep Web pages which includes the challenge of form filling and the navigation on Web pages.…”
Section: Scientific Approaches (mentioning)
confidence: 99%
“…21 use document understanding techniques for identifying atomic elements of PDF documents on which apply spatial reasoning and wrapping techniques that enable to identify significant document blocks; Gottlob et Al. in 22 describe the PDF document preprocessing techniques currently used in the LixTo system. Before the Gottlob's papers only document understanding 23 and table recognition methods 24 was applied on PDF documents in attempting to identify and extract relevant information from them.…”
Section: Related Work (mentioning)
confidence: 99%