2010
DOI: 10.1142/s0218194010004657

An Efficient Web-Based Wrapper and Annotator for Tabular Data

Abstract: In the last few years, several works in the literature have addressed the problem of data extraction from web pages. The importance of this problem derives from the fact that, once extracted, data can be handled in a way similar to instances of a traditional database, which in turn can facilitate application of web data integration and various other domain specific problems. In this paper, we propose a novel table extraction technique that works on web pages generated dynamically from a back-end database. The …

Cited by 13 publications
(4 citation statements)
References 7 publications
“…, A_k is the projection list, µ is the schema matcher (e.g., PruSM [57]), ω is the wrapper (e.g., FastWrap [7]), φ is the form filler (e.g., iForm [70]), and ϕ is the web form address or the form function. Note that this statement returns a table by submitting columns from each row in r to the deep web database at ϕ.…”
Section: Accessing Online Tools and Databases (citation type: mentioning)
confidence: 99%
“…In order to minimize the effort of preparing a training set, Dalvi et al. recently proposed a generic framework for supervised wrapper induction based on automatically obtained noisy training data [17]. Without requiring a training set, Amin and Jamil [4] extracted from a symbol list of an HTML page a commonly occurring pattern of the highest length and super-maximal repeats. The pattern is converted to a regular expression, which is subsequently used to extract the record level data items.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
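
The pipeline described in the statement above (symbol list, longest repeated pattern, regular expression, record extraction) can be illustrated with a minimal Python sketch. This is not the algorithm of Amin and Jamil: the brute-force search below is an assumed stand-in for their super-maximal repeat computation, and all names (`TagSymbolizer`, `longest_repeated_pattern`, `pattern_to_regex`, `extract_records`) are hypothetical. It also assumes simple, well-formed markup without self-closing tags.

```python
import re
from html.parser import HTMLParser


class TagSymbolizer(HTMLParser):
    """Flattens a page into a symbol list: tags by name, text runs as 'TXT'."""

    def __init__(self):
        super().__init__()
        self.symbols = []

    def handle_starttag(self, tag, attrs):
        self.symbols.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.symbols.append(f"</{tag}>")

    def handle_data(self, data):
        if data.strip():  # ignore whitespace-only text between tags
            self.symbols.append("TXT")


def count_nonoverlapping(symbols, pattern):
    """Counts non-overlapping occurrences of pattern, scanning left to right."""
    count, i, n = 0, 0, len(pattern)
    while i + n <= len(symbols):
        if symbols[i:i + n] == pattern:
            count += 1
            i += n
        else:
            i += 1
    return count


def primitive_unit(pattern):
    """Reduces e.g. two concatenated rows to one row (smallest repeating unit)."""
    n = len(pattern)
    for p in range(1, n + 1):
        if n % p == 0 and pattern[:p] * (n // p) == pattern:
            return pattern[:p]
    return pattern


def longest_repeated_pattern(symbols, min_repeats=2):
    """Brute-force search for the longest data-bearing pattern that repeats."""
    for length in range(len(symbols) // min_repeats, 0, -1):
        for start in range(len(symbols) - length + 1):
            candidate = symbols[start:start + length]
            if "TXT" in candidate and \
                    count_nonoverlapping(symbols, candidate) >= min_repeats:
                return primitive_unit(candidate)
    return []


def pattern_to_regex(pattern):
    """Turns a symbol pattern into a regex with one capture group per 'TXT'."""
    parts = []
    for sym in pattern:
        if sym == "TXT":
            parts.append(r"([^<]+?)")
        elif sym.startswith("</"):
            parts.append(re.escape(sym[:-1]) + r"\s*>")
        else:
            parts.append(re.escape(sym[:-1]) + r"\b[^>]*>")  # allow attributes
    return re.compile(r"\s*".join(parts), re.IGNORECASE)


def extract_records(html):
    """Returns one tuple of text fields per repeated record found in html."""
    parser = TagSymbolizer()
    parser.feed(html)
    pattern = longest_repeated_pattern(parser.symbols)
    if not pattern:
        return []
    return [m.groups() for m in pattern_to_regex(pattern).finditer(html)]
```

Calling `extract_records` on a small table whose rows share the same tag structure yields one tuple of cell texts per row. The brute-force search is far slower than a suffix-tree-based repeat finder on long symbol lists, so it only conveys the idea and is unsuitable for large pages.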
“…In this statement, OntoMatch [51] and FastWrap [52] respectively are schema matcher and wrapper generator available in the BioFlow toolbox. EXTEND Statement.…”
Section: Query Translation (citation type: mentioning)
confidence: 99%