2017
DOI: 10.1016/j.ipm.2017.04.007

DERIN: A data extraction method based on rendering information and n-gram

Cited by 16 publications
(5 citation statements)
References 12 publications
“…In the literature, there are many proposals to extract data from HTML documents in general, not specifically tables (Ferrara, de Meo, Fiumara, & Baumgartner, 2014; Sleiman & Corchuelo, 2013a). They rely on text alignment (Sleiman & Corchuelo, 2013b), neural networks (Sleiman & Corchuelo, 2014), learning first-order rules (Jiménez & Corchuelo, 2016a), inferring propositio-relational rules (Jiménez & Corchuelo, 2016b), learning decision trees (Uzun, Agun, & Yerlikaya, 2013), embedding graphs (Jiménez, Roldán, Gallego, & Corchuelo, 2020), or using n-grams and rendering information (Figueiredo, Assis, & Ferreira, 2017), to mention a few. Unfortunately, they do not seem to be appropriate to extract the underlying relationships between the cells in HTML tables (Cafarella et al., 2018), which motivated much work on table-understanding (Roldán et al., 2020; Zhang & Balog, 2020).…”
Section: Context and Motivation
Type: mentioning
confidence: 99%
“…B_u is the backlink start pointed towards node u, and N_v is the number of links of each node v. One node v divides its own ranking by N_v and delivers to page u, which is connected through the links. Nodes with backlinks from important nodes (high ranking) are ranked high [35].…”
Section: = (
Type: mentioning
confidence: 99%
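
The ranking rule quoted in the statement above matches the simplified PageRank recurrence, R(u) = sum over v in B_u of R(v) / N_v. The following is a minimal Python sketch of that recurrence only, not code from the citing paper. The function name `simple_pagerank`, the `links` dictionary layout, the uniform starting ranks, and the fixed iteration count are all illustrative assumptions, and no damping factor is applied.

```python
def simple_pagerank(links, iterations=100):
    """Iterate R(u) = sum over backlinks v of R(v) / N_v.

    `links` (hypothetical layout) maps each node v to the list of
    nodes it links to, so N_v is len(links[v]).
    """
    # Collect every node that appears as a source or a target.
    nodes = set(links) | {u for targets in links.values() for u in targets}
    # Assumption: start from a uniform ranking.
    rank = {n: 1.0 / len(nodes) for n in nodes}

    for _ in range(iterations):
        new_rank = {n: 0.0 for n in nodes}
        for v, targets in links.items():
            if not targets:
                continue  # a node without outgoing links passes nothing on
            share = rank[v] / len(targets)  # v divides its ranking by N_v
            for u in targets:
                new_rank[u] += share  # and delivers a share to each linked u
        rank = new_rank
    return rank


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for node, value in sorted(simple_pagerank(graph).items()):
        print(node, round(value, 4))
```

In this small graph, node c receives backlinks from both a and b, so it ends up ranked above b. Because the sketch has no damping term, rank held by nodes without outgoing links is simply lost, which practical implementations usually handle separately.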
“…Data mining principles can be independent of a particular domain for knowledge extraction [11], since their methods are able to learn how to extract the data, perform a given analysis independently of the domain, and detect different record structures and their attributes based on rendering information [18]. The importance of understanding correlations between data has increased, and data mining methods are useful for finding patterns and association rules for various analyses and decision aids, such as product category recommendations and the determination of possible behavioral changes [31].…”
Section: Data Mining and Meteorology
Type: mentioning
confidence: 99%