Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2016
DOI: 10.1145/2939672.2939858

Lossless Separation of Web Pages into Layout Code and Data

Cited by 8 publications
(8 citation statements)
References 36 publications
“…Researches on data extraction from the Deep web have been conducted by [13][14][15][16][17][18][19][20]. They are differentiated based on the number of web page inputs.…”
Section: Literature Review (mentioning)
confidence: 99%
“…They are differentiated based on the number of web page inputs. Researches on data extraction using one web page input were conducted by [13][14][15], in general they used a repeating structure of HTML tags, such as tables (<table>, <tr>, <th>, and <td>) and list (<ul > and <li>). For example, consider a conference schedule in Fig.…”
Section: Literature Review (mentioning)
confidence: 99%
“…Relying on the HTML DOM tree structure makes it difficult to train a machine learning based model for publication extraction because: (i) Text in a publication string may be separated in many different DOM tree nodes. (ii) The DOM tree structure, which previous web data record extraction systems (Liu et al, 2003; Furche et al, 2014; Omari et al, 2016) rely on, may vary given the same webpage content.…”
Section: Related Work (mentioning)
confidence: 99%