Proceedings of the 9th ACM Symposium on Document Engineering 2009
DOI: 10.1145/1600193.1600241

Web document text and images extraction using DOM analysis and natural language processing

Cited by 27 publications (18 citation statements)
References 8 publications
“…
(1) for each line ∈ lines do
(2)   if line match heuristic rules then
(3)     do operation
(4)   end if
(5) end for
(6) for each line ∈ lines do
(7)   find pattern of line
(8)   match the pattern to others
(9)   if match then
(10)    record the block
(11)  else
(12)    continue
(13) …”
Section: Resume Facts Identification (mentioning)
confidence: 99%
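
The two loops in the quoted pseudocode can be sketched in Python as follows. This is only an illustration under stated assumptions, not the citing paper's implementation: the rule set `HEURISTIC_RULES`, the helper `line_pattern`, and the choice to treat adjacent lines with the same coarse shape as one block are all hypothetical.

```python
import re
from itertools import groupby

# Hypothetical heuristic rules (name, regex); not taken from the cited work.
HEURISTIC_RULES = [
    ("email", re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")),
    ("phone", re.compile(r"\+?\d{3}[\s-]\d{3}[\s-]\d{4}")),
]


def apply_heuristics(lines):
    """First loop: record a fact for every line that matches a rule."""
    facts = []
    for line in lines:
        for name, rule in HEURISTIC_RULES:
            match = rule.search(line)
            if match:
                facts.append((name, match.group(0)))
    return facts


def line_pattern(line):
    """Reduce a line to a coarse shape: digit runs -> '9', letter runs -> 'a'."""
    shape = re.sub(r"\d+", "9", line.strip())
    shape = re.sub(r"[^\W\d_]+", "a", shape)
    return re.sub(r"\s+", " ", shape)


def find_blocks(lines):
    """Second loop: record runs of adjacent lines that share a pattern."""
    blocks = []
    for pattern, run in groupby(lines, key=line_pattern):
        run = list(run)
        if pattern and len(run) > 1:
            blocks.append(run)
    return blocks


resume_lines = [
    "Jane Doe",
    "jane@example.com",
    "2010-2012 Acme Corp",
    "2013-2015 Globex Inc",
]
facts = apply_heuristics(resume_lines)
blocks = find_blocks(resume_lines)
```

In this sketch the first pass picks up the e-mail address, and the second pass groups the two employment lines because both reduce to the same shape.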
“…Since the details of resume are hard to extract, it is an alternative way to achieve the goal of job matching with keywords search approach [3,5]. Inspired by the way of extracting the news web page [6][7][8][9][10], several rule-based extraction approaches [11][12][13] treat the resume text as a web page and then extract detailed facts based on the DOM tree structure. For the last kind of methods, researchers treat the resume extracting task as a semantic-based entity extraction problem.…”
Section: Introduction (mentioning)
confidence: 99%
“…In these approaches, visual features, such as width, height, or position, are used to recognize and analyze information blocks. The combination of natural language processing and DOM analysis [24] or statistical analysis [23] was proposed to recognize the main content in a Web page. The TWWF approach [60] used the DOM structure to divide a Web page.…”
Section: Related Work (mentioning)
confidence: 99%
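
As a rough illustration of the DOM-analysis side of main-content recognition, the sketch below scores each block element by how much text it holds and penalizes link text. It is not the method of the cited paper and has no natural language processing step; the tag set `BLOCK_TAGS`, the class `BlockTextCollector`, and the scoring weight are assumptions chosen for the example.

```python
from html.parser import HTMLParser

# Assumed set of block-level tags to treat as candidate content blocks.
BLOCK_TAGS = {"div", "article", "section", "td", "p"}


class BlockTextCollector(HTMLParser):
    """Collect text per innermost block element, tracking link text separately."""

    def __init__(self):
        super().__init__()
        self.stack = []    # open block elements, innermost last
        self.blocks = []   # closed blocks
        self.in_link = 0   # depth of open <a> elements

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.stack.append({"tag": tag, "text": [], "link_chars": 0})
        elif tag == "a":
            self.in_link += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.in_link:
            self.in_link -= 1
        elif tag in BLOCK_TAGS and self.stack and self.stack[-1]["tag"] == tag:
            self.blocks.append(self.stack.pop())

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            block = self.stack[-1]
            block["text"].append(text)
            if self.in_link:
                block["link_chars"] += len(text)


def main_content(html):
    """Return the text of the block with the most non-link text."""
    parser = BlockTextCollector()
    parser.feed(html)
    parser.close()

    def score(block):
        total = sum(len(t) for t in block["text"])
        return total - 2 * block["link_chars"]  # assumed penalty weight

    best = max(parser.blocks, key=score, default=None)
    return " ".join(best["text"]) if best else ""


page = (
    '<div id="nav"><a href="/">Home</a> <a href="/about">About us</a></div>'
    '<div id="body"><p>The main article text is long and contains few links.</p></div>'
    '<div id="foot"><a href="/terms">Terms</a></div>'
)
content = main_content(page)
```

Navigation and footer blocks score low here because nearly all of their text sits inside links, which is one common heuristic for separating boilerplate from content.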
“…The topic of our paper centers on the textual information that surrounds or is attached to Web images. This contextual information is a unique feature of the Web images and has long been mined for various uses such as image annotation (Joshi and Liu, 2009; Leong et al., 2010;, clustering of image search results (Blaschko and Lampert, 2008; Cai et al., 2004; Gao et al., 2005; Rege et al., 2008; Wang et al., 2005), inference of image semantic content (Feng and Lapata, 2008; Ghoshal et al., 2005; Tang et al., 2009) etc.…”
Section: Issues In Contextual Information-based Image Understanding O (mentioning)
confidence: 99%
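
To make "textual information that surrounds or is attached to Web images" concrete, the following sketch pairs each image with its `alt` text and the text of its enclosing container. It is an assumption-laden illustration, not any cited system: the container set `CONTAINERS`, the class `ImageContextCollector`, and the function `image_contexts` are hypothetical names.

```python
from html.parser import HTMLParser

# Assumed containers whose text counts as an image's surrounding context.
CONTAINERS = {"figure", "p", "div", "td", "li"}


class ImageContextCollector(HTMLParser):
    """Pair every <img> with its alt text and its container's text."""

    def __init__(self):
        super().__init__()
        self.stack = []     # open containers, innermost last
        self.results = []   # one dict per image

    def handle_starttag(self, tag, attrs):
        if tag in CONTAINERS:
            self.stack.append({"tag": tag, "text": [], "images": []})
        elif tag == "img" and self.stack:
            attributes = dict(attrs)
            src = attributes.get("src") or ""
            alt = attributes.get("alt") or ""
            self.stack[-1]["images"].append((src, alt))

    def handle_endtag(self, tag):
        if tag in CONTAINERS and self.stack and self.stack[-1]["tag"] == tag:
            block = self.stack.pop()
            surrounding = " ".join(block["text"])
            for src, alt in block["images"]:
                self.results.append(
                    {"src": src, "alt": alt, "context": surrounding}
                )

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            self.stack[-1]["text"].append(text)


def image_contexts(html):
    """Return attached (alt) and surrounding (container) text per image."""
    parser = ImageContextCollector()
    parser.feed(html)
    parser.close()
    return parser.results


snippet = (
    '<figure><img src="cat.jpg" alt="A cat">'
    "<figcaption>Tabby cat asleep on a sofa</figcaption></figure>"
)
contexts = image_contexts(snippet)
```

The `alt` attribute stands in for attached text and the caption for surrounding text; a real pipeline would likely also look at neighbouring blocks, which this sketch does not do.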