2007 International Symposium on Information Technology Convergence (ISITC 2007)
DOI: 10.1109/isitc.2007.6

Detecting Informative Web Page Blocks for Efficient Information Extraction Using Visual Block Segmentation

Cited by 10 publications (14 citation statements). References: 21 publications.

“…Typical information extraction tasks focus on data regions and data records. That implies that as the complexity of typical web documents increases, information extractors have to analyze more and more irrelevant regions, which has an impact on both efficiency and effectiveness [84], [163], [175]. This has motivated a number of authors to work on region extractors as a means to relieve information extractors from the burden of analyzing many regions of a web document that do not contain any relevant information [19], [23], [24], [53], [84], [97], [100], [114], [125], [141], [163], [169], [179], [180].…”
Section: Introduction (mentioning, confidence: 99%)
“…That implies that as the complexity of typical web documents increases, information extractors have to analyze more and more irrelevant regions, which has an impact on both efficiency and effectiveness [84], [163], [175]. This has motivated a number of authors to work on region extractors as a means to relieve information extractors from the burden of analyzing many regions of a web document that do not contain any relevant information [19], [23], [24], [53], [84], [97], [100], [114], [125], [141], [163], [169], [179], [180]. The difference between information extractors and region extractors is that the former focus on extracting and structuring data records and their attributes, whereas the latter focus on identifying the HTML fragments that contain this information.…”
Section: Introduction (mentioning, confidence: 99%)
“…Further, ND leads to website summarization. Jinbeom Kang et al. [10] proposed the RIPB (Recognizing Informative Page Blocks) algorithm, which detects the informative blocks in a Web page by exploiting the visual block segmentation scheme. RIPB uses the visual page segmentation algorithm to analyze and partition a Web page into a set of logical blocks, then groups related blocks with similar structures into a block cluster, and recognizes the informative block clusters by applying some heuristic rules to the cluster information.…”
Section: Related Work (mentioning, confidence: 99%)
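The three RIPB stages described above (segment the page into blocks, group structurally similar blocks into clusters, keep the clusters that pass heuristic rules) can be illustrated with a minimal Python sketch. It assumes the page has already been partitioned into blocks by a visual segmentation step, which is not implemented here. The `Block` fields, the tag-sequence similarity measure, the 0.8 threshold, and the text-length/link-density rule are all hypothetical stand-ins chosen for illustration; they are not the heuristic rules used by RIPB itself.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Block:
    """Hypothetical output of a visual segmentation step (not RIPB's format)."""
    tags: list[str]          # structural signature, e.g. the block's tag sequence
    text: str                # visible text in the block
    link_text_len: int = 0   # characters of that text that sit inside <a> links


@dataclass
class Cluster:
    blocks: list[Block] = field(default_factory=list)


def structural_similarity(a: Block, b: Block) -> float:
    """Similarity in [0, 1] between two blocks' tag sequences."""
    return SequenceMatcher(None, a.tags, b.tags).ratio()


def cluster_blocks(blocks: list[Block], threshold: float = 0.8) -> list[Cluster]:
    """Greedy grouping: each block joins the first cluster whose first block
    is at least `threshold` similar to it, otherwise it starts a new cluster."""
    clusters: list[Cluster] = []
    for block in blocks:
        for cluster in clusters:
            if structural_similarity(block, cluster.blocks[0]) >= threshold:
                cluster.blocks.append(block)
                break
        else:
            clusters.append(Cluster([block]))
    return clusters


def is_informative(cluster: Cluster, min_text: int = 200,
                   max_link_density: float = 0.3) -> bool:
    """Hypothetical heuristic rule: enough text and a low share of link text."""
    text_len = sum(len(b.text) for b in cluster.blocks)
    link_len = sum(b.link_text_len for b in cluster.blocks)
    if text_len == 0:
        return False
    return text_len >= min_text and link_len / text_len <= max_link_density


def informative_clusters(blocks: list[Block], threshold: float = 0.8) -> list[Cluster]:
    """Cluster the blocks, then keep only clusters that pass the rule."""
    return [c for c in cluster_blocks(blocks, threshold) if is_informative(c)]
```

With this sketch, a navigation menu made of short, link-only blocks ends up in one cluster that fails the rule, while article paragraphs sharing the same structure are grouped together and kept. Greedy first-match clustering is used only for brevity; it is order-dependent, and a real system would likely use a more robust grouping.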
“…The blocks making up the page are divided by the HTML tags (i.e., <table>, <div> [7]). In this paper, all the blocks we talk about are blocks divided by the HTML tags.…”
Section: Block-level Links (mentioning, confidence: 99%)
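Dividing a page into blocks at `<table>` and `<div>` tags, as the citing paper describes, can be sketched with Python's standard `html.parser` module. This is an illustration under stated assumptions, not the citing paper's implementation: text is attributed to the innermost open block, text outside any block tag is dropped, and the names `TagBlockSplitter` and `split_blocks` are hypothetical.

```python
from html.parser import HTMLParser

# Tags that start a new block (assumption: only <table> and <div>).
BLOCK_TAGS = {"table", "div"}


class TagBlockSplitter(HTMLParser):
    """Collect the text of every <table>/<div> block, innermost block wins."""

    def __init__(self):
        super().__init__()
        self.blocks: list[dict] = []   # all blocks, in the order they were opened
        self._stack: list[dict] = []   # blocks that are currently open

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            block = {"tag": tag, "text": []}
            self.blocks.append(block)
            self._stack.append(block)

    def handle_endtag(self, tag):
        if tag not in BLOCK_TAGS:
            return
        # Close the nearest open block with this tag, plus any unclosed
        # blocks nested inside it (tolerates sloppy HTML).
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i]["tag"] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        if self._stack and data.strip():
            self._stack[-1]["text"].append(data.strip())


def split_blocks(html: str) -> list[tuple[str, str]]:
    """Return (tag, text) pairs for each block, in document order of opening."""
    parser = TagBlockSplitter()
    parser.feed(html)
    parser.close()
    return [(b["tag"], " ".join(b["text"])) for b in parser.blocks]
```

For example, `split_blocks("<div>Outer<div>Inner</div>tail</div>")` yields two blocks: the outer `div` with the text "Outer tail" and the nested `div` with "Inner". Attributing text to the innermost block keeps nested blocks from duplicating each other's content.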