2012 International Conference on Machine Learning and Cybernetics 2012
DOI: 10.1109/icmlc.2012.6359546

Improving Webpage Content Extraction by extending a novel single page extraction approach: A case study with Thai websites

Abstract: This paper proposes a web content extraction technique. The technique is based on heuristic rules and works with both single and multiple pages. For multiple-page extraction, an Extracted Content Matching (ECM) technique is proposed to identify noise among the extracted results. Several features are also introduced to reduce processing time, such as the use of XPath, file compression, and parallel processing. Performance is assessed on precision, recall …
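The abstract does not spell out the ECM rules or the evaluation procedure, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that text blocks recurring across several pages of the same site (menus, footers) are noise, and that precision and recall are computed over whitespace-separated tokens. The function names `filter_repeated_blocks` and `precision_recall` and the `min_share` parameter are hypothetical. Thai text is not whitespace-delimited, so a real evaluation would need a word segmenter in place of `split()`.

```python
from collections import Counter


def filter_repeated_blocks(pages, min_share=0.5):
    """Drop text blocks that recur on at least `min_share` of the pages.

    Assumption: blocks repeated across pages of one site are noise.
    This is an illustrative heuristic, not the paper's ECM rules.
    """
    # Count each distinct block at most once per page.
    page_counts = Counter(block for page in pages for block in set(page))
    threshold = min_share * len(pages)
    return [[b for b in page if page_counts[b] < threshold] for page in pages]


def precision_recall(extracted, reference):
    """Token-level precision and recall of extracted text against a reference."""
    ext = Counter(extracted.split())
    ref = Counter(reference.split())
    # Multiset intersection: tokens present in both, counted with multiplicity.
    overlap = sum((ext & ref).values())
    precision = overlap / sum(ext.values()) if ext else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical extraction results from three pages of one site.
    pages = [["Home", "Story A"], ["Home", "Story B"], ["Home", "Story C"]]
    print(filter_repeated_blocks(pages))
    print(precision_recall("a b c d", "a b e"))
```

The filter only makes sense when several pages are available; with a single page every block meets the threshold, which is why a separate single-page heuristic would be needed in that case.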
