Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004
DOI: 10.1145/1008992.1009035

Web-page classification through summarization

Abstract: Web-page classification is much more difficult than pure-text classification due to a large variety of noisy information embedded in Web pages. In this paper, we propose a new Web-page classification algorithm based on Web summarization for improving the accuracy. We first give empirical evidence that ideal Web-page summaries generated by human editors can indeed improve the performance of Web-page classification algorithms. We then propose a new Web summarization-based classification algorithm and evaluate it …
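The idea in the abstract, classifying a summary of the page rather than the full noisy page, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the frequency-based sentence scorer and the small Naive Bayes classifier are generic stand-ins, not the summarizers or classifiers evaluated in the paper, and the names `summarize`, `NaiveBayes`, and `classify_page` are hypothetical.

```python
import math
import re
from collections import Counter


def sentences(text):
    # Naive sentence split on whitespace that follows ., ! or ?
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(text):
    # Lowercased alphabetic tokens only
    return re.findall(r"[a-z]+", text.lower())


def summarize(text, k=2):
    """Keep the k sentences with the highest average in-page word frequency.

    Assumption: a simple frequency heuristic standing in for a real summarizer.
    """
    sents = sentences(text)
    freq = Counter(tokens(text))

    def score(sentence):
        words = tokens(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(range(len(sents)), key=lambda i: score(sents[i]), reverse=True)[:k]
    # Restore the original sentence order
    return " ".join(sents[i] for i in sorted(top))


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative stand-in)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.class_counts}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokens(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, count in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            logp = math.log(count / n_docs)
            for w in tokens(doc):
                if w in self.vocab:  # ignore words never seen in training
                    logp += math.log(
                        (self.word_counts[label][w] + 1) / (total + len(self.vocab))
                    )
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


def classify_page(model, page_text, k=2):
    # Classify the summary instead of the full page text
    return model.predict(summarize(page_text, k))
```

In this sketch, a navigation-style sentence such as "Click here to subscribe." scores low because its words appear only once on the page, so it is dropped before the classifier sees the text.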

Cited by 131 publications (52 citation statements)
References 28 publications
“…It is pointed out that document summarization will improve web page classification [5]. It has some relationship with our work since we both consider the selection of "important" words.…”
Section: Related Work (mentioning)
confidence: 95%
“…Traditionally, image search [1] and clustering [3] used image content to analyze its semantics. Some systems [4,5] based on CBIR were designed and implemented. There exists the problem that it is very hard to learn the semantic meaning of an image from low level visual features [2].…”
Section: Related Work (mentioning)
confidence: 99%
“…In this with the help of HTML information seen in web page they are classifying the webpage with ANN classification technique. Dou Shen [4], Zheng Chen [4], Qiang Yang [4], Hua-Jun Zeng [4], Benyu Zhang [4], Yuchang Lu [4], Wei-Ying Ma [4] proposed mechanism of summarization which is used to categorize the web page. With the help of Web summarization algorithm, they have carried out the analysis of page-layout for the draw out of main topic of web page to increase the classification accuracy.…”
Section: Literature Survey (mentioning)
confidence: 99%
“…The methodology puts together four separate known techniques to solve an important unaddressed area of Web search. Relevant existing techniques include Web page classification algorithms [Brin and Page 1998, Chakrabarti 2002, Dumais and Chen 2002, Shen et al 2004, Shih and Karger 2004, Sun et al 2000] and machine learning algorithms [Nigam et al 2000, Raskutti et al 2002]. The ultimate objective of our methodology is to minimize the need for people to resort to tedious manual navigations of semantically related Web pages in similar domains.…”
Section: Introduction (mentioning)
confidence: 99%