To save time when extracting specific information from the high volume of data in web documents, this paper proposes an architectural model for a generic web document classification system that uses design patterns to classify web documents. Based on the proposed model, this work implements two techniques for classifying Thai web documents, centroid classification and neural network classification, and compares their classification effectiveness empirically. The training data set in this experiment consists of 500 web documents in the following five categories (100 documents per category): mobile phone sales, book sales, travel sales, education information, and company profiles. A further 250 web documents were then used to test the two classifiers. The experimental results showed that the centroid classifier outperforms the neural network classifier in terms of both efficiency and effectiveness.
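The centroid classification technique compared above can be illustrated with a minimal sketch. It assumes documents are already segmented into word tokens (Thai text is written without spaces between words, so a real system needs a word segmenter first) and uses unit-normalized raw term frequencies with cosine similarity; the abstract does not state the paper's actual term weighting or similarity measure. All function names and the toy training data are hypothetical.

```python
import math
from collections import Counter


def tf_vector(tokens):
    """Unit-length (L2-normalized) term-frequency vector for one pre-tokenized document."""
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train_centroids(labeled_docs):
    """labeled_docs: iterable of (tokens, label). Returns {label: centroid vector}."""
    sums, counts = {}, Counter()
    for tokens, label in labeled_docs:
        acc = sums.setdefault(label, Counter())
        for t, w in tf_vector(tokens).items():
            acc[t] += w
        counts[label] += 1
    # Each centroid is the mean of its category's document vectors.
    return {label: {t: w / counts[label] for t, w in acc.items()}
            for label, acc in sums.items()}


def classify(tokens, centroids):
    """Assign the label whose centroid is most cosine-similar to the document."""
    doc = tf_vector(tokens)
    return max(centroids, key=lambda label: cosine(doc, centroids[label]))


# Hypothetical toy data (English tokens for readability) for two of the five categories.
train = [
    ("buy phone screen battery price".split(), "mobile phone sales"),
    ("phone camera battery discount".split(), "mobile phone sales"),
    ("novel author paperback price".split(), "book sales"),
    ("textbook author edition discount".split(), "book sales"),
]
centroids = train_centroids(train)
print(classify("cheap phone battery".split(), centroids))  # mobile phone sales
```

Training reduces each category to a single averaged vector, so classifying a document costs one similarity computation per category, which is consistent with the efficiency advantage the abstract reports for the centroid classifier.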
To reduce the time needed to find specific information among the high volume of information on the Web, this paper proposes an automatic system for identifying specific Web documents using the centroid technique. The initial training set in this experiment consists of 4113 Thai e-Commerce Web documents. After the training process, the system produces a centroid e-Commerce vector. Six test sets were used to evaluate the system; each test set contains 100 Web pages, including both known e-Commerce and non-e-Commerce pages. The average system performance is about 90%.
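Identification against a single centroid e-Commerce vector can be sketched as follows, assuming pre-tokenized documents, unit-normalized term frequencies, and cosine similarity. The abstract does not give the decision rule, so the similarity `threshold` of 0.3 is a hypothetical parameter, and `accuracy` is only one plausible reading of "system performance". All names and the toy data are illustrative assumptions.

```python
import math
from collections import Counter


def unit_tf(tokens):
    """Unit-length term-frequency vector for one pre-tokenized document."""
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def centroid(docs):
    """Average the unit TF vectors of all (non-empty list of) training documents."""
    total = Counter()
    for tokens in docs:
        for t, w in unit_tf(tokens).items():
            total[t] += w
    return {t: w / len(docs) for t, w in total.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def is_ecommerce(tokens, ecommerce_centroid, threshold=0.3):
    """Hypothetical rule: accept a page whose similarity reaches the threshold."""
    return cosine(unit_tf(tokens), ecommerce_centroid) >= threshold


def accuracy(test_set, ecommerce_centroid, threshold=0.3):
    """Fraction of (tokens, true_label) pairs that are classified correctly."""
    hits = sum(is_ecommerce(tokens, ecommerce_centroid, threshold) == label
               for tokens, label in test_set)
    return hits / len(test_set)


# Hypothetical toy training pages and a labeled test set mixing both classes.
c = centroid(["cart checkout price shipping".split(),
              "price discount cart order".split()])
test = [("add to cart checkout".split(), True),
        ("weather forecast rain".split(), False)]
print(accuracy(test, c))  # 1.0
```

Because only positive (e-Commerce) examples are used for training, the threshold carries the whole decision between the two classes; in practice it would have to be tuned on held-out pages.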