This paper compares text categorization using a simple back-propagation neural network (BPNN) with a combined method that couples latent semantic indexing (LSI) with a BPNN. In a traditional error back-propagation network, the weight adjustment process can get stuck in a local minimum and training is slow, which reduces both the performance and the efficiency of the network. The overall learning time of the network is also very high. To improve categorization accuracy, a new combined LSI and BPNN method is therefore proposed. The latent semantic representation places documents, terms, and queries in a single low-dimensional space where they can be compared. Latent semantic analysis uses singular value decomposition (SVD) to decompose the large term-document matrix into a set of k orthogonal factors, which maps the original textual data into a smaller semantic space. New document vectors are computed in the reduced k-dimensional space, as are the new coordinates of the queries. The combined LSI and BPNN technique is implemented for classification of the 20Newsgroup dataset, using the categories Sports, CS, and Medicine, and is compared with the existing BPNN technique. The new method greatly reduces the dimensionality, and better classification results can be achieved.
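The SVD step described in this abstract can be sketched as follows. This is a minimal illustration using NumPy, not the authors' implementation: the function names `lsi_fit` and `fold_in`, the toy matrix, and the choice k = 2 are all assumptions made for the example. The reduced document vectors would then serve as inputs to the neural network, which is not shown here.

```python
import numpy as np

def lsi_fit(term_doc, k):
    """Truncated SVD of a terms-by-documents count matrix (hypothetical helper)."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k, s_k = U[:, :k], s[:k]          # keep the k largest singular factors
    doc_vectors = Vt[:k, :].T           # one k-dimensional row per document
    return U_k, s_k, doc_vectors

def fold_in(term_counts, U_k, s_k):
    """Map a query (or unseen document) into the k-dimensional space.

    Computes q^T U_k S_k^{-1}, the usual LSI fold-in formula.
    """
    return (term_counts @ U_k) / s_k

# Toy data, invented for illustration: 5 terms (rows) x 4 documents (columns).
term_doc = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
    [1, 0, 0, 1],
], dtype=float)

U_k, s_k, doc_vectors = lsi_fit(term_doc, k=2)
query_vector = fold_in(np.array([1.0, 1.0, 0.0, 0.0, 0.0]), U_k, s_k)
print(doc_vectors.shape, query_vector.shape)
```

Folding in a column of the original matrix reproduces that document's reduced vector, which is a convenient sanity check on the projection.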
With the rapid growth of the Internet, e-mail, being convenient and efficient, has become an important means of communication in people's lives, and it reduces the cost of communication. It also brings spam. Spam e-mails, also known as "junk e-mails", are unsolicited messages sent in bulk with hidden or forged sender identity, address, and header information. More effective spam filtering approaches are vital to maintain the normal operation of e-mail systems and to protect the interests of e-mail users. In this paper we develop a spam filter based on the Bayesian filtering method, using the Aho-Corasick and PFAC string matching algorithms. The filter improves on traditional Bayesian spam filtering to raise filtering efficiency and to reduce the chance of misjudging malicious spam. To further improve the filtering process, we transform the filter into a parallel spam filter on GPGPUs using the PFAC algorithm.
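One way to picture how multi-pattern matching can feed a Bayesian score is the sketch below. It is a sequential, CPU-only illustration and rests on several assumptions: the abstract does not say how matches are combined, so a naive Bayes log-odds sum over the matched keywords is used here, and the function names, keywords, and per-keyword likelihoods are invented for the example. PFAC (Parallel Failureless Aho-Corasick), the GPU variant that drops the failure links, is not shown.

```python
import math
from collections import deque

def build_automaton(keywords):
    """Build Aho-Corasick goto, failure, and output tables for the keywords."""
    goto, fail, out = [{}], [0], [[]]
    for word in keywords:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)
    # Breadth-first pass: depth-1 states keep failure link 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit keywords that end at the failure target (suffix matches).
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def find_keywords(text, automaton):
    """Scan the text once and count every keyword occurrence."""
    goto, fail, out = automaton
    state, counts = 0, {}
    for ch in text:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            counts[word] = counts.get(word, 0) + 1
    return counts

def spam_probability(found, likelihoods, prior_spam=0.5):
    """Naive Bayes posterior from the presence of matched keywords (assumed model)."""
    log_odds = math.log(prior_spam / (1.0 - prior_spam))
    for word in found:
        p_given_spam, p_given_ham = likelihoods[word]
        log_odds += math.log(p_given_spam / p_given_ham)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical likelihoods: keyword -> (P(keyword | spam), P(keyword | ham)).
likelihoods = {"free": (0.6, 0.1), "winner": (0.4, 0.05), "meeting": (0.05, 0.5)}
automaton = build_automaton(likelihoods)
found = find_keywords("free prize for the winner", automaton)
print(found, spam_probability(found, likelihoods))
```

The automaton scans the message once regardless of how many keywords are registered, which is the property that makes it attractive as the matching stage of a filter.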
This paper presents an efficient clustering approach based on the Seeds Affinity Propagation clustering algorithm. The approach uses one dimension for each group of words representing a similar area of interest, and it weights the dimensions unevenly according to categorical bias during clustering. After the vectors are created, clustering is performed with the Seeds Affinity Propagation technique. To study its performance, the presented algorithm is applied to the benchmark data set Reuters-21578 and compared on F-measure with the k-means algorithm and the original affinity propagation (AP) algorithm. The results show that the presented algorithm outperforms the others by an acceptable margin.
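The abstract does not say which F-measure variant is used for the comparison. As a rough sketch, the code below computes one definition that is common in document clustering: each true class is matched to the cluster that gives it the best F-score, and the scores are averaged with class-size weights. Treat the definition, the function name `clustering_f_measure`, and the toy labels as assumptions rather than the authors' evaluation code.

```python
from collections import Counter

def clustering_f_measure(true_labels, cluster_labels):
    """Class-size-weighted best-match F-measure (assumed definition)."""
    n = len(true_labels)
    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_labels)
    overlap = Counter(zip(true_labels, cluster_labels))  # n_ij counts

    total = 0.0
    for cls, cls_size in class_sizes.items():
        best = 0.0
        for clu, clu_size in cluster_sizes.items():
            n_ij = overlap[(cls, clu)]
            if n_ij == 0:
                continue
            precision = n_ij / clu_size
            recall = n_ij / cls_size
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (cls_size / n) * best
    return total

# Toy labels, invented for illustration.
truth = ["a", "a", "a", "b", "b", "b"]
clusters = [0, 0, 1, 1, 1, 1]
print(clustering_f_measure(truth, clusters))
```

Under this definition a clustering that reproduces the true classes exactly scores 1.0, and merging or splitting classes lowers the score.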
Text classification is a booming research area, driven by the availability of huge amounts of electronic data in the form of news articles, research articles, e-mail messages, blogs, web pages, etc. Text representation is a vital step in text classification. In text representation, a term weighting method assigns appropriate weights to terms to improve performance; a term weighting method that uses known information about the category membership of training documents is a supervised term weighting method. The unsupervised term weighting method tf is compared with the supervised term weighting method tf.rf using a back-propagation neural network. The experimental results demonstrate that tf.rf performs better than term frequency (tf).
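To make the difference between the two weightings concrete, here is a small sketch. The abstract gives no formula, so the code assumes the commonly cited definition of relevance frequency, rf = log2(2 + a / max(1, c)), where a and c are the numbers of positive-category and negative-category training documents containing the term. The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter

def relevance_frequency(term, positive_docs, negative_docs):
    """rf = log2(2 + a / max(1, c)); assumed definition, see lead-in."""
    a = sum(1 for doc in positive_docs if term in doc)
    c = sum(1 for doc in negative_docs if term in doc)
    return math.log2(2 + a / max(1, c))

def tf_vector(doc_tokens, vocabulary):
    """Unsupervised weighting: raw term frequency."""
    tf = Counter(doc_tokens)
    return [float(tf[term]) for term in vocabulary]

def tf_rf_vector(doc_tokens, vocabulary, positive_docs, negative_docs):
    """Supervised weighting: term frequency scaled by relevance frequency."""
    tf = Counter(doc_tokens)
    return [
        tf[term] * relevance_frequency(term, positive_docs, negative_docs)
        for term in vocabulary
    ]

# Toy training documents, invented for illustration (already tokenized).
positive_docs = [["goal", "match"], ["goal", "team"]]
negative_docs = [["team", "stock"], ["stock", "market"]]
vocabulary = ["goal", "team", "stock"]
document = ["goal", "goal", "team"]

print(tf_vector(document, vocabulary))
print(tf_rf_vector(document, vocabulary, positive_docs, negative_docs))
```

Terms concentrated in the positive category receive a larger multiplier, so tf.rf separates "goal" from "team" more strongly than raw term frequency does.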