Due to the tremendous growth of digital content on the World Wide Web (WWW), text categorization has become an important tool for managing and organizing text-related data. This paper proposes an Ensemble Learning approach in Improved K Nearest Neighbor algorithm for Text Categorization (EINNTC), which consists of single-pass clustering, ensemble learning, and the KNN algorithm. The EINNTC method addresses the shortcomings of the traditional KNN classifier by reducing the high computational cost of text similarity calculation, avoiding the impact of noisy training samples, and expediting the search for the K nearest neighbors. The experiments were carried out on the standard benchmark Reuters dataset, and the empirical results show that the proposed method outperforms the SVM and KNN classifiers.
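The abstract gives no algorithmic detail, so the following is only a minimal sketch of how single-pass clustering can shrink the set that KNN has to search. It assumes documents are sparse term-weight dictionaries compared by cosine similarity. The function names, the `threshold` parameter, and the similarity-weighted vote are illustrative assumptions rather than the authors' EINNTC method, and the ensemble step is left out.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def single_pass_cluster(docs, threshold):
    """Cluster documents in one pass; returns a list of [summed vector, size]."""
    clusters = []
    for doc in docs:
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(doc, cluster[0])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            # No cluster is similar enough: the document starts a new one.
            clusters.append([dict(doc), 1])
        else:
            # Merge into the closest cluster. Cosine ignores scale, so the
            # summed vector stands in for the mean centroid.
            for term, weight in doc.items():
                best[0][term] = best[0].get(term, 0.0) + weight
            best[1] += 1
    return clusters

def build_model(training, threshold=0.5):
    """training: list of (vector, label). Returns a list of (centroid, label)."""
    by_label = {}
    for vec, label in training:
        by_label.setdefault(label, []).append(vec)
    model = []
    for label, vecs in by_label.items():
        for summed, _size in single_pass_cluster(vecs, threshold):
            model.append((summed, label))
    return model

def knn_predict(model, query, k=3):
    """Similarity-weighted vote among the k centroids closest to the query."""
    ranked = sorted(model, key=lambda m: cosine(query, m[0]), reverse=True)[:k]
    votes = Counter()
    for centroid, label in ranked:
        votes[label] += cosine(query, centroid)
    return votes.most_common(1)[0][0]
```

Because a query is compared against cluster centroids rather than every training document, the number of similarity computations depends on the number of clusters, which the threshold controls.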
The number of users accessing the Internet is increasing day by day. Almost any required information can be obtained from the web at any time by anybody. A central problem in information retrieval is that a vast amount of irrelevant material surrounds any useful information, yet such information should be easily accessible and digestible. The Internet is no longer monolingual, and non-English content is growing rapidly. Speech is an easier mode of interaction with a computer than a keyboard and mouse. This paper is a new attempt to integrate speech recognition with a cross-language information retrieval system. The retrieval performance of speech-based queries is comparable to that of text-based queries. Speech recognition and cross-language information retrieval are both challenging fields to integrate. On the Forum for Information Retrieval Evaluation (FIRE) 2011 dataset, the speech-query and text-query Tamil-English Cross Language Information Retrieval (CLIR) systems achieve 65% and 63% of monolingual retrieval performance, respectively. In the future, a speech-based CLIR system could be very useful for visually challenged people.
Speech recognition is one of the fascinating fields of computer science. The accuracy of a speech recognition system may be reduced by noise in the speech signal. Noise removal is therefore an essential step in an Automatic Speech Recognition (ASR) system, and this paper proposes a new technique called combined thresholding for noise removal. Feature extraction is the process of converting the acoustic signal into a compact set of informative parameters. This paper also concentrates on improving Mel Frequency Cepstral Coefficients (MFCC) features by introducing the Discrete Wavelet Packet Transform (DWPT) in place of the Discrete Fourier Transform (DFT) block to provide efficient signal analysis. Because the resulting feature vector varies in size, a Self Organizing Map (SOM) is used to choose the correct feature vector length. Since a single classifier does not provide enough accuracy, this research proposes an Ensemble Support Vector Machine (ESVM) classifier that takes the fixed-length feature vector from the SOM as input, termed ESVM_SOM. The experimental results showed that the proposed methods provide better results than the existing methods.
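The abstract does not define combined thresholding, so the sketch below only illustrates the two standard rules that thresholding-based denoising usually starts from: hard and soft thresholding. It assumes the rules are applied to a list of transform coefficients (for example wavelet coefficients) with a threshold `t` chosen elsewhere. The function names are hypothetical, and this is not the paper's technique.

```python
def hard_threshold(coeffs, t):
    """Zero every coefficient whose magnitude is at most t; keep the rest unchanged."""
    return [c if abs(c) > t else 0.0 for c in coeffs]

def soft_threshold(coeffs, t):
    """Zero small coefficients and shrink the remaining ones toward zero by t."""
    out = []
    for c in coeffs:
        if c > t:
            out.append(c - t)
        elif c < -t:
            out.append(c + t)
        else:
            out.append(0.0)
    return out
```

Hard thresholding preserves the amplitude of the retained coefficients but introduces discontinuities at the threshold, while soft thresholding is continuous at the cost of biasing large coefficients downward.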