Inoshika Dilrukshi scite author profile

Abstract-This paper presents a new feature selection method that can be used to build a data set for classifying Twitter short messages. A Twitter short message contains at most 140 characters, so the number of words per message is nearly the same for all messages. When all the messages are pooled together, the pool may contain a large number of distinct words, but any given message contains only a few of them. As a result, the feature vectors form a sparse matrix. Removing unrelated words from the feature space reduces its dimension and therefore its sparseness. The unrelated words are defined as common words (high-frequency words) and noise words (low-frequency words). Even after these words are removed, some unrelated words may remain, so a feature selection technique is needed to select the best feature set. The proposed feature selection method is based on information theory and is named the Ratio Method. The calculated value increases when a word occurs frequently in a particular group and decreases when the word occurs in all groups. The best features can be chosen by applying a suitable threshold. Popular text classifiers such as SVM, Naïve Bayes, and Decision Trees are used to evaluate the performance of the new feature selection method and to compare it with existing methods.
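The pipeline described in the abstract (drop common and noise words, score the remaining words per group, keep those above a threshold) can be illustrated with a minimal Python sketch. The abstract does not give the Ratio Method formula, so the score below is only a hypothetical stand-in that has the stated behavior: it rises when a word is concentrated in one group and falls when the word is spread across all groups. The function names, the cut-offs `min_df` and `max_df_ratio`, the threshold, and the toy messages are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def prune_vocabulary(docs, min_df=2, max_df_ratio=0.75):
    """Drop noise words (too rare) and common words (too frequent).

    The cut-off values are illustrative assumptions, not taken from the paper.
    """
    df = Counter(word for doc in docs for word in set(doc))
    limit = max_df_ratio * len(docs)
    return {w for w, n in df.items() if min_df <= n <= limit}


def ratio_scores(docs, labels, vocab):
    """Hypothetical stand-in score, NOT the paper's formula.

    For each word: its highest per-group document rate divided by the sum
    of its rates over all groups. A word confined to one group scores 1.0;
    a word spread evenly over k groups scores 1/k.
    """
    group_sizes = Counter(labels)
    counts = defaultdict(Counter)  # word -> group -> documents containing it
    for doc, label in zip(docs, labels):
        for word in set(doc) & vocab:
            counts[word][label] += 1
    scores = {}
    for word, per_group in counts.items():
        rates = [per_group[g] / group_sizes[g] for g in group_sizes]
        scores[word] = max(rates) / sum(rates)
    return scores


def select_features(scores, threshold):
    """Keep the words whose score reaches the chosen threshold."""
    return sorted(w for w, s in scores.items() if s >= threshold)


# Toy messages, invented for this example.
docs = [
    "the team scored a late goal".split(),
    "the goal sealed the win".split(),
    "the election win surprised the party".split(),
    "the party lost the election".split(),
]
labels = ["sports", "sports", "politics", "politics"]

vocab = prune_vocabulary(docs)
scores = ratio_scores(docs, labels, vocab)
selected = select_features(scores, threshold=0.8)
print(selected)
```

In this toy data, "the" is removed as a common word, words that appear in a single message are removed as noise, and "win" is rejected by the threshold because it occurs equally often in both groups.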


Automated response recognition system for questionnaires

Dilrukshi

Chandrasekara

2013


Inoshika Dilrukshi

Twitter news classification using SVM

Twitter news classification: Theoretical and practical comparison of SVM against Naive Bayes algorithms

A Feature Selection Method for Twitter News Classification

Automated response recognition system for questionnaires
