Although a document collection may contain hundreds of thousands of unique words, only a small fraction of them are genuinely useful for text analysis. Feature selection has therefore become an important issue in many text analysis studies. A number of feature selection techniques and algorithms are available, but unfortunately no single algorithm can be said to consistently outperform the others, because feature selection results depend largely on the source documents. We therefore need to pick and choose the appropriate algorithm and the best subset of feature words each time we analyze a new set of source documents. In this paper, we present a framework named ‘PicAChoo’ (short for ‘Pick And Choose’) that enables customizable feature selection environments by composing several primitive feature selection methods without hard-coding. As its name indicates, the framework provides many strategies for extracting appropriate features and allows dynamic composition of several feature selection methods. In addition, it gives users an environment that exploits linguistic characteristics of textual data, such as part-of-speech and sentence structure. Finally, we illustrate how the selected feature words can be used for various intelligent services.
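The abstract does not describe PicAChoo's interface, so the following is only a minimal sketch of the general idea of composing primitive feature selection methods at run time instead of hard-coding them. It assumes documents arrive already tagged with part-of-speech labels, and every name in it (`min_doc_freq`, `pos_filter`, `top_k_tfidf`, `chain`, `union`, `run`) is hypothetical, not the framework's actual API.

```python
import math
from collections import Counter
from typing import Callable, List, Set, Tuple

# Assumption: a document is a list of (word, part_of_speech) pairs, pre-tagged.
Doc = List[Tuple[str, str]]
# A primitive selector maps (corpus, candidate terms) to a subset of the candidates.
Selector = Callable[[List[Doc], Set[str]], Set[str]]


def _doc_freq(corpus: List[Doc]) -> Counter:
    """Count how many documents each word appears in."""
    return Counter(w for doc in corpus for w in {w for w, _ in doc})


def min_doc_freq(k: int) -> Selector:
    """Hypothetical primitive: keep terms occurring in at least k documents."""
    def select(corpus: List[Doc], cands: Set[str]) -> Set[str]:
        df = _doc_freq(corpus)
        return {t for t in cands if df[t] >= k}
    return select


def pos_filter(allowed: Set[str]) -> Selector:
    """Hypothetical primitive: keep terms seen at least once with an allowed POS tag."""
    def select(corpus: List[Doc], cands: Set[str]) -> Set[str]:
        tagged = {w for doc in corpus for w, pos in doc if pos in allowed}
        return cands & tagged
    return select


def top_k_tfidf(k: int) -> Selector:
    """Hypothetical primitive: keep the k candidates with the highest summed TF-IDF."""
    def select(corpus: List[Doc], cands: Set[str]) -> Set[str]:
        n = len(corpus)
        df = _doc_freq(corpus)
        score: Counter = Counter()
        for doc in corpus:
            for w, c in Counter(w for w, _ in doc).items():
                if w in cands:
                    score[w] += c * math.log(n / df[w])
        # Ties are broken alphabetically so the result is deterministic.
        ranked = sorted(cands, key=lambda t: (-score[t], t))
        return set(ranked[:k])
    return select


def chain(*steps: Selector) -> Selector:
    """Apply selectors in order, each one narrowing the previous result."""
    def select(corpus: List[Doc], cands: Set[str]) -> Set[str]:
        for step in steps:
            cands = step(corpus, cands)
        return cands
    return select


def union(*alternatives: Selector) -> Selector:
    """Keep every term chosen by at least one of the alternative selectors."""
    def select(corpus: List[Doc], cands: Set[str]) -> Set[str]:
        out: Set[str] = set()
        for alt in alternatives:
            out |= alt(corpus, cands)
        return out
    return select


def run(corpus: List[Doc], selector: Selector) -> Set[str]:
    """Start from the full vocabulary and apply a composed selector."""
    vocab = {w for doc in corpus for w, _ in doc}
    return selector(corpus, vocab)


corpus: List[Doc] = [
    [("battery", "NN"), ("lasts", "VBZ"), ("long", "JJ")],
    [("battery", "NN"), ("screen", "NN"), ("is", "VBZ"), ("bright", "JJ")],
    [("screen", "NN"), ("cracks", "VBZ"), ("easily", "RB")],
]
print(sorted(run(corpus, chain(min_doc_freq(2), pos_filter({"NN"})))))  # ['battery', 'screen']
```

Because each primitive shares one signature, a new strategy such as `chain(pos_filter({"NN"}), top_k_tfidf(100))` (the 100 highest-scoring nouns) is just a different composition, with no change to the primitives themselves.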
Abstract: As the number of transactions in e-marketplaces grows, more and more product information and product reviews are posted on the Internet. Because customers want to purchase good products, product reviews have become their most important source of information. However, because of the massive volume of reviews, customers cannot read them all. To address this problem, a great deal of research is being carried out in opinion mining, which lets us grasp the content of an entire body of product reviews. In its early stages, opinion mining research relied mainly on natural language processing techniques; more recently, computational statistics have been applied to handle massive volumes of reviews. In this research, we propose a method for summarizing product reviews that uses users' opinions, feature occurrences, and review ratings in order to improve on the performance of existing methods. With this method, massive volumes of reviews can be handled efficiently in a short time. We ensure the correctness of the review summary by identifying the semantic meaning of the reviews, and we demonstrate these advantages through experiments.
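The abstract does not say how opinions, feature occurrences, and ratings are combined, so the following is only a minimal sketch under stated assumptions, not the authors' method. It swaps in a simple lexicon-based approach: each sentence's polarity is the sum of its opinion-word scores, that polarity is assigned to every product feature mentioned in the sentence, and the star rating of the source review is recorded per mention. The `FEATURES` and `OPINIONS` lexicons and the names `FeatureSummary` and `summarize` are hypothetical.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Hypothetical lexicons; a real system would curate or learn these.
FEATURES: Set[str] = {"battery", "screen", "price"}
OPINIONS: Dict[str, int] = {"great": 1, "good": 1, "bright": 1,
                            "poor": -1, "bad": -1, "dim": -1}


@dataclass
class FeatureSummary:
    mentions: int = 0
    positive: int = 0
    negative: int = 0
    ratings: List[int] = field(default_factory=list)

    @property
    def avg_rating(self) -> float:
        """Mean rating of the reviews each mention came from."""
        return sum(self.ratings) / len(self.ratings) if self.ratings else 0.0


def summarize(reviews: List[Tuple[int, str]]) -> Dict[str, FeatureSummary]:
    """Aggregate sentence-level opinion polarity and review ratings per feature."""
    summary: Dict[str, FeatureSummary] = defaultdict(FeatureSummary)
    for rating, text in reviews:
        for sentence in re.split(r"[.!?]+", text.lower()):
            words = re.findall(r"[a-z]+", sentence)
            polarity = sum(OPINIONS.get(w, 0) for w in words)
            for feat in FEATURES.intersection(words):
                s = summary[feat]
                s.mentions += 1
                s.ratings.append(rating)
                if polarity > 0:
                    s.positive += 1
                elif polarity < 0:
                    s.negative += 1
    return dict(summary)


reviews = [
    (5, "Great battery. The screen is bright!"),
    (2, "Battery is poor. Screen is dim."),
    (4, "Good price and good battery."),
]
result = summarize(reviews)
# Report features by how often they are mentioned.
for feat, s in sorted(result.items(), key=lambda kv: (-kv[1].mentions, kv[0])):
    print(f"{feat}: {s.mentions} mentions, +{s.positive}/-{s.negative}, "
          f"avg rating {s.avg_rating:.2f}")
```

Keeping ratings alongside polarity counts lets a summary flag disagreement, for instance a feature praised in text but mentioned mostly in low-rated reviews.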