2017
DOI: 10.1017/s1351324917000298
To use or not to use: Feature selection for sentiment analysis of highly imbalanced data

Abstract: We investigate feature selection methods for machine learning approaches in sentiment analysis. More specifically, we use data from the cooking platform Epicurious and attempt to predict ratings for recipes based on user reviews. In machine learning approaches to such tasks, it is a common approach to use word or part-of-speech n-grams. This results in a large set of features, out of which only a small subset may be good indicators for the sentiment. One of the questions we investigate concerns the extension o…

Cited by 20 publications (9 citation statements)
References 23 publications
“…We also added three emoji sentiment features, which consist of the positive, negative, and overall sentiment scores based on the Emoji Sentiment Ranking (Novak et al, 2015). We performed feature selection for the n-gram features using a filtering approach with information gain, which has proven to be effective in social media sentiment classification (Kübler et al, 2018).…”
Section: Model Details (mentioning)
confidence: 99%
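
The statement above refers to filter-based feature selection with information gain over n-gram features. As a minimal sketch of that idea, the snippet below ranks word uni- and bigrams by mutual information (equivalent to information gain for feature selection) with the class label and keeps the top k. It uses scikit-learn and invented toy reviews, not the data or exact configuration of the cited work.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Toy labeled reviews (illustrative only; 1 = positive, 0 = negative).
reviews = [
    "loved this recipe, will definitely make it again",
    "way too salty and bland, a complete disappointment",
    "perfect weeknight dinner, my kids loved it",
    "soggy crust and no flavor, would not recommend",
]
labels = [1, 0, 1, 0]

# Word uni- and bigram counts give a large, sparse feature set.
vectorizer = CountVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(reviews)

# Filter approach: score each n-gram by mutual information (information
# gain) with the label, independently of any classifier, and keep the top k.
k = min(20, X.shape[1])
selector = SelectKBest(score_func=mutual_info_classif, k=k)
X_reduced = selector.fit_transform(X, labels)

kept = vectorizer.get_feature_names_out()[selector.get_support()]
print(X.shape, "->", X_reduced.shape)
print(list(kept)[:10])
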
“…Supervised learning uses labeled data to build a classification model, which is subsequently used to predict class labels for (unlabeled) test data. Supervised learning techniques have extensively been used for sentiment analysis [7], [10], [27]- [30]. The limitation of such techniques, however, is the requirement of labeled data.…”
Section: Have Explored Twitter Data (mentioning)
confidence: 99%
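
The first sentence of this statement describes the basic supervised setup: fit a classification model on labeled examples, then use it to predict class labels for unseen text. Here is a minimal sketch with scikit-learn; the toy data and the choice of a TF-IDF plus logistic-regression pipeline are assumptions for illustration, not the setup of any cited paper.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Labeled training data (invented examples; 1 = positive, 0 = negative).
train_texts = [
    "great recipe, turned out perfectly",
    "bland and disappointing, would not make again",
    "absolutely delicious, the whole family loved it",
    "burnt and inedible, a waste of ingredients",
]
train_labels = [1, 0, 1, 0]

# Build the classification model from the labeled data ...
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

# ... then predict class labels for unlabeled test data.
print(model.predict(["quick, easy and tasty", "far too salty for my taste"]))
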
“…Secondly, it aims to filter out the noise and the less relevant features to avoid overfitting. According to [30], feature selection could be mainly categorized into the filter method and the wrapper method. The filter method would generally evaluate the features by assigning them a ranking score based on the distributional statistics in the data.…”
Section: Feature Selection (mentioning)
confidence: 99%
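
As the statement notes, a filter method scores each feature from its distribution over the classes, without training a downstream classifier. A small sketch using scikit-learn's chi-squared statistic as one such distributional ranking score follows; the data and the choice of chi-squared are assumptions for illustration.

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import chi2

docs = [
    "amazing flavor, highly recommend",
    "terrible texture, threw it away",
    "simple and delicious, a keeper",
    "dry, tasteless and overcooked",
]
y = [1, 0, 1, 0]  # toy sentiment labels

vec = CountVectorizer()
X = vec.fit_transform(docs)

# Rank every feature by a distributional statistic (here: chi-squared
# between term counts and the class label); no classifier is involved.
scores, _ = chi2(X, y)
order = np.argsort(scores)[::-1]
terms = vec.get_feature_names_out()
for i in order[:5]:
    print(f"{terms[i]}\t{scores[i]:.3f}")
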
“…The wrapper method on the other hand, would identify the optimal subset of the features using held-out data. However, since the number of subsets is exponential, the wrapper method is tremendously inefficient when a large feature set is involved, even with greedy algorithms [30]. Besides that, [31] pointed out that the filter method is generally faster compared to the wrapper method.…”
Section: Feature Selection (mentioning)
confidence: 99%
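
To make the contrast concrete, below is a hedged sketch of a greedy forward-selection wrapper: it repeatedly adds the single feature that most improves accuracy on held-out data and stops when no addition helps. The function name, the logistic-regression evaluator, and the stopping rule are assumptions for illustration, not the procedure of the cited works; the retraining inside the candidate loop is exactly what makes wrappers expensive on large feature sets.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def greedy_forward_selection(X_train, y_train, X_held, y_held, max_features=10):
    """Greedily add the feature that most improves held-out accuracy.

    Expects dense 2-D feature arrays; a sketch, not an optimized routine.
    """
    selected, best_score = [], 0.0
    remaining = list(range(X_train.shape[1]))
    while remaining and len(selected) < max_features:
        best_candidate = None
        for f in remaining:  # one retrained model per candidate: the costly part
            cols = selected + [f]
            clf = LogisticRegression(max_iter=1000).fit(X_train[:, cols], y_train)
            score = accuracy_score(y_held, clf.predict(X_held[:, cols]))
            if score > best_score:
                best_score, best_candidate = score, f
        if best_candidate is None:
            break  # no single addition improves the held-out score
        selected.append(best_candidate)
        remaining.remove(best_candidate)
    return selected, best_score
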