Evaluating the performance of sentence level features and domain sensitive features of product reviews on supervised sentiment analysis tasks

Rintyarna, Bagus Setya; Sarno, Riyanarto; Fatichah, Chastine

doi:10.1186/s40537-019-0246-8

Cited by 25 publications

(14 citation statements)

References 31 publications

(56 reference statements)

“…These recommendations critically implied that any work that combines Word2Vec representations with lexicon labeling of words would improve feature extraction for sentiment analysis. Such a recommendation is also supported by Bagus et al 21 that semantic labeling of words has the potential of improving supervised sentiment classification since bag of words doesn't consider semantic of words.…”

Section: Related Work (mentioning)

confidence: 94%

Lexicon‐pointed hybrid N‐gram Features Extraction Model (LeNFEM) for sentence level sentiment analysis

Mutinda

Mwangi

Okeyo

2021

Engineering Reports

Sentiment analysis of social media textual posts can provide information and knowledge that is applicable in social settings, business intelligence, evaluation of citizens' opinions in governance, and in mood triggered devices in the Internet of Things. Feature extraction and selection is a key determinant of accuracy and computational cost of machine learning models for such analysis. Most feature extraction and selection techniques utilize bag of words, N‐grams, and frequency‐based algorithms especially Term Frequency‐Inverse Document Frequency. However, these approaches do not consider relationships between words, they ignore words' characteristics and they suffer high feature dimensionality. In this paper we propose and evaluate a feature extraction and selection approach that utilizes a fixed hybrid N‐gram window for feature extraction and minimum redundancy maximum relevance feature selection algorithm for sentence level sentiment analysis. The approach improves the existing features extraction techniques, specifically the N‐gram by generating a hybrid vector from words, Part of Speech (POS) tags, and word semantic orientation. The vector is extracted by using a static trigram window identified by a lexicon where a sentiment word appears in a sentence. A blend of the words, POS tags, and the sentiment orientations of the static trigram are used to build the feature vector. The optimal features from the vector are then selected using minimum redundancy maximum relevance (MRMR) algorithm. Experiments were carried out using the public Yelp dataset to compare the performance of the proposed model and existing feature extraction models (BOW, normal N‐grams and lexicon‐based bag of words semantic orientations). Using supervised machine learning classifiers the experimental results showed that the proposed model had the highest F‐measure (88.64%) compared to the highest (83.55%) from baseline approaches. 
Wilcoxon test carried out ascertained that the proposed approach performed significantly better than the baseline approaches. Comparative performance analysis with other datasets further affirmed that the proposed approach is generalizable.
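The window-based extraction described in this abstract can be illustrated with a minimal sketch. It assumes the trigram is centred on the lexicon word, and it uses a toy lexicon and a toy POS lookup in place of a real sentiment lexicon and tagger; the names `LEXICON`, `POS`, `trigram_windows`, and `hybrid_features` are hypothetical and do not come from the paper. The MRMR selection step is not shown.

```python
import re

# Hypothetical toy lexicon: word -> semantic orientation (assumption, not from the paper).
LEXICON = {"good": "pos", "great": "pos", "bad": "neg", "terrible": "neg"}

# Hypothetical toy POS lookup standing in for a real tagger (assumption).
POS = {"the": "DT", "food": "NN", "was": "VBD", "great": "JJ", "but": "CC",
       "service": "NN", "terrible": "JJ", "good": "JJ", "bad": "JJ"}


def tokenize(sentence):
    """Lowercase the sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def trigram_windows(tokens, lexicon):
    """Yield a (previous, sentiment word, next) window for each lexicon hit.

    Centring the window on the sentiment word is an assumption of this sketch.
    """
    for i, tok in enumerate(tokens):
        if tok in lexicon:
            prev_tok = tokens[i - 1] if i > 0 else "<s>"
            next_tok = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
            yield (prev_tok, tok, next_tok)


def hybrid_features(sentence):
    """Blend word, POS tag, and orientation for every token in each window."""
    feats = []
    for window in trigram_windows(tokenize(sentence), LEXICON):
        for w in window:
            feats.append(f"{w}/{POS.get(w, 'UNK')}/{LEXICON.get(w, 'neu')}")
    return feats


if __name__ == "__main__":
    print(hybrid_features("The food was great but the service was terrible"))
```

Each feature string combines the three signals the abstract names (word, POS tag, sentiment orientation), so a downstream selector would score them as ordinary categorical features.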

“…What was lost, or at least left outside of the article, was more detailed information on the nature and common denominators of positive or negative feedback. This can be alleviated (to a degree) by using aspect-based sentiment analysis to connect the sentiment to a particular aspect [43,63] or by using the Apriori algorithm to establish association rules between sentiments and different issues [65].…”

Section: Tool Induced Lack Of Depth (mentioning)

confidence: 99%

“…A word might have different sentiment values depending on the sentence and/or context it occurs in, but some approaches do not consider the order of words [48,63,64]. Accuracy can be increased by joint analysis of local (word's syntactic features) and global (document, paragraph) contexts [58,63].…”

Section: Noisy Data (mentioning)

confidence: 99%

Qualitative Big Data’s Challenges and Solutions: An Organizing Review

Suvivuo

2021

Proceedings of the Annual Hawaii International Conference on System Sciences

Digitalization of everyday lives has tremendously increased the amount of digital (trace) data of people's behaviour available for researchers. However, traditional qualitative research methods struggle with the width and breadth of the data. This paper reviewed 61 recent studies that had utilized qualitative big data, looking at the practical challenges they had encountered and how these were addressed. While quantitative and qualitative big data share many common issues, the review points out that the lack of qualitative methods and the dataset reduction required by algorithms in big data research decrease the richness of the qualitative data. Locating relevant data and reducing noise are further challenges. Currently, these challenges can be only partially addressed with a combination of human and computer pattern recognition and crowdsourcing. The review describes many "tricks of the trade", but abduction research and pragmatist philosophy seem promising starting places for a more pervasive framework.

“…For each case, precision, recall and F-measures were calculated as performance metrics. From the comparative analysis, SLF + DSF yield the better performance of 82.5% precision, 85.4% recall and 83.1% f-measure (4).…”

Section: Introduction (mentioning)

confidence: 97%

A Hybrid of Proposed Filtration and Feature Selections to Enhance the Model Performance

Sujatha

Radha

2021

IJST

Objectives: To extract and identify the subjective information of social media users from unstructured data. To overcome high dimensionality and sparsity, which are the two major challenges in sentiment analysis of text datasets. To increase the model performance by using the smallest possible feature sets in a text classification problem. Methods: We proposed a new filtration method which is applied for the removal of correlated features and zero importance features in addition to the various feature selection methods. The various feature selections such as Mutual Info, Lasso, Recursive Feature Elimination and dimensionality reduction, Principal Component Analysis (PCA) have been used along with the proposed filtration to find the compelling features. This approach was evaluated using three Indian Government Schemes and these tweets were classified using a Random Forest classifier. The performance was evaluated using various metrics such as accuracy, precision, recall, f1_score, log loss and roc-auc. Findings: In this research, we proposed a model for selecting relevant and non-correlated feature subsets from the unstructured dataset. From this model, an accuracy of 92% with a minimum log loss of 0.22 was achieved with the minimum number of features. Improvements: This study shows that the performance of the model will be improved by overcoming those two problems (dimensionality and sparsity). Here various feature selection methods have been applied with the proposed filtration in order to minimize the number of features. The computing time and the model performance will be improved as a result of decreasing the features, and this will be more effective in the case of large datasets. Even though Random Forest performs well in high dimensional datasets, we need some more optimization.
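A filtration step of the kind this abstract describes (dropping zero-importance features and features strongly correlated with ones already kept) might look roughly like the sketch below. It is not the authors' method: the importance scores are assumed to be supplied by the caller (for example from a fitted tree model), the 0.9 correlation threshold is an arbitrary illustrative choice, and the function names `pearson` and `filter_features` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def filter_features(columns, importance, threshold=0.9):
    """Return names of features that survive the two filters.

    columns:    dict mapping feature name -> list of values
    importance: dict mapping feature name -> importance score (assumed precomputed)
    threshold:  absolute correlation above which a feature counts as redundant
    """
    kept = []
    for name, values in columns.items():
        # Filter 1: skip features the model found useless.
        if importance.get(name, 0.0) == 0.0:
            continue
        # Filter 2: skip features highly correlated with an already kept one.
        if any(abs(pearson(values, columns[k])) > threshold for k in kept):
            continue
        kept.append(name)
    return kept


if __name__ == "__main__":
    cols = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [1, 0, 1, 0], "d": [5, 1, 4, 2]}
    imp = {"a": 0.5, "b": 0.3, "c": 0.0, "d": 0.2}
    print(filter_features(cols, imp))
```

Because features are visited in insertion order, the first of a correlated pair is the one retained; sorting the columns by importance beforehand would keep the more informative one instead.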
