Proceedings of the 2010 ACM Symposium on Applied Computing
DOI: 10.1145/1774088.1774461

Feature selection for ordinal regression

Abstract: Ordinal regression (also known as ordinal classification) is a supervised learning task that consists of automatically determining the implied rating of a data item on a fixed, discrete rating scale. This problem is receiving increasing attention from the sentiment analysis and opinion mining community, due to the importance of automatically rating increasing amounts of product review data in digital form. As in other supervised learning tasks such as (binary or multiclass) classification, feature selection is…


citations
Cited by 108 publications
(144 citation statements)
references
References 9 publications
0
140
1
3
Order By: Relevance
“…For instance, in the training set of TripAdvisor-15763, the smaller of the two datasets discussed in Section 4, there are 38,447 unique words and 171,894 unique LM-features; using them all would degrade accuracy (due to overfitting) and efficiency (at both training time and classification time). For Phase 2, StarTrack relies on RR(NC*IDF), a feature selection technique for ordinal regression that we have proposed in 5), and that in previous experimentation has given consistently good results.*11 RR(NC*IDF) attributes a score to each feature, after which only the highest-scoring features are retained.…”
Section: The Internals of StarTrack: Learning and Feature Selection (mentioning)
confidence: 99%
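The excerpt above describes RR(NC*IDF) only at the level of its interface: every candidate feature receives a score and only the highest-scoring features are kept. Purely as an illustration of that score-and-keep-top-k step, here is a minimal generic sketch in Python; the `score` callable is a placeholder, not the paper's actual NC*IDF score (the round-robin policy the paper uses is sketched after the next excerpt).

```python
# Generic filter-style selection: score every feature, keep the k best.
# `score` is a placeholder scoring function, not RR(NC*IDF) itself.
def keep_top_k(features, score, k):
    """Return the k features with the highest scores."""
    return sorted(features, key=score, reverse=True)[:k]

# Hypothetical usage (names are illustrative only):
# reduced = keep_top_k(lm_features, score=ncidf_score, k=5000)
```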
“…For this purpose, we have defined a module (based on part-of-speech tagging and a simple grammar of phrases; see 4) for details) that (a) extracts complex phrases, such as hotel(NN) was(Be) very(RB) nice(JJ) […]
*11 The name "RR(NC*IDF)" stands for "round robin on negative correlation times inverse document frequency", and refers to the fact that the technique consists in computing, for each feature, a score resulting from its inverse document frequency and its negative correlation with a given rating, and then choosing the features according to a policy that "round-robins" across the ratings. The interested reader can check 5) for details.…”
Section: The Internals of StarTrack: Sentiment-based Feature Extraction (mentioning)
confidence: 99%
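The footnote spells out the two ingredients of RR(NC*IDF): a per-rating score (the feature's negative correlation with that rating times its inverse document frequency) and a round-robin policy that draws features alternately from each rating's ranking. The sketch below reconstructs only the round-robin policy from that description and is not the authors' implementation; it assumes the per-rating NC*IDF scores have already been computed and are passed in as plain dictionaries.

```python
from collections import defaultdict

def round_robin_select(scores_by_rating, k):
    """Round-robin feature selection over per-rating score tables.

    `scores_by_rating` maps each rating to a dict {feature: score}; the score
    is assumed to be the NC*IDF value described above.  Features are drawn
    alternately from each rating's ranking, best first, skipping features
    already chosen, until k distinct features have been collected.
    """
    rankings = {r: sorted(table, key=table.get, reverse=True)
                for r, table in scores_by_rating.items()}
    selected, seen = [], set()
    cursors = defaultdict(int)          # next position to try in each ranking
    while len(selected) < k:
        progressed = False
        for r, ranking in rankings.items():
            i = cursors[r]
            while i < len(ranking) and ranking[i] in seen:
                i += 1
            if i < len(ranking):
                selected.append(ranking[i])
                seen.add(ranking[i])
                cursors[r] = i + 1
                progressed = True
                if len(selected) == k:
                    return selected
            else:
                cursors[r] = i
        if not progressed:              # all rankings exhausted before k
            break
    return selected

# Hypothetical usage with a 1-5 star scale (score tables omitted):
# selected = round_robin_select(ncidf_scores_by_star, k=5000)
```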
“…In the first one, we use SentiWordNet 3.0 (Baccianella et al., 2010) to obtain the sentiment scores of each word. We use the word together with its POS tag to look up the word's sentiment score.…”
Section: Features Based on Sentiment Scores (mentioning)
confidence: 99%
“…So far, some feature evaluation algorithms have been developed for monotonic classification 31,32,33,34. The dominance-based rough set approach (DRSA) was first introduced by Greco, Matarazzo and Slowinski, where classical indiscernibility relations are replaced with dominance relations 1,35.…”
Section: Introduction (mentioning)
confidence: 99%
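As a quick illustration (not taken from the cited works) of what replacing indiscernibility with dominance means, the sketch below contrasts the two relations on toy feature vectors: dominance only requires one object to be at least as good as another on every criterion, whereas indiscernibility requires equality on every attribute.

```python
# Toy contrast between the two relations mentioned in the excerpt, assuming
# numeric, gain-type criteria (higher is better).
def dominates(x, y):
    """True if x is at least as good as y on every criterion."""
    return all(xi >= yi for xi, yi in zip(x, y))

def indiscernible(x, y):
    """Classical rough-set indiscernibility: equal on every attribute."""
    return all(xi == yi for xi, yi in zip(x, y))

# Two reviews described by (cleanliness, service, location) scores:
a, b = (4, 5, 3), (3, 5, 3)
print(dominates(a, b))      # True: a is at least as good on every criterion
print(indiscernible(a, b))  # False: they differ on the first attribute
```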