2015
DOI: 10.1145/2700406
Optimizing Text Quantifiers for Multivariate Loss Functions

Abstract: We address the problem of quantification, a supervised learning task whose goal is, given a class, to estimate the relative frequency (or prevalence) of the class in a dataset of unlabeled items. Quantification has several applications in data and text mining, such as estimating the prevalence of positive reviews in a set of reviews of a given product or estimating the prevalence of a given support issue in a dataset of transcripts of phone calls to tech support. So far, quantification has been addressed by le…
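To make the task concrete, here is a minimal sketch (not the paper's method) of the baseline "classify and count" approach to quantification: train an ordinary classifier, then report the fraction of unlabeled items it assigns to each class. The classifier choice and all variable names are illustrative assumptions.

```python
# Sketch of quantification via "classify and count" (CC).
# Illustrative only; not the optimization method proposed in the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression

def classify_and_count(clf, X_unlabeled):
    """Estimate class prevalences as the fraction of items
    the trained classifier assigns to each class."""
    preds = clf.predict(X_unlabeled)
    classes, counts = np.unique(preds, return_counts=True)
    return dict(zip(classes, counts / len(preds)))

# Toy usage with synthetic data standing in for real text features.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))
y_train = (X_train[:, 0] > 0).astype(int)
X_test = rng.normal(size=(1000, 5))

clf = LogisticRegression().fit(X_train, y_train)
print(classify_and_count(clf, X_test))  # e.g. {0: 0.49, 1: 0.51}
```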

Help me understand this report
View preprint versions

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
1
1
1

Citation Types

0
94
0

Year Published

2016
2016
2021
2021

Publication Types

Select...
5
2
1

Relationship

1
7

Authors

Journals

citations
Cited by 62 publications
(94 citation statements)
references
References 55 publications
0
94
0
Order By: Relevance
“…It is inconclusive which quantification approach is better. PCC outperformed CC in (Bella et al., 2010) but underperformed CC in (Esuli and Sebastiani, 2015). Following the results from (Gao and Sebastiani, 2016), which are reported on sentiment analysis in Twitter, we decided to use PCC for both of our quantification submissions (footnote 3: ρ is the average recall and F1_pn the macro-average F1 score of the positive and negative classes).…”
Section: Experiments and Results
confidence: 99%
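For reference, the two methods contrasted in this statement differ only in how a classifier's outputs are aggregated: classify and count (CC) counts hard predictions, while probabilistic classify and count (PCC) sums posterior probabilities. A minimal sketch for the binary case, assuming a scikit-learn-style classifier exposing predict and predict_proba:

```python
import numpy as np

def cc_prevalence(clf, X):
    # CC: fraction of items the classifier labels as positive.
    return float(np.mean(clf.predict(X) == 1))

def pcc_prevalence(clf, X):
    # PCC: mean posterior probability of the positive class,
    # i.e. the expected fraction of positives under the classifier.
    return float(np.mean(clf.predict_proba(X)[:, 1]))
```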
“…The scenario in [Esuli and Sebastiani 2015] is better. They employ RCV1-v2, a multi-label text classification benchmark.…”
Section: Experimental Design
confidence: 99%
“…Thus, model assessment processes require groups of samples with sufficient variability to provide precise error estimates. Esuli and Sebastiani [2015] employ a quantifier based on optimizing multivariate loss functions (see Section 7.3) in the context of text classification. In their experiments, this approach outperforms other quantification algorithms on a multi-class dataset with a large number of classes (99).…”
Section: Applications
confidence: 99%
“…Indeed, KLD is the most frequently used measure for evaluating quantification (see, e.g., [3, 10, 11, 12]). Note that KLD is non-decomposable, i.e., the error we make by estimating p via p̂ cannot be broken down into item-level errors.…”
Section: Introduction
confidence: 99%
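The non-decomposability this statement points to is visible in the measure itself: KLD compares the true class-prevalence vector p with the estimated vector p̂, so it is a function of the whole sample rather than a sum of per-item errors. A minimal sketch, where the function name and smoothing constant are assumptions rather than anything from the cited text:

```python
import numpy as np

def kld(p_true, p_hat, eps=1e-12):
    """KLD(p, p_hat) = sum_c p(c) * log(p(c) / p_hat(c)).
    Computed over class prevalences, not individual items, which
    is why the error cannot be decomposed into item-level errors."""
    p = np.asarray(p_true, dtype=float) + eps
    q = np.asarray(p_hat, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()  # renormalize after smoothing
    return float(np.sum(p * np.log(p / q)))

print(kld([0.7, 0.3], [0.6, 0.4]))  # ≈ 0.022
```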