LeQua 2022 is a new lab for the evaluation of methods for "learning to quantify" in textual datasets, i.e., for training predictors of the relative frequencies of the classes of interest Y = {y1, ..., yn} in sets of unlabelled textual documents. While these predictions could easily be obtained by first classifying all documents via a text classifier and then counting the number of documents assigned to each class, a growing body of literature has shown this approach to be suboptimal, and has proposed better methods. The goal of this lab is to provide a setting for the comparative evaluation of methods for learning to quantify, both in the binary setting and in the single-label multiclass setting; this is the first time an evaluation exercise devoted exclusively to quantification has been organized. For both settings, data were provided to participants both in ready-made vector form and in raw document form. In this overview article we describe the structure of the lab, report the results obtained by the participants in the four proposed tasks and subtasks, and comment on the lessons that can be learned from these results.

The LeQua 2022 lab (https://lequa2022.github.io/) at CLEF 2022 has a "shared task" format; it is a new lab in two important senses:

- No labs on LQ have been organized before at CLEF conferences.
- Even outside the CLEF conference series, quantification has surfaced only episodically in previous shared tasks. The first such shared task was SemEval 2016 Task 4 "Sentiment Analysis in Twitter" [37], which comprised a binary quantification subtask and an ordinal quantification subtask (both subtasks were offered again in the 2017 edition). Quantification also featured in the Dialogue Breakdown Detection Challenge [23], in the Dialogue Quality subtasks of the NTCIR-14 Short Text Conversation task [46], and in the NTCIR-15 Dialogue Evaluation task [47]. However, quantification was never the real focus of these tasks. For instance, the real focus of the tasks described in [37] was sentiment analysis on Twitter data, to the point that almost all participants in the quantification subtasks used the trivial "classify and count" method and concentrated on optimising the sentiment analysis component, or on picking the best-performing learner for training the classifiers used by "classify and count", rather than on optimising the quantification component itself. Similar considerations hold for the tasks discussed in [23,46,47].
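To make the "classify and count" baseline referred to above concrete, the following is a minimal sketch, assuming Python with scikit-learn: a classifier is trained on labelled documents, the unlabelled bag is classified, and the predicted labels are counted to yield class prevalence estimates. The function name classify_and_count, the toy data, and the choice of TfidfVectorizer and LogisticRegression are illustrative assumptions, not part of the official LeQua 2022 material; the better quantification methods studied in the lab typically correct or replace these raw counts.

```python
# Illustrative sketch of the trivial "classify and count" baseline
# (names and learner choices are assumptions, not the LeQua 2022 code).
from collections import Counter

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def classify_and_count(train_docs, train_labels, unlabelled_docs):
    """Estimate class prevalences in a bag of unlabelled documents by
    classifying each document and counting the predicted labels."""
    vectorizer = TfidfVectorizer()
    X_train = vectorizer.fit_transform(train_docs)

    classifier = LogisticRegression(max_iter=1000)
    classifier.fit(X_train, train_labels)

    X_bag = vectorizer.transform(unlabelled_docs)
    predictions = classifier.predict(X_bag)

    counts = Counter(predictions)
    total = len(unlabelled_docs)
    # Relative frequency (prevalence) of each class in the bag
    return {label: counts.get(label, 0) / total for label in classifier.classes_}


# Toy usage example (hypothetical data):
prevalences = classify_and_count(
    ["good movie", "bad movie", "great film", "terrible plot"],
    ["pos", "neg", "pos", "neg"],
    ["awful film", "great acting", "bad script"],
)
print(prevalences)
```

As the abstract notes, estimating prevalences from such raw counts is suboptimal, since classifier errors do not cancel out when the class distribution of the unlabelled bag differs from that of the training data; this is precisely the gap that dedicated quantification methods aim to close.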