Quantification via Probability Estimators

Bella, Antonio; Ferri, Cèsar; Hernández-Orallo, José; Ramírez-Quintana, María José

doi:10.1109/icdm.2010.75

Cited by 102 publications

(145 citation statements)

References 5 publications

Mentioning: 144

“…In (Bella et al, 2010) a probabilistic version of AC is developed. First the authors introduce a simple method called Probability Average (PA), which is clearly aligned with CC.…”

Section: Other Quantification Methods (mentioning)

confidence: 99%

Why is quantification an interesting learning problem?

et al. 2016

Some real applications do not require classifying or making predictions about individual objects, but rather estimating some magnitude about a group of them. For instance, one such case arises in sentiment analysis and opinion mining. Some applications require classifying opinions as positive or negative, but others, sometimes even more useful, just need an estimate of the proportion of each class during a specific period of time: "How many tweets about our new product were positive yesterday?" Practitioners should apply quantification algorithms to tackle this kind of problem instead of just using off-the-shelf classification methods, because classifiers are suboptimal in the context of quantification tasks. Unfortunately, quantification learning is still a relatively underexplored area in machine learning. The goal of this paper is to show that quantification learning is an interesting open problem. To support its benefits, we show an application to analyzing Twitter comments in which even the simplest quantification methods outperform classification approaches.

“…For this reason, most of the experiments reported in literature employ datasets taken from other problems, like classification or regression, depending on the quantification learning task studied, see for instance [Forman 2008;Bella et al 2010;Barranquero et al 2015]. In all these cases, the authors create drifted testing sets artificially.…”

Section: Experimental Designs (mentioning)

confidence: 99%

“…Mean Squared Error (MSE) is preferred for some authors [Bella et al 2010;Amati et al 2014b;Asoh et al 2012] over MAE. The differences between both is that MAE is more robust to outliers and it is more intuitive and easier to interpret than MSE, and the advantage of MSE is that it does not assign equal weight to all mistakes, emphasizing the extreme values whose consequences may be much bigger than the equivalent smaller ones for a particular application.…”

Section: Performance Measures For Binary Quantification (mentioning)

confidence: 99%

“…,p l ]. Bella et al [2010] propose two probabilistic variants of the CC/AC methods. The first one, called Probability Average (PA), is the counterpart of CC and will be denoted here as Probabilistic CC (PCC).…”

Section: Multi-class AC (mentioning)

confidence: 99%

“…The experimental results reported with boar sperm samples using such techniques outperform previous approaches based on classification in terms of several measures, including mean absolute error, KL divergence and mean relative error. Tasche [2014] generalizes Probabilistic Adjusted Count [Bella et al 2010], see Section 6.4, to the multi-class quantification case. This proposal is motivated by the problem of forecasting credit default rate of portfolios during the coming year.…”

Section: Applications (mentioning)

confidence: 99%

A Review on Quantification Learning

et al. 2017

The task of quantification consists of providing an aggregate estimate (e.g. the class distribution in a classification problem) for unseen test sets, applying a model that is trained on a training set with a different data distribution. Several real-world applications demand methods of this kind, which do not require predictions for individual examples and focus only on obtaining accurate estimates at an aggregate level. During the past few years, several quantification methods have been proposed from different perspectives and with different goals. This paper presents a unified review of the main approaches with the aim of serving as an introductory tutorial for newcomers to the field.

Producing plankton classifiers that are robust to dataset shift

Chen, Kyathanahally, Reyes et al. 2024

Limnology & Ocean Methods

Modern plankton high‐throughput monitoring relies on deep learning classifiers for species recognition in water ecosystems. Despite satisfactory nominal performances, a significant challenge arises from dataset shift, which causes performances to drop during deployment. In our study, we integrate the ZooLake dataset, which consists of dark‐field images of lake plankton (Kyathanahally et al. 2021a), with manually annotated images from 10 independent days of deployment, serving as test cells to benchmark out‐of‐dataset (OOD) performances. Our analysis reveals instances where classifiers, initially performing well in in‐dataset conditions, encounter notable failures in practical scenarios. For example, a MobileNet with a 92% nominal test accuracy shows a 77% OOD accuracy. We systematically investigate conditions leading to OOD performance drops and propose a preemptive assessment method to identify potential pitfalls when classifying new data, and pinpoint features in OOD images that adversely impact classification. We present a three‐step pipeline: (i) identifying OOD degradation compared to nominal test performance, (ii) conducting a diagnostic analysis of degradation causes, and (iii) providing solutions. We find that ensembles of BEiT vision transformers, with targeted augmentations addressing OOD robustness, geometric ensembling, and rotation‐based test‐time augmentation, constitute the most robust model, which we call BEsT. It achieves an 83% OOD accuracy, with errors concentrated on container classes. Moreover, it exhibits lower sensitivity to dataset shift, and reproduces well the plankton abundances. Our proposed pipeline is applicable to generic plankton classifiers, contingent on the availability of suitable test cells. By identifying critical shortcomings and offering practical procedures to fortify models against dataset shift, our study contributes to the development of more reliable plankton classification technologies.
