Convex Surrogates for Unbiased Loss Functions in Extreme Classification With Missing Labels

Qaraei, Mohammadreza; Schultheis, Erik; Gupta, Priya; Babbar, Rohit

doi:10.1145/3442381.3450139

Cited by 15 publications

(7 citation statements)

References 25 publications

“…In this section, we empirically show the usefulness of the proposed plug-in approach by incorporating it into BR and PLT algorithms and comparing these algorithms to their vanilla versions and state-of-the-art methods, particularly those that focus on tail-labels performance: PFastreXML [11], ProXML [4], a variant of DiSMEC [3] with a re-balanced and unbiased loss function as implemented in PW-DiSMEC [20] (class-balanced variant), and Parabel [18]. We conduct a comparison on six well-established XMLC benchmark datasets from the XMLC repository [6], for which we use the original train and test splits.…”

Section: Results (mentioning)

confidence: 99%

Propensity-scored Probabilistic Label Trees

Wydmuch, Jasinska-Kobus, Babbar, et al. 2021

Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval

Self Cite

Extreme multi-label classification (XMLC) refers to the task of tagging instances with small subsets of relevant labels coming from an extremely large set of all possible labels. Recently, XMLC has been widely applied to diverse web applications such as automatic content labeling, online advertising, or recommendation systems. In such environments, label distribution is often highly imbalanced, consisting mostly of very rare tail labels, and relevant labels can be missing. As a remedy to these problems, the propensity model has been introduced and applied within several XMLC algorithms. In this work, we focus on the problem of optimal predictions under this model for probabilistic label trees, a popular approach for XMLC problems. We introduce an inference procedure, based on the A*-search algorithm, that efficiently finds the optimal solution, assuming that all probabilities and propensities are known. We demonstrate the attractiveness of this approach in a wide empirical study on popular XMLC benchmark datasets. CCS Concepts: Computing methodologies → Supervised learning by classification.

“…If propensities are known, then they can be used to construct an unbiased, task or surrogate, loss l [35] in the sense that […]. The construction of the unbiased counterpart depends on the form of propensities, e.g., the label-wise propensities (7) are sufficient for losses decomposable over labels [24] like Hamming loss or binary cross-entropy, but might not be for more complex losses without additional assumptions [30]. The unbiased losses can be used in training procedures [18,26] or for estimating the performance of classifiers. For some losses, such as Hamming loss or precision@𝑘, the Bayes classifier can be written as a function of the conditional label distributions 𝜂_𝑗(𝑥).…”

Section: Missing Labels (mentioning)

confidence: 99%
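The statement above says that label-wise propensities suffice to debias losses that decompose over labels. A minimal sketch of that construction for binary cross-entropy follows. It assumes the usual propensity model in which a relevant label is observed with probability equal to its propensity and an irrelevant label is never observed; the function names are hypothetical and are not taken from any of the cited papers.

```python
import math

def bce(y, q):
    """Binary cross-entropy for a label y in {0, 1} and predicted probability q."""
    return -math.log(q) if y == 1 else -math.log(1.0 - q)

def unbiased_bce(y_obs, q, propensity):
    """Propensity-corrected loss computed from the OBSERVED label y_obs.

    Assumed form: (y_obs / p) * loss(1, q) + (1 - y_obs / p) * loss(0, q).
    """
    w = y_obs / propensity
    return w * bce(1, q) + (1.0 - w) * bce(0, q)

def expected_unbiased_bce(y_true, q, propensity):
    """Expectation of unbiased_bce over the random observation of one label.

    Assumed propensity model: P(observed = 1 | true = 1) = propensity,
    and P(observed = 1 | true = 0) = 0.
    """
    p_obs = propensity * y_true
    return (p_obs * unbiased_bce(1, q, propensity)
            + (1.0 - p_obs) * unbiased_bce(0, q, propensity))
```

Under these assumptions the expectation equals the loss on the true label for both values of y_true. Note that for an observed positive the weight on the negative-label term, 1 - 1/p, is negative whenever the propensity is below 1, so the corrected loss can take negative values.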

“…For example, decision tree methods can directly use the propensity-scored variants of metrics such as precision@𝑘 or nDCG@𝑘 [18]. Alternatively, one can use unbiased or upper-bounded propensity-scored surrogate losses [26].…”

Section: Empirical Propensity Model (mentioning)

confidence: 99%
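For readers unfamiliar with the propensity-scored metrics mentioned in this statement, here is a minimal sketch of propensity-scored precision@k for a single instance. It assumes the commonly used unnormalized definition, in which each observed relevant label among the top k predictions contributes the inverse of its propensity; the function name and argument layout are illustrative only.

```python
def psp_at_k(scores, observed, propensities, k):
    """Unnormalized propensity-scored precision@k for one instance.

    scores:       predicted score per label
    observed:     observed 0/1 relevance per label (may miss true positives)
    propensities: per-label probability that a relevant label is observed
    """
    # Indices of the k highest-scoring labels.
    top = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
    # Each observed hit is up-weighted by the inverse of its propensity.
    return sum(observed[j] / propensities[j] for j in top) / k
```

Because rare labels typically have small propensities, a hit on a tail label counts for more than a hit on a head label, and the unnormalized value can exceed 1.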

“…Of course, in XMLC, both interpretations can be combined, i.e., one would like to have a task loss that is adapted to tail labels, but calculate it in a way that takes missing labels into account. The closest to this in the literature is [26], where training uses a loss that combines unbiased estimates and class-rebalancing, but still, evaluation is performed using vanilla and propensity-scored metrics, instead of a propensity-scored variant of a tail-weighted metric.…”

Section: The Current Use Of Propensity Metrics (mentioning)

confidence: 99%

On Missing Labels, Long-tails and Propensities in Extreme Multi-label Classification

Schultheis, Wydmuch, Babbar, et al. 2022

Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

Self Cite

The propensity model introduced by Jain et al. [18] has become a standard approach for dealing with missing and long-tail labels in extreme multi-label classification (XMLC). In this paper, we critically revise this approach showing that despite its theoretical soundness, its application in contemporary XMLC works is debatable. We exhaustively discuss the flaws of the propensity-based approach, and present several recipes, some of them related to solutions used in search engines and recommender systems, that we believe constitute promising alternatives to be followed in XMLC. CCS Concepts: Computing methodologies → Supervised learning by classification.

“…for the large dataset from the extreme classification repository [5] (cf. Figures 1 in [31,4,28] for some examples), and for many types of data that is gathered at internet-scale [1].…”

Section: Introduction (mentioning)

confidence: 99%

Speeding-up One-vs-All Training for Extreme Classification via Smart Initialization

Schultheis, Babbar 2021

Preprint

Self Cite

In this paper we show that a simple, data-dependent way of setting the initial vector can be used to substantially speed up the training of linear one-versus-all (OVA) classifiers in extreme multi-label classification (XMC). We discuss the problem of choosing the initial weights from the perspective of three goals. We want to start in a region of weight space a) with low loss value, b) that is favourable for second-order optimization, and c) where the conjugate-gradient (CG) calculations can be performed quickly. For margin losses, such an initialization is achieved by selecting the initial vector such that it separates the mean of all positive (relevant for a label) instances from the mean of all negatives, two quantities that can be calculated quickly for the highly imbalanced binary problems occurring in XMC. We demonstrate a speedup of ≈ 3× for training with squared hinge loss on a variety of XMC datasets. This comes in part from the reduced number of iterations that need to be performed due to starting closer to the solution, and in part from an implicit negative mining effect that allows easy negatives to be ignored in the CG step. Because of the convex nature of the optimization problem, the speedup is achieved without any degradation in classification accuracy.
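The mean-separating initialization described in this abstract can be illustrated with a small sketch. The abstract does not give the exact formula, so the version below is an assumption: it picks the vector in the span of the two class means whose inner product is +1 with the positive mean and -1 with the negative mean. The margin targets and all function names are hypothetical.

```python
def mean(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mean_separating_init(positives, negatives):
    """Initial weight vector w = a * mu_pos + b * mu_neg with
    w . mu_pos = +1 and w . mu_neg = -1 (assumed margin targets)."""
    mu_p, mu_n = mean(positives), mean(negatives)
    pp, nn, pn = dot(mu_p, mu_p), dot(mu_n, mu_n), dot(mu_p, mu_n)
    # Determinant of the 2x2 Gram matrix of the two class means.
    det = pp * nn - pn * pn
    if abs(det) < 1e-12:
        raise ValueError("class means are (nearly) collinear")
    # Closed-form solution of the 2x2 system for the coefficients a and b.
    a = (nn + pn) / det
    b = -(pp + pn) / det
    return [a * p + b * n for p, n in zip(mu_p, mu_n)]
```

Only the two class means enter the computation, which is consistent with the abstract's remark that these quantities are cheap to obtain for highly imbalanced binary problems.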
