2018
DOI: 10.48550/arxiv.1811.10319
Preprint

On the cost of essentially fair clusterings

Abstract: Clustering is a fundamental tool in data mining. It partitions points into groups (clusters) and may be used to make decisions for each point based on its group. However, this process may harm protected (minority) classes if the clustering algorithm does not adequately represent them in desirable clusters, especially if the data is already biased. At NIPS 2017, Chierichetti et al. [14] proposed a model for fair clustering requiring the representation in each cluster to (approximately) preserve the global fract…

Cited by 3 publications (7 citation statements)
References 30 publications
“…In this work, we initiate a study of a fair variant of the correlation clustering problem where each vertex has a given feature, and the goal is to make sure that the distribution of the features is the same as the global distribution in each cluster. This is the same notion of fairness studied by Chierichetti et al [14], Bercea et al [7] and Bera et al [6] on k-center and k-median. In another variation, the goal is to make sure the number of nodes of a specific feature $c_i$ in a cluster of size $n$ is between $n/q_i$, $n/p_i$ where $p_i \le q_i \in \mathbb{Z}_{\ge 1}$, and $p_i$, $q_i$ are specified per each feature $c_i$.…”
Section: Introduction (supporting)
confidence: 68%
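
The bounded-representation condition quoted above can be illustrated with a minimal sketch. It assumes the bounds are read as the fractions n/q_i and n/p_i of the cluster size n; the function name `is_fair_cluster` and the `bounds` dictionary layout are hypothetical and do not come from the cited papers.

```python
from collections import Counter
from fractions import Fraction


def is_fair_cluster(features, bounds):
    """Check one cluster against per-feature representation bounds.

    features: list of feature labels, one per node in the cluster.
    bounds:   dict mapping a feature label to a pair (p, q) of integers
              with 1 <= p <= q (hypothetical layout for this sketch).

    Returns True if, for every feature in `bounds`, the number of nodes
    carrying it lies in the closed interval [n/q, n/p], n = cluster size.
    """
    n = len(features)
    counts = Counter(features)
    for label, (p, q) in bounds.items():
        count = counts.get(label, 0)
        # Exact rational arithmetic avoids floating-point edge cases.
        if not (Fraction(n, q) <= count <= Fraction(n, p)):
            return False
    return True


# Usage: with p = q = 2 for both colors, each color must be exactly half.
balanced = ["red", "blue", "red", "blue"]
skewed = ["red", "red", "red", "blue"]
print(is_fair_cluster(balanced, {"red": (2, 2), "blue": (2, 2)}))
print(is_fair_cluster(skewed, {"red": (1, 2), "blue": (1, 2)}))
```

This only verifies a given cluster; it says nothing about how to find a low-cost clustering that satisfies the constraint.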
“…The first direction includes results on statistical parity [22], disparate impact [17], and individual fairness [16]. Second direction includes a bulk of work including fair rankings [10], fair clusterings [14,25,7,6,2], fair voting [9], and fair optimization with matroid constraints [15].…”
Section: Related Work (mentioning)
confidence: 99%
“…For K-means clustering problem, we show that sufficiently large regularization coefficient yields perfect fairness under disparate impact doctrine. Unlike the two-phase methods proposed in [13,5,43,7,47], our method does not require any pre-processing step, is scalable, and allows for regulating the trade-off between the clustering quality and fairness.…”
Section: Contributions (mentioning)
confidence: 99%
“…For example, we say a predictor Ŷθ satisfies equalized odds condition if the predictor Ŷθ is conditionally independent of the sensitive attribute S given the true label Y . Similar to formulation (7), the equalized odds fairness notion can be achieved by the following min-max problem…”
Section: Binary Case (mentioning)
confidence: 99%
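
The equalized odds condition quoted above (the prediction is conditionally independent of the sensitive attribute S given the true label Y) can be checked empirically on a finite sample. The sketch below assumes binary predictions in {0, 1}; the function name `equalized_odds_gap` is hypothetical, and this is a diagnostic only, not the min-max formulation of the citing paper.

```python
from collections import defaultdict


def equalized_odds_gap(y_true, y_pred, sensitive):
    """Largest gap in positive-prediction rates across sensitive groups,
    taken over each value of the true label.

    A gap of 0 means the sample satisfies equalized odds exactly.
    Assumes y_pred contains only 0 or 1.
    """
    totals = defaultdict(int)      # (label, group) -> sample count
    positives = defaultdict(int)   # (label, group) -> count with y_pred == 1
    for y, y_hat, s in zip(y_true, y_pred, sensitive):
        totals[(y, s)] += 1
        positives[(y, s)] += int(y_hat == 1)

    gap = 0.0
    for label in {y for y, _ in totals}:
        # Positive-prediction rate of each group, conditioned on this label.
        rates = [positives[k] / totals[k] for k in totals if k[0] == label]
        gap = max(gap, max(rates) - min(rates))
    return gap


# Usage: a perfect predictor has gap 0 regardless of group membership.
print(equalized_odds_gap([0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]))
```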