Incentivizing Collaboration in Machine Learning via Synthetic Data Rewards

Tay, Sebastian; Xu, Xiaowei; Foo, Chuan Sheng; Low, Bryan Kian Hsiang

doi:10.48550/arxiv.2112.09327

Cited by 1 publication

(12 citation statements)

References 17 publications


“…where F is the class of functions f in the unit ball of the reproducing kernel Hilbert space associated with a kernel function k. We defer the discussion on kernels appropriate for use with MMD to (Tay et al 2021). […] b(F, S, T) of the squared MMD can be obtained in the form of matrix Frobenius inner products, as shown in (Gretton et al 2012):…”

Section: Data Valuation With Maximum Mean Discrepancy (MMD) (mentioning)

confidence: 99%
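The Frobenius-inner-product form mentioned in the excerpt above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the biased (V-statistic) estimate from Gretton et al. (2012) and a squared-exponential kernel, and the names `rbf_kernel`, `mmd2_biased`, and `lengthscale` are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0):
    """Squared-exponential kernel matrix between rows of a and rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * lengthscale**2))


def mmd2_biased(s, t, lengthscale=1.0):
    """Biased squared-MMD estimate written as Frobenius inner products.

    <A, B>_F = sum(A * B); each kernel block is paired with a constant
    weight matrix (1/m^2, 1/n^2, or 1/(m*n)).
    """
    m, n = len(s), len(t)
    k_ss = rbf_kernel(s, s, lengthscale)
    k_tt = rbf_kernel(t, t, lengthscale)
    k_st = rbf_kernel(s, t, lengthscale)
    w_ss = np.full((m, m), 1.0 / m**2)
    w_tt = np.full((n, n), 1.0 / n**2)
    w_st = np.full((m, n), 1.0 / (m * n))
    return np.sum(w_ss * k_ss) + np.sum(w_tt * k_tt) - 2.0 * np.sum(w_st * k_st)


rng = np.random.default_rng(0)
s = rng.normal(size=(50, 2))
t = s + 3.0  # a shifted copy, so the two samples clearly differ
same = mmd2_biased(s, s)
shifted = mmd2_biased(s, t)
```

Here `same` should be zero up to rounding and `shifted` clearly positive, which matches the intuition that MMD measures how far apart two samples are in distribution.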

“…which is a reasonable choice for our problem setting under the following practical assumptions: (A) Every party benefits from having data drawn from $D$ besides having just its dataset $D_i$ since $D_i$ may only be sampled from a restricted subset of the support of $D$. We discuss its validity in (Tay et al 2021).…”

Section: Data Valuation With Maximum Mean Discrepancy (MMD) (mentioning)

confidence: 99%

“…Given that $v_c$ is non-negative and monotonically increasing (a later section will show sufficient conditions that guarantee these properties), the reward scheme of Sim et al (2020) exploits the notion of ρ-Shapley fair reward values $r_i := (\phi_i / \phi^*)^{\rho} \times v_c(N)$ for each party $i \in N$ with an adjustable parameter ρ to trade off between satisfying the incentives. For your convenience, we reproduce their main result and full definitions in (Tay et al 2021).…”

Section: Reward Scheme For Guaranteeing Incentives In CGM Framework (mentioning)

confidence: 99%
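The ρ-Shapley fair reward formula quoted above can be illustrated with a small sketch. It rests on stated assumptions rather than the paper's text: the excerpt does not define ϕ*, so it is taken here to be the largest Shapley value; the value function is a toy additive one; and the names `shapley_values` and `rho_shapley_rewards` are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(n, v):
    """Exact Shapley values for n parties; v maps a frozenset of indices to a value."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                c = frozenset(coalition)
                phi[i] += weight * (v(c | {i}) - v(c))
    return phi


def rho_shapley_rewards(phi, v_grand, rho):
    """r_i = (phi_i / phi_star)^rho * v_c(N), assuming phi_star = max_i phi_i > 0."""
    phi_star = max(phi)
    return [(p / phi_star) ** rho * v_grand for p in phi]


# Toy additive value function (an assumption made for this example only).
weights = [1.0, 2.0, 3.0]


def toy_value(coalition):
    return sum(weights[j] for j in coalition)


phi = shapley_values(3, toy_value)
grand = toy_value(frozenset(range(3)))
proportional = rho_shapley_rewards(phi, grand, rho=1.0)
equal = rho_shapley_rewards(phi, grand, rho=0.0)
```

With ρ = 1 the rewards scale in proportion to the Shapley values, while ρ = 0 gives every party the full value of the grand coalition; intermediate ρ interpolates between the two.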

“…This formulation also informs us of a suitable choice of the synthetic dataset $G$: A sufficient but not necessary condition for the feasible set of the LP to be non-empty is $\min_{i \in N} v_i^{\max} \geq \max_{i \in N} v_i^{\min}$. When generating the synthetic dataset $G$, we may thus increase the size of $G$ until this condition is satisfied; we provide an intuition for why this works in (Tay et al 2021).…”

Section: A Modified Reward Scheme With Rectified ρ-Shapley Fair Rewar... (mentioning)

confidence: 99%

“…In each iteration of our weighted sampling algorithm for distributing synthetic data reward to party i (Algo. 1 in (Tay et al 2021)), we firstly perform min-max normalization to rescale $\Delta_x$ to $\bar{\Delta}_x$ for all synthetic data points $x \in G \setminus G_i$ to lie within the [0, 1] interval. We compute the probability of each synthetic data point x being sampled using the softmax function: $p(x) = \exp(\beta \bar{\Delta}_x) / \sum_{x' \in G \setminus G_i} \exp(\beta \bar{\Delta}_{x'})$ where $\beta \in [0, \infty)$ is the inverse temperature hyperparameter.…”

Section: Distributing Synthetic Data Rewards To Parties Via Weighted ... (mentioning)

confidence: 99%
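The sampling step described in the excerpt above (min-max normalization followed by a softmax with inverse temperature β) might look roughly like this. It is a sketch of that one step only, not the paper's Algo. 1; the function name `sampling_probabilities` and the example scores are hypothetical.

```python
import numpy as np


def sampling_probabilities(delta, beta):
    """Min-max normalize per-point scores to [0, 1], then apply a softmax."""
    delta = np.asarray(delta, dtype=float)
    span = delta.max() - delta.min()
    # If all scores are equal, normalization is undefined; fall back to zeros.
    normalized = (delta - delta.min()) / span if span > 0 else np.zeros_like(delta)
    logits = beta * normalized
    logits = logits - logits.max()  # shift for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


# Hypothetical scores for four candidate synthetic points.
scores = [0.2, -1.0, 3.5, 0.9]
uniform = sampling_probabilities(scores, beta=0.0)
peaked = sampling_probabilities(scores, beta=8.0)

# Draw one candidate index according to the softmax probabilities.
rng = np.random.default_rng(0)
chosen = rng.choice(len(scores), p=peaked)
```

Setting β = 0 yields uniform sampling, and larger β concentrates probability on the points with the highest normalized score.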


Incentivizing Collaboration in Machine Learning via Synthetic Data Rewards

Tay

Foo

et al. 2022

AAAI

Self Cite


This paper presents a novel collaborative generative modeling (CGM) framework that incentivizes collaboration among self-interested parties to contribute data to a pool for training a generative model (e.g., GAN), from which synthetic data are drawn and distributed to the parties as rewards commensurate to their contributions. Distributing synthetic data as rewards (instead of trained models or money) offers task- and model-agnostic benefits for downstream learning tasks and is less likely to violate data privacy regulation. To realize the framework, we first propose a data valuation function using maximum mean discrepancy (MMD) that values data based on its quantity and quality in terms of its closeness to the true data distribution, and we provide theoretical results guiding the kernel choice in our MMD-based data valuation function. Then, we formulate the reward scheme as a linear optimization problem that, when solved, guarantees certain incentives such as fairness in the CGM framework. We devise a weighted sampling algorithm for generating synthetic data to be distributed to each party as reward such that the value of its data and the synthetic data combined matches the reward value assigned to it by the reward scheme. We empirically show using simulated and real-world datasets that the parties' synthetic data rewards are commensurate to their contributions.
