In this paper, we study the problem of privacy-preserving data synthesis (PPDS) for tabular data in a distributed multi-party environment. In a decentralized setting, for PPDS, federated generative models with differential privacy are used by the existing methods. Unfortunately, the existing models apply only to images or text data and not to tabular data. Unlike images, tabular data usually consist of mixed data types (discrete and continuous attributes) and real-world datasets with highly imbalanced data distributions. Existing methods hardly model such scenarios due to the multimodal distributions in the decentralized continuous columns and highly imbalanced categorical attributes of the clients. To solve these problems, we propose a federated generative model for decentralized tabular data synthesis (HT-Fed-GAN). There are three important parts of HT-Fed-GAN: the federated variational Bayesian Gaussian mixture model (Fed-VB-GMM), which is designed to solve the problem of multimodal distributions; federated conditional one-hot encoding with conditional sampling for global categorical attribute representation and rebalancing; and a privacy consumption-based federated conditional GAN for privacy-preserving decentralized data modeling. The experimental results on five real-world datasets show that HT-Fed-GAN obtains the best trade-off between the data utility and privacy level. For the data utility, the tables generated by HT-Fed-GAN are the most statistically similar to the original tables and the evaluation scores show that HT-Fed-GAN outperforms the state-of-the-art model in terms of machine learning tasks.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.