2022
DOI: 10.48550/arxiv.2203.03489
Preprint

DATGAN: Integrating expert knowledge into deep learning for synthetic tabular data

Abstract: Synthetic data can be used in various applications, such as correcting biased datasets or replacing scarce original data for simulation purposes. Generative Adversarial Networks (GANs) are considered state-of-the-art for developing generative models. However, these deep learning models are data-driven, and it is, thus, difficult to control the generation process. It can, therefore, lead to the following issues: lack of representativity in the generated data, the introduction of bias, and the possibility of overfi…

Citations: cited by 4 publications (7 citation statements)
References: 46 publications
“…• DATGAN [31] proposes DATGAN which is a novel architecture based on GAN using Directed Acyclic Graphs (DAGs) to model the information about the dataset. It uses LSTM cells to model expert knowledge using DAG.…”
Section: B Machine Learning Methods-based Models (mentioning)
confidence: 99%
“…It is difficult for GAN to control the generation process of data-driven systems; therefore, integrating prior knowledge about data relationships and constraints can assist the generator in generating synopses that are realistic and meaningful. In order to implement this, DATGAN [38] incorporates expert knowledge into the GAN generator by matching the generator structure to the underlying data structure using a Directed Acyclic Graph (DAG). Using DAG, the nodes represent the columns of a data table, while the directed links between them allow the generator to determine the relationship between variables so that one column's generation influences another.…”
Section: Gan-based Tabular Generator (mentioning)
confidence: 99%
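
The DAG idea described in the statement above can be illustrated with a minimal sketch. The column names and links below are hypothetical and are not taken from the DATGAN paper. The code only derives an order in which columns could be generated so that each column follows the columns that influence it, using Python's standard `graphlib` module; it does not reproduce the LSTM-based generator itself.

```python
from graphlib import TopologicalSorter

# Hypothetical columns and links, for illustration only.
# Each key is a column; its value is the set of parent columns
# whose generated values are assumed to influence it.
parents = {
    "age": set(),
    "income": {"age"},
    "car_ownership": {"age", "income"},
}


def generation_order(parents):
    """Return a column order in which every column comes after its parents.

    Raises graphlib.CycleError if the links do not form a DAG.
    """
    return list(TopologicalSorter(parents).static_order())


order = generation_order(parents)
print(order)
```

If the links contained a cycle, `static_order()` would raise `graphlib.CycleError`, which is one simple way to check that an expert-specified graph really is acyclic.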
“…However, in AQP, it is not necessary to meet this threshold in order to generate realistic data synopses. DATGAN [38] uses the improved version of the Wasserstein loss function in WGAN [41] in addition to the Vanilla GAN loss function with a gradient penalty [42] and also adds the KL-divergence as an extra term to the original loss function. Both of these terms aim to minimize the difference between the probability distributions of real and generated data.…”
Section: Distribution Matching (mentioning)
confidence: 99%
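
As a rough illustration of the KL-divergence term mentioned in the statement above, the sketch below computes the KL divergence between the empirical category frequencies of a real column and a generated column. This is an assumption-laden example rather than DATGAN's actual loss: the function name, the sample data, and the `eps` floor used to avoid division by zero are all hypothetical choices made here.

```python
import math
from collections import Counter


def kl_divergence(real, generated, eps=1e-9):
    """KL(P_real || P_generated) over the categories of one column.

    eps is a hypothetical floor for generated frequencies, so that a
    category missing from the generated data does not divide by zero.
    """
    real_counts = Counter(real)
    gen_counts = Counter(generated)
    kl = 0.0
    for category, count in real_counts.items():
        p = count / len(real)                        # real frequency
        q = gen_counts[category] / len(generated)    # generated frequency
        kl += p * math.log(p / max(q, eps))
    return kl


# Hypothetical samples of a single categorical column.
real_column = ["a", "a", "b", "b"]
generated_column = ["a", "a", "a", "b"]

print(kl_divergence(real_column, real_column))
print(kl_divergence(real_column, generated_column))
```

The value is zero when the two frequency tables match and grows as the generated frequencies drift away from the real ones, which is why such a term can be added to a loss to pull the two distributions together.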
“…They focus on the generation of high-dimensional discrete variables (binary and count features). Lederrey et al [34] proposed DATGAN model to generate population data. They combined expertise and deep learning methods and used directed acyclic graph to identify the relationships between variables.…”
Section: Preliminaries (mentioning)
confidence: 99%