2019 IEEE International Conference on Big Data (Big Data) 2019
DOI: 10.1109/bigdata47090.2019.9005476

Utility and Privacy Assessments of Synthetic Data for Regression Tasks

Abstract: With ever-increasing capacity for collecting, storing, and processing data, there is also a high demand for intelligent data analysis methods. While there have been impressive advances in machine learning and similar domains in recent years, this also gives rise to concerns regarding the protection of personal and otherwise sensitive data, especially if it is to be analysed by third parties. Besides anonymisation, which becomes challenging with high dimensional data, one approach for privacy-preserving data…

Cited by 33 publications (27 citation statements)
References 13 publications
“…Narrow or specific measures are widely used for assessing synthetic data [15], [19], [20], [27], [31], [32]. They are useful when the analysis to be performed on the synthetic data is known ahead of time.…”
Section: B Utility Metrics: Overview and Classification
Citation type: mentioning
confidence: 99%
“…Pairwise Correlations are sometimes measured using pairwise correlation plots such as heat maps [27], [32], but more often using statistical measures such as pairwise correlation difference (PCD) [18]. We assess the correlations between attribute pairs using the latter.…”
Section: ) Bivariate Fidelity
Citation type: mentioning
confidence: 99%
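
The pairwise correlation difference mentioned in the statement above is not defined on this page. A minimal sketch, assuming the commonly used definition (the Frobenius norm of the difference between the Pearson correlation matrices of the real and the synthetic data), might look as follows. The function name and the toy data are hypothetical and are not taken from the cited papers.

```python
import numpy as np


def pairwise_correlation_difference(real, synthetic):
    """Assumed definition: Frobenius norm of Corr(real) - Corr(synthetic).

    Both inputs are 2-D arrays with rows as records and columns as
    numeric attributes, in the same column order.
    """
    real = np.asarray(real, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)
    # rowvar=False treats each column as one attribute.
    corr_real = np.corrcoef(real, rowvar=False)
    corr_synthetic = np.corrcoef(synthetic, rowvar=False)
    return float(np.linalg.norm(corr_real - corr_synthetic, ord="fro"))


# Toy data (hypothetical): attribute 1 is correlated with attribute 0.
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 3))
real[:, 1] += 0.8 * real[:, 0]

# A deliberately poor "synthetic" set: each column is shuffled
# independently, which keeps the marginals but destroys the correlations.
synthetic = np.column_stack(
    [rng.permutation(real[:, j]) for j in range(real.shape[1])]
)

pcd_identical = pairwise_correlation_difference(real, real)
pcd_shuffled = pairwise_correlation_difference(real, synthetic)
```

Under this assumed definition, a value near zero indicates that the synthetic data reproduces the pairwise correlations of the real data, and larger values indicate that correlation structure was lost.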
“…A way to ensure this from a mathematical perspective is to train the generative models with a differential privacy (DP) objective. The premise of DP is that no output could be directly attributed to a single training instance [2,7,19,35]. In this study, we consciously chose not to include DP to maximize the utility of the synthetic corpora for the downstream task, but we recommend that future research uses DP in order to minimize privacy risks.…”
Section: Privacy Of Synthetic Text
Citation type: mentioning
confidence: 99%
“…The dataset used in this study is the Boston Housing data, i.e., data on the housing market in the city of Boston, United States, collected by the Statlib Library of Carnegie Mellon University [5]. This dataset is frequently used in data mining research, such as studies on housing price prediction [6] and regression [7]. The K-Means algorithm is implemented in the R framework using a library.…”
Section: Introduction
Citation type: unclassified