2021
DOI: 10.1101/2021.10.26.465952
Preprint

Techniques to Produce and Evaluate Realistic Multivariate Synthetic Data

Abstract: Background: Proper data modeling in biomedical research requires sufficient data for exploration and reproducibility purposes. A limited sample size can inhibit objective performance evaluation. Objective: We are developing a synthetic population (SP) generation technique to address the limited sample size condition. We show how to estimate a multivariate empirical probability density function (pdf) by converting the task to multiple one-dimensional (1D) pdf estimations. Methods: Kernel density estimation (KDE) in 1D …
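The 1D building block named in the abstract can be illustrated with a minimal sketch. It assumes a Gaussian kernel and the normal-reference rule-of-thumb bandwidth; the truncated abstract does not state the kernel or bandwidth the authors use, and the function names and toy lognormal data below are hypothetical.

```python
import math
import random

def rule_of_thumb_bandwidth(xs):
    """Normal-reference bandwidth, 1.06 * std * n^(-1/5) (an assumed choice)."""
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return 1.06 * std * n ** (-0.2)

def kde_pdf(x, xs, h):
    """Gaussian-kernel density estimate evaluated at a single point x."""
    norm = 1.0 / (len(xs) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in xs)

def kde_sample(xs, h, n, rng):
    """Draw n synthetic values: pick an observed point, then add kernel noise."""
    return [rng.choice(xs) + rng.gauss(0.0, h) for _ in range(n)]

# Toy, skewed 1D sample standing in for one measured variable (hypothetical).
rng = random.Random(0)
data = [rng.lognormvariate(0.0, 0.5) for _ in range(200)]

h = rule_of_thumb_bandwidth(data)
synthetic = kde_sample(data, h, 1000, rng)
```

Sampling by "pick a point, add kernel noise" is equivalent to drawing from the estimated density, because a Gaussian-kernel KDE is an equal-weight mixture of normals centered on the observations.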

Citations: cited by 1 publication (2 citation statements)
References: 44 publications (118 reference statements)
“…In this report, we present modifications to our method to mitigate the mKDE efficiency problem under specific conditions (latent normality) and address synthetic data generation in relatively higher dimensionality (d = 10). This modified approach decomposes an arbitrarily distributed multivariate problem into multiple univariate KDE (uKDE) problems while characterizing the covariance structure independently [26]. We are evaluating whether this approach can transform an arbitrarily distributed multivariate sample into an approximate multivariate normal form, which we define as a sample with a latent normal characteristic.…”
Section: Introduction
confidence: 99%
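The decomposition described in this statement (univariate KDEs for the marginals, covariance characterized separately) can be sketched in two dimensions. This is a Gaussian-copula-style illustration under stated assumptions, not the cited authors' implementation: the Gaussian kernel, the rule-of-thumb bandwidth, the bisection inverse, the toy data, and every function name are hypothetical.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist()  # standard normal, used for cdf and inv_cdf

def bandwidth(xs):
    """Normal-reference rule of thumb (an assumed bandwidth choice)."""
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return 1.06 * std * n ** (-0.2)

def kde_cdf(x, xs, h):
    """CDF implied by a Gaussian-kernel uKDE: mean of per-point normal CDFs."""
    return sum(PHI.cdf((x - xi) / h) for xi in xs) / len(xs)

def kde_quantile(p, xs, h, iters=40):
    """Invert kde_cdf by bisection (the CDF is continuous and increasing)."""
    lo, hi = min(xs) - 10 * h, max(xs) + 10 * h
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if kde_cdf(mid, xs, h) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def normal_scores(xs, h):
    """Map one variable to approximately standard-normal scores."""
    eps = 1e-9  # keep probabilities strictly inside (0, 1) for inv_cdf
    return [PHI.inv_cdf(min(max(kde_cdf(x, xs, h), eps), 1 - eps)) for x in xs]

def correlation(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

# Toy bivariate sample: skewed marginals with a latent normal dependence.
rng = random.Random(1)
rho_true = 0.7
x1, x2 = [], []
for _ in range(300):
    g1, g2 = rng.gauss(0, 1), rng.gauss(0, 1)
    z1, z2 = g1, rho_true * g1 + math.sqrt(1 - rho_true ** 2) * g2
    x1.append(math.exp(0.5 * z1))
    x2.append(10.0 * math.exp(0.3 * z2))

# Step 1: one uKDE per variable, each mapped to normal scores.
h1, h2 = bandwidth(x1), bandwidth(x2)
s1, s2 = normal_scores(x1, h1), normal_scores(x2, h2)

# Step 2: characterize the dependence separately, on the normal scale.
rho_hat = correlation(s1, s2)

# Step 3: draw correlated normals and map back through the uKDE quantiles.
synthetic = []
for _ in range(20):
    g1, g2 = rng.gauss(0, 1), rng.gauss(0, 1)
    z1, z2 = g1, rho_hat * g1 + math.sqrt(1 - rho_hat ** 2) * g2
    synthetic.append((kde_quantile(PHI.cdf(z1), x1, h1),
                      kde_quantile(PHI.cdf(z2), x2, h2)))
```

The sketch only works when the transformed sample is close to jointly normal, which is the latent normal condition the statement says is still being evaluated. With more than two variables, the scalar correlation would be replaced by a full correlation matrix and a Cholesky factor.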
“…By hypothesis, our approach seeks to extend these straightforward techniques to the latent normal class by determining when (or if) it exists. Developing the analytics to detect this condition and then leveraging it to generate synthetic data are essential elements of our work [26].…”
Section: Introduction
confidence: 99%