2022
DOI: 10.5731/pdajpst.2021.012659

Systematic Design, Generation, and Application of Synthetic Datasets for Flow Cytometry

Abstract: Application of synthetic datasets in training and validation of analysis tools has led to improvements in many decision-making tasks in a range of domains from computer vision to digital pathology. Synthetic datasets overcome the constraints of real-world datasets, namely difficulties in collection and labelling, expense, time and privacy concerns. In flow cytometry, real cell-based datasets are limited by properties such as size, number of parameters, distance between cell populations and distributions, and a…

Cited by 1 publication (4 citation statements)
References 28 publications
“…Our previous work on the inter-comparison between synthetic and a real dataset showed clear correlation among cell distribution characteristics examples [ 21 ]. Consequently, for this particular cross-platform comparison we are confident that the synthetic data mirrors, to an appropriate level, the key characteristics of low dimensionality cluster data, demonstrates design flexibility and application, and allows for traceable benchmarking (absolute accuracy and repeatability), without the further need to run the platforms through further real data.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…A recognised limitation of this work is that the number of markers simulated is lower than those in real data (usually -colour panels) because a priority in this study has been to understand and benchmark how algorithms behave with two or three clusters before introducing further complexities into the datasets. Noting the successful referencing and correlation study we have already completed between synthetic and real data [ 21 ], overall, real data have been excluded from this initial research because they are significantly more complex, containing sources of variation from upstream processes and noise components that cannot be controlled to transparently understand the ’black box’ nature of the algorithms investigated. Additionally, it is very difficult to achieve absolute cell counts for real data, so defining measurement accuracy (a critical component of this study) would not be possible.…”
Section: Discussion
Citation type: mentioning
confidence: 99%