2022
DOI: 10.48550/arxiv.2203.03570
Preprint

Kubric: A scalable dataset generator

Abstract: Data is the driving force of machine learning, with the amount and quality of training data often being more important for the performance of a system than architecture and training details. But collecting, processing and annotating real data at scale is difficult, expensive, and frequently raises additional privacy, fairness and legal concerns. Synthetic data is a powerful tool with the potential to address these shortcomings: 1) it is cheap 2) supports rich ground-truth annotations 3) offers full control ove…

Cited by 5 publications (7 citation statements)
References 77 publications (113 reference statements)

“…Deep learning-based computer vision methods promise to fundamentally alter what is possible in animal behavioural research [12][13][14][15][16][17][18]. A key remaining bottleneck is the "data-hunger" of supervised learning techniques: annotated datasets of the size and variability required to achieve robust, domain-invariant performance are rarely available, and in any case time-intensive to produce [36,54]. One strategy to overcome this limitation is to produce annotated data synthetically, using sufficiently realistic computer simulations [31,32,34,36].…”
Section: Discussion (mentioning)
confidence: 99%
“…A key remaining bottleneck is the "data-hunger" of supervised learning techniques: annotated datasets of the size and variability required to achieve robust, domain-invariant performance are rarely available, and in any case time-intensive to produce [36,54]. One strategy to overcome this limitation is to produce annotated data synthetically, using sufficiently realistic computer simulations [31,32,34,36]. In order to facilitate this process, we developed replicAnt: a synthetic data generator built in Unreal Engine 5 and Python.…”
Section: Discussion (mentioning)
confidence: 99%
“…We evaluate our model on five datasets. Four of them are synthetic multi-object datasets: CLEVR [35], CLEVRTex [37], MOVi-C, MOVi-E [24]. They present increasing levels of difficulty: CLEVRTex adds texture to objects and backgrounds, MOVi-C uses more complex objects and natural backgrounds, and MOVi-E contains large numbers of objects (up to 23) per scene.…”
Section: Methods (mentioning)
confidence: 99%
“…A particular focus is placed on the acknowledgment of the simulation-to-real gap and how to tackle this particular challenge in the dataset generation process. Even though the first version of BlenderProc was one of the first tools to generate photo-realistic, synthetic datasets, many more tools exist nowadays, compared in Table 1 (Greff et al, 2022; Manolis Savva* et al, 2019; Morrical et al, 2021; Schwarz & Behnke, 2020; To et al, 2018). In contrast to the first version of BlenderProc, BlenderProc2 relies on an easy-to-use python API, whereas the first version used a YAML-based configuration approach (Denninger et al, 2019, 2020).…”
Section: Statement of Need (mentioning)
confidence: 99%