Nowadays, a variety of computing devices are used to visually detect objects, people, and activities. Their quality and performance depend on limited datasets that are created and annotated by hand, a process that is both error-prone and expensive. Reaching high quality on complex detection tasks, however, requires extensive datasets with error-free annotations. To overcome this dilemma, we create a system for the automatic generation of synthetic ground truth data, which enables the learning of complex detection tasks as well as testing, verification, and evaluation.