Text extraction plays an important role in data processing workflows in digital libraries. For example, it is a crucial prerequisite for evaluating the quality of migrated textual documents. Complex file formats make the extraction process error-prone and make it very challenging to verify the correctness of extraction components. Based on digital preservation and information retrieval scenarios, three quality requirements concerning the effectiveness of text extraction tools are identified: 1) is a certain text snippet correctly extracted from a document, 2) does the extracted text appear in the right order relative to other elements, and 3) is the structure of the text preserved. A number of text extraction tools are available that fulfill these three quality requirements to varying degrees. However, systematic benchmarks to evaluate those tools are still missing, mainly due to the lack of datasets with accompanying ground truth. The contribution of this paper is twofold. First, we describe a dataset generation method based on model-driven engineering principles and use it to synthesize a dataset and its ground truth directly from a model. Second, we define a benchmark for text extraction tools and conduct an experiment to calculate performance measures for several tools that cover the three quality requirements. The results demonstrate the benefits of the approach in terms of scalability and effectiveness in generating ground truth for the content and structure of text elements.