Close process monitoring (i.e., the detection and identification of disturbances) is important for achieving high process efficiency and safety. The Tennessee Eastman process is an extensive benchmark dataset for fault detection and identification, but it is representative only of continuous processes because it lacks the inherent nonstationarity that complicates the monitoring of batch processes. Batch processes nevertheless play an important role in many industries. This paper therefore presents an extensive reference dataset for benchmarking data-driven methodologies for fault detection and identification in batch processes. The original Pensim model [10] is extended with sensor noise. By changing the properties of the initial conditions and/or model parameters, four subsets of different complexity are generated, each containing 400 batches under normal operation. To assess fault detection and identification performance properly, 15 faults are simulated with various amplitudes and onset times, yielding 22,200 faulty batches per subset and 90,400 batches in total (normal and faulty combined). Analysis of the data indicates that the simulated process faults and their various amplitudes in each of the four subsets constitute a suitable benchmark for fault detection and identification in batch processes. The dataset is freely available at http://cit.kuleuven.be/biotec/batchbenchmark.
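The batch counts quoted above can be checked with a short tally. The sketch below is only an arithmetic illustration: the three input numbers come from the abstract, while the variable names are hypothetical and do not reflect the dataset's actual file layout.

```python
# Counts taken from the abstract; the names are illustrative assumptions.
N_SUBSETS = 4                # subsets of different complexity
NORMAL_PER_SUBSET = 400      # batches under normal operation, per subset
FAULTY_PER_SUBSET = 22_200   # faulty batches (all 15 faults), per subset

# Each subset holds its normal batches plus its faulty batches.
batches_per_subset = NORMAL_PER_SUBSET + FAULTY_PER_SUBSET
total_batches = N_SUBSETS * batches_per_subset

print(f"batches per subset: {batches_per_subset:,}")
print(f"total batches: {total_batches:,}")
```

The total of 90,400 therefore counts both normal and faulty batches across all four subsets.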