Computational fluid dynamics models based on Reynolds-averaged Navier-Stokes equations with turbulence closures still play important roles in engineering design and analysis. However, the development of turbulence models has been stagnant for decades. With recent advances in machine learning, data-driven turbulence models have become attractive alternatives worth further exploration. However, a major obstacle in the development of data-driven turbulence models is the lack of training data. In this work, we survey currently available public turbulent flow databases and conclude that they are inadequate for developing and validating data-driven models. Rather, we need more benchmark data from systematically and continuously varied flow conditions (e.g., Reynolds number and geometry) with maximum coverage in the parameter space. To this end, we perform direct numerical simulations of flows over periodic hills with varying slopes, resulting in a family of flows that range from incipient to mild and massive separation. We further demonstrate the use of such a dataset by training a machine learning model that predicts Reynolds stress anisotropy based on a set of mean flow features. We expect that the generated dataset, along with its design methodology and the example application presented herein, will facilitate the development and comparison of future data-driven turbulence models.

... scientists and engineers. Turbulent flows are typical multi-scale physical systems characterized by a wide range of spatial and temporal scales. When predicting such systems, first-principles simulations are prohibitively expensive, so the small-scale processes must be modeled.
For turbulent flows, this is done by solving the Reynolds-Averaged Navier-Stokes (RANS) equations, with the unresolved processes represented by turbulence model closures. While the past two decades have witnessed rapid development of high-fidelity turbulence simulation methods such as large eddy simulation (LES), these methods are still too expensive for practical systems such as the flow around a commercial airplane [1], and using LES for engineering design is expected to remain infeasible for decades to come. On the other hand, attempts to combine models of different fidelity levels (e.g., hybrid LES/RANS models) have shown promise, but achieving consistency in the hierarchical coupling of such models is still a challenge and a topic of ongoing research. Consequently, the RANS equations remain the workhorse of engineering computational fluid dynamics (CFD) for simulating turbulent flows.

It is well known that RANS turbulence models have large model-form uncertainties for a wide range of flows [2], which diminish the predictive capabilities of RANS-based CFD models. Development of turbulence models has been stagnant for decades, as is evident from the fact that currently used turbulence models (e.g., the k-ε, k-ω, and Spalart-Allmaras models [3-5]) were all developed decades ag...