Optimization methods have been widely applied to the aerodynamic design of gas turbine blades. Although applying optimization to high-fidelity computational fluid dynamics (CFD) simulations has proven capable of improving engineering design performance, the long run times of computationally expensive CFD evaluations remain a challenge. Reduced-order models and, more recently, machine learning methods have been increasingly used in gas turbine studies to predict performance metrics and operational characteristics, model turbulence, and optimize designs. Machine learning methods make it possible to exploit existing knowledge and datasets from different sources, such as previous experiments, CFD, low-fidelity simulations, and 1D or system-level studies. The present study investigates inserting a machine learning model that utilizes such data into a high-fidelity, CFD-driven optimization process, thereby reducing the number of required evaluations of the CFD model. Artificial neural network (ANN) models were trained on data from over three thousand two-dimensional (2D) CFD analyses of turbine blade cross-sections. The trained ANN models were then used as surrogates in a nested optimization process alongside a full three-dimensional (3D) Navier–Stokes CFD simulation. The much lower evaluation cost of the ANN models allows tens of thousands of design evaluations to guide the search for the best blade profiles to be used in the more expensive, high-fidelity CFD runs, improving the progress of the optimization while reducing the required computation time. The current workflow is estimated to achieve a five-fold reduction in computational time compared with an optimization process based on 3D CFD simulations alone.
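The nested, surrogate-guided search described above can be illustrated with a minimal Python sketch. It is not the study's implementation: `expensive_model` and `surrogate_model` are hypothetical toy functions standing in for the 3D CFD run and the trained ANN, the random local search is an assumed placeholder for whatever optimizer is actually used, and all parameter values are illustrative. The sketch only shows the structure, in which many cheap surrogate evaluations select each candidate that is passed to the expensive model.

```python
import random

def expensive_model(x):
    """Hypothetical stand-in for a high-fidelity 3D CFD evaluation (toy objective)."""
    return -sum((xi - 0.30) ** 2 for xi in x)

def surrogate_model(x):
    """Hypothetical stand-in for a trained ANN: cheap, slightly biased approximation."""
    return -sum((xi - 0.35) ** 2 for xi in x)

def inner_search(center, radius, n_samples, rng):
    """Screen many candidates near `center` with the cheap surrogate; return the best."""
    best_x, best_score = list(center), surrogate_model(center)
    for _ in range(n_samples):
        # Perturb each design variable and clip it to the unit interval.
        cand = [min(1.0, max(0.0, c + rng.uniform(-radius, radius))) for c in center]
        score = surrogate_model(cand)
        if score > best_score:
            best_x, best_score = cand, score
    return best_x

def nested_optimize(x0, outer_iters=5, inner_samples=2000, radius=0.2, seed=0):
    """Outer loop: one expensive evaluation per iteration, guided by the inner search."""
    rng = random.Random(seed)
    best_x, best_f = list(x0), expensive_model(x0)
    n_expensive = 1
    for _ in range(outer_iters):
        cand = inner_search(best_x, radius, inner_samples, rng)
        f = expensive_model(cand)
        n_expensive += 1
        # Keep the candidate only if the expensive model confirms an improvement.
        if f > best_f:
            best_x, best_f = cand, f
    return best_x, best_f, n_expensive

if __name__ == "__main__":
    x, f, n = nested_optimize([0.9, 0.9, 0.9])
    print(f"best objective {f:.4f} after {n} expensive evaluations")
```

In this toy setting, ten thousand surrogate evaluations accompany only six calls to the expensive model, which mirrors the cost asymmetry the workflow relies on.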
The methodology is demonstrated on the NASA/General Electric Energy Efficient Engine (E3) high-pressure turbine blade, yielding Pareto-front designs with improved blade efficiency and power relative to the baseline. Quantitative analysis of the optimization data reveals that some design parameters in the present study, such as the lean angle and tip scaling factor, are more influential than others. Examining the optimized designs also provides insight into the physics: compared with the baseline design, the optimized designs exhibit a smaller pressure drop near the trailing edge but an earlier onset of pressure drop on the suction-side surface, contributing to the observed improvements in efficiency and power.
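For readers unfamiliar with the term, a Pareto front for two maximized objectives (here, efficiency and power) is the set of designs that no other design beats in both objectives at once. The short Python sketch below shows one simple way to extract such a set. The function names and the sample values are hypothetical and purely illustrative; they are not results from the study.

```python
def dominates(a, b):
    """True if design `a` is no worse than `b` in every objective and better in at least one."""
    return all(ai >= bi for ai, bi in zip(a, b)) and any(ai > bi for ai, bi in zip(a, b))

def pareto_front(points):
    """Return the non-dominated points, assuming every objective is maximized."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

if __name__ == "__main__":
    # Made-up (efficiency, normalized power) pairs, for illustration only.
    designs = [(0.90, 1.0), (0.92, 0.9), (0.91, 1.0), (0.89, 0.8)]
    print(pareto_front(designs))
```

This brute-force filter compares every pair of designs, which is adequate for a few thousand candidates; larger sets would typically use a sorting-based approach.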