Wind farm design primarily depends on the variability of the wind turbine wake flows to the atmospheric wind conditions, and the interaction between wakes. Physics-based models that capture the wake flow-field with high-fidelity are computationally very expensive to perform layout optimization of wind farms, and, thus, data-driven reduced order models can represent an efficient alternative for simulating wind farms. In this work, we use real-world light detection and ranging (LiDAR) measurements of wind-turbine wakes to construct predictive surrogate models using machine learning. Specifically, we first demonstrate the use of deep autoencoders to find a low-dimensional latent space that gives a computationally tractable approximation of the wake LiDAR measurements. Then, we learn the mapping between the parameter space and the (latent space) wake flow-fields using a deep neural network. Additionally, we also demonstrate the use of a probabilistic machine learning technique, namely, Gaussian process modeling, to learn the parameter-space-latent-space mapping in addition to the epistemic and aleatoric uncertainty in the data. Finally, to cope with training large datasets, we demonstrate the use of variational Gaussian process models that provide a tractable alternative to the conventional Gaussian process models for large datasets. Furthermore, we introduce the use of active learning to adaptively build and improve a conventional Gaussian process model predictive capability. Overall, we find that our approach provides accurate approximations of the wind-turbine wake flow field that can be queried at an orders-of-magnitude cheaper cost than those generated with high-fidelity physics-based simulations.Keywords Wind turbine • renewable energy • wake • deep learning • Gaussian processes • lidar