Despite the recent popularity of neural network-based solvers for optimal transport (OT), there is no standard quantitative way to evaluate their performance. In this paper, we address this issue for quadratic-cost transport, specifically computation of the Wasserstein-2 distance, a commonly-used formulation of optimal transport in machine learning. To overcome the challenge of computing ground-truth transport maps between continuous measures, which is needed to assess these solvers, we use input-convex neural networks (ICNNs) to construct pairs of measures whose ground-truth OT maps can be obtained analytically. This strategy yields pairs of continuous benchmark measures in high-dimensional spaces, such as spaces of images. We thoroughly evaluate existing optimal transport solvers using these benchmark measures. Even though these solvers perform well in downstream tasks, many do not faithfully recover optimal transport maps. To investigate the cause of this discrepancy, we further test the solvers in an image-generation setting. Our study reveals crucial limitations of existing solvers and shows that increased OT accuracy does not necessarily correlate with better downstream results.

Solving optimal transport (OT) with continuous methods has become widespread in machine learning, including methods for large-scale OT [11,36] and the popular Wasserstein Generative Adversarial Network (W-GAN) [3,12]. Rather than discretizing the problem [31], continuous OT algorithms use neural networks or kernel expansions to estimate transport maps or dual solutions. This helps scale OT to large-scale, higher-dimensional problems not handled by discrete methods. Notable successes of continuous OT include generative modeling [42,20,19,7] and domain adaptation [43,37,25].

In these applications, OT is typically incorporated as one of the loss terms for a neural network model.
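The benchmark construction rests on Brenier's theorem: for the quadratic cost, the gradient of a convex potential is the optimal transport map from a source measure to its pushforward under that gradient. The following sketch illustrates the idea with a hand-picked convex quadratic potential in place of an actual ICNN, and an illustrative (unnormalized) squared-error metric for scoring a candidate map; the function names and the specific potential are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4

# Convex potential psi(x) = 0.5 * x^T A x + b^T x with A symmetric
# positive definite, so grad psi(x) = A x + b is convex-gradient and,
# by Brenier's theorem, the exact OT map from mu to its pushforward.
M = rng.standard_normal((d, d))
A = M @ M.T + np.eye(d)  # symmetric positive definite
b = rng.standard_normal(d)

def ot_map(x):
    """Ground-truth OT map: the gradient of the convex potential."""
    return x @ A.T + b

# Source measure mu: standard Gaussian samples.
# Target measure nu: the pushforward of mu through the known map.
x = rng.standard_normal((10_000, d))
y = ot_map(x)

def l2_error(T_hat, x):
    """Mean squared deviation of a candidate map from the known map."""
    diff = T_hat(x) - ot_map(x)
    return np.mean(np.sum(diff ** 2, axis=1))

identity = lambda z: z
print(l2_error(ot_map, x))    # the true map has zero error
print(l2_error(identity, x))  # a naive map has strictly positive error
```

Because the ground-truth map is available in closed form, any continuous OT solver trained on samples from (mu, nu) can be scored directly against it, which is exactly the kind of quantitative evaluation the benchmark enables.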
For example, in W-GANs, the OT cost serves as a loss function for the generator; the model incorporates a neural network-based OT solver to estimate this loss. Although recent W-GANs achieve state-of-the-art generative performance, it remains unclear to what extent this success is connected to OT. For example, [28,32,38] show that popular solvers for the Wasserstein-1 (W1) distance in GANs fail to estimate W1 accurately. While W-GANs were initially introduced