Full‐waveform inversion (FWI) is now a standard technique for the inverse problem of seismic imaging. It uses PDE‐constrained optimization to determine unknown parameters in a wave equation that represent geophysical properties. The objective function measures the misfit between the observed data and the computed synthetic data, and it has traditionally been the least‐squares norm. In a sequence of papers, we introduced the Wasserstein metric from optimal transport as an alternative misfit function for mitigating so‐called cycle skipping, which is the trapping of the optimization process in local minima. In this paper, we first give a sharper theorem regarding the convexity of the Wasserstein metric as the objective function. We then focus on two new issues. One is the normalization needed to turn seismic signals into probability measures so that the theory of optimal transport applies. The other, which goes beyond cycle skipping, is the inversion for parameters below reflecting interfaces. For the former, we propose a class of normalizations and prove several favorable properties for this class. For the latter, we demonstrate that FWI using optimal transport can recover geophysical properties in regions through which no seismic waves travel. Finally, we illustrate these properties with the realistic application of imaging salt inclusions, which has been a significant challenge in exploration geophysics. © 2021 Wiley Periodicals LLC.
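
The contrast between the least-squares misfit and the Wasserstein misfit can be seen in a toy one-dimensional setting. The sketch below is only an illustration under stated assumptions: the Ricker wavelet, the time grid, the function names, and in particular the squaring normalization with a small `floor` constant are choices made here for simplicity. They are not the class of normalizations proposed in the paper. For a signal that is a pure time shift of the reference, the quadratic Wasserstein misfit should grow roughly like the square of the shift, while the least-squares misfit rises, peaks, and then falls back to a plateau, which is the kind of nonconvexity associated with cycle skipping.

```python
import math


def ricker(t, center, peak_freq):
    """Ricker wavelet, used here as a stand-in for a seismic trace."""
    a = (math.pi * peak_freq * (t - center)) ** 2
    return (1.0 - 2.0 * a) * math.exp(-a)


def normalize_squared(signal, dt, floor=1e-8):
    """Turn a signed signal into a probability density by squaring.

    Assumption: squaring is one simple normalization chosen for this sketch.
    `floor` is a hypothetical small constant that keeps the density positive.
    """
    positive = [s * s + floor for s in signal]
    mass = sum(positive) * dt
    return [p / mass for p in positive]


def quantiles(density, dt, n_levels):
    """Inverse CDF of a piecewise-constant density on cells of width dt."""
    out = []
    cell, cdf_left = 0, 0.0  # cdf_left is the CDF at the left edge of `cell`
    for k in range(n_levels):
        level = (k + 0.5) / n_levels
        while cell < len(density) - 1 and cdf_left + density[cell] * dt < level:
            cdf_left += density[cell] * dt
            cell += 1
        out.append(cell * dt + (level - cdf_left) / density[cell])
    return out


def w2_squared(f, g, dt, n_levels=1000):
    """Squared 2-Wasserstein distance in 1D via the quantile formula."""
    qf = quantiles(normalize_squared(f, dt), dt, n_levels)
    qg = quantiles(normalize_squared(g, dt), dt, n_levels)
    return sum((x - y) ** 2 for x, y in zip(qf, qg)) / n_levels


def l2_squared(f, g, dt):
    """Least-squares misfit between the raw (unnormalized) signals."""
    return sum((x - y) ** 2 for x, y in zip(f, g)) * dt


def shift_sweep(dt=0.005, n=400, center=0.7, peak_freq=5.0):
    """Return (shift, L2 misfit, W2 misfit) for a range of time shifts."""
    times = [i * dt for i in range(n)]
    observed = [ricker(t, center, peak_freq) for t in times]
    rows = []
    for k in range(11):
        tau = 10 * k * dt  # shift in seconds, a whole number of samples
        synthetic = [ricker(t, center + tau, peak_freq) for t in times]
        rows.append((tau,
                     l2_squared(observed, synthetic, dt),
                     w2_squared(observed, synthetic, dt)))
    return rows


if __name__ == "__main__":
    for tau, l2, w2 in shift_sweep():
        print(f"shift={tau:.2f}  L2^2={l2:.4f}  W2^2={w2:.4f}")
```

The quantile formula is used because, in one dimension, the optimal transport map between two densities is determined by their cumulative distribution functions, so no general optimal transport solver is needed for this toy case.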