Over the last ten years, full-waveform inversion (FWI) has emerged as a robust and efficient high-resolution subsurface model-building tool for seismic imaging, with the unique ability to invert for complex models. FWI is based on the minimization of a cost function measuring the misfit between observed and modelled data, the data space consisting of collections of time series. In its original least-squares (LSQ) formulation, the method suffered from high sensitivity to local minima and therefore handled large time shifts between observed and modelled data events poorly. To tackle this problem, a common practice is to start the inversion from the low temporal frequencies of the data and to select specific data events called diving waves. Complementary to this, other cost functions have been investigated. Among these, cost functions based on optimal transport (OT) are appealing for their potential to handle large time shifts between observed and modelled data events. Several OT-inspired strategies have been proposed that take into account the specificities of seismic data. Among them, the approach based on the Kantorovich–Rubinstein (KR) norm allows the seismic data to be used directly and admits an efficient numerical implementation, which makes a multidimensional application (in the data coordinate space) possible. We present here an analysis of the KR norm, discussing its theoretical and practical aspects. A key component of our analysis is the adjoint source, or data-space gradient of the cost function (converted into the model-space gradient within FWI). We highlight its piecewise linearity, analyze its frequency content and amplitude, and emphasize the benefit of a multidimensional implementation. We give practical rules for setting the tuning parameters.
Our set of synthetic and field data examples demonstrates the improvements brought by the KR norm over LSQ FWI and highlights the improvements brought by the multidimensional approach over the one-dimensional one.
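To make the time-shift sensitivity of the LSQ cost function concrete, the following minimal sketch compares two misfits between a Ricker wavelet and a delayed copy of itself. It is a toy illustration under stated assumptions, not the implementation analyzed in this work: the wavelet, the 5 Hz peak frequency, the sampling and the function names are all hypothetical choices. The second misfit uses only the one-dimensional special case in which, for a zero-mean residual and without the bound constraint on the dual potential, the KR norm reduces to the L1 norm of the time-integrated residual.

```python
import numpy as np

def ricker(t, f0, t0):
    """Ricker wavelet with peak frequency f0 (Hz), centred at time t0 (s)."""
    a = (np.pi * f0 * (t - t0)) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

def lsq_misfit(d_cal, d_obs, dt):
    """Least-squares misfit: half the squared L2 norm of the residual."""
    r = d_cal - d_obs
    return 0.5 * np.sum(r ** 2) * dt

def integrated_l1_misfit(d_cal, d_obs, dt):
    """L1 norm of the time-integrated residual.

    Assumption: 1D trace, zero-mean residual, no bound on the dual
    potential. This is a simplified stand-in for the KR norm, not the
    multidimensional, bound-constrained version.
    """
    r_int = np.cumsum(d_cal - d_obs) * dt
    return np.sum(np.abs(r_int)) * dt

# Hypothetical acquisition parameters, chosen only for illustration.
dt = 0.002                      # time sampling (s)
t = np.arange(0.0, 4.0, dt)     # recording window (s)
f0 = 5.0                        # peak frequency (Hz)
t_obs = 2.0                     # arrival time of the observed event (s)
d_obs = ricker(t, f0, t_obs)

# Misfit as a function of the delay applied to the modelled event.
shifts = np.linspace(0.0, 1.0, 101)
lsq = np.array([lsq_misfit(ricker(t, f0, t_obs + s), d_obs, dt)
                for s in shifts])
kr1d = np.array([integrated_l1_misfit(ricker(t, f0, t_obs + s), d_obs, dt)
                 for s in shifts])

print("LSQ peak / large-shift value:", lsq.max() / lsq[-1])
print("integrated-L1 peak / large-shift value:", kr1d.max() / kr1d[-1])
```

In this toy setting the LSQ misfit overshoots its large-shift value at a delay of roughly half a dominant period and then decreases, so a gradient-based update started beyond that delay moves the event the wrong way (cycle skipping). The integrated-residual misfit never exceeds its large-shift value, by the triangle inequality. The sketch makes no claim about convexity with respect to the shift, and it does not capture the multidimensional behaviour discussed above.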