Receiver function analysis is widely used to make quantitative inferences about the structure below a seismic station. As these observables are mainly sensitive to the traveltimes of phases converted and reflected at seismic discontinuities, the resulting inverse problem is highly non-linear, its solution is non-unique, and there are strong trade-offs between the depths of discontinuities and absolute velocities. To overcome these difficulties, we propose to measure the misfit between the predicted and observed data with an optimal transport distance instead of the conventional least-squares distance, a strategy that has proven beneficial in full waveform inversion. This approach views a seismogram as a distribution of 'mass': the optimal transport distance between two waveforms is the minimal cost of transporting one waveform onto the other. We test the optimal transport approach on the inversion of a radial P-wave receiver function. We also show how it can be applied to measure the cross-convolution distance between the radial and vertical components, thus avoiding the deconvolution required to calculate the receiver function. The resulting misfit function is minimized with a local optimization algorithm to constrain the receiver-side structure. The benefits of this methodology are studied in simple synthetic tests and with real data. In particular, we show that, owing to its increased sensitivity to time shifts, the optimal transport distance reduces the number of local minima in the misfit function. In a linearized inversion, this significantly reduces the dependence on the starting model and improves convergence towards the solution model. Finally, a joint inversion of the P-wave receiver function and surface wave dispersion curves is performed at the Hyderabad station in India.
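
The sensitivity to time shifts can be illustrated with a minimal Python sketch. It is not the implementation used in this study: it assumes non-negative pulses normalized to unit mass, for which the squared 2-Wasserstein distance in one dimension can be evaluated from the quantile functions (inverse cumulative distributions). Real seismograms are signed and need further treatment that is not shown here. The names `gaussian_pulse`, `l2_misfit` and `w2_squared` are hypothetical helpers introduced for illustration only.

```python
import numpy as np


def gaussian_pulse(t, t0, width=0.5):
    """Positive toy pulse centred at t0 (a stand-in for a single phase)."""
    return np.exp(-0.5 * ((t - t0) / width) ** 2)


def l2_misfit(f, g, dt):
    """Conventional least-squares misfit between two sampled waveforms."""
    return np.sum((f - g) ** 2) * dt


def w2_squared(f, g, t, n_quantiles=2000):
    """Squared 1-D 2-Wasserstein distance between two non-negative signals.

    Assumption: f and g are non-negative, so that after normalization to
    unit mass they can be treated as probability densities on the time axis.
    """
    dt = t[1] - t[0]
    f = f / (np.sum(f) * dt)
    g = g / (np.sum(g) * dt)
    # Cumulative distributions of the two 'mass' densities.
    cdf_f = np.cumsum(f) * dt
    cdf_g = np.cumsum(g) * dt
    # Quantile levels strictly inside (0, 1) to stay away from the flat tails.
    q = (np.arange(n_quantiles) + 0.5) / n_quantiles
    # Quantile functions obtained by inverting the CDFs through interpolation.
    quant_f = np.interp(q, cdf_f, t)
    quant_g = np.interp(q, cdf_g, t)
    return np.mean((quant_f - quant_g) ** 2)


t = np.linspace(0.0, 30.0, 3001)
dt = t[1] - t[0]
reference = gaussian_pulse(t, 10.0)

shifts = np.array([0.0, 1.0, 2.0, 4.0, 8.0])
l2_values = np.array(
    [l2_misfit(reference, gaussian_pulse(t, 10.0 + s), dt) for s in shifts]
)
w2_values = np.array(
    [w2_squared(reference, gaussian_pulse(t, 10.0 + s), t) for s in shifts]
)

for s, l2, w2 in zip(shifts, l2_values, w2_values):
    print(f"shift = {s:4.1f} s   L2 = {l2:.4f}   W2^2 = {w2:.4f}")
```

In this toy setting, the least-squares misfit stops changing once the two pulses no longer overlap, so it carries no information about how far apart they are. The squared transport distance keeps growing with the square of the shift, which is the behaviour expected to reduce the number of local minima.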