The optimal transport (OT) framework has been widely used in inverse imaging and computer vision problems as a way to incorporate statistical constraints or priors. In recent years, OT has also been used in machine learning, mostly as a metric to compare probability distributions. This work addresses the semi-discrete OT problem, in which a continuous source distribution is matched to a discrete target distribution. We introduce a fast stochastic algorithm that approximates the solution of such a semi-discrete OT problem using a hierarchical multi-layer transport plan. This method allows for tractable computation in high-dimensional settings and for large point clouds, both at training time and at synthesis time. Experiments demonstrate its numerical advantage over multi-scale (or multi-level) methods. Applications to fast exemplar-based texture synthesis based on patch matching with two layers also show substantial improvements over previous single-layer approaches. This shallow model achieves results comparable to state-of-the-art deep learning methods, while being very compact, faster to train, and using a single image during training instead of a large dataset.
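To make the semi-discrete setting concrete, the sketch below illustrates the standard single-layer baseline, not the hierarchical multi-layer method introduced in this work. It runs averaged stochastic gradient ascent on the semi-dual objective E_x[min_j (c(x, y_j) - v_j)] + sum_j nu_j v_j, where the v_j are dual weights attached to the target points. The one-dimensional setting, the squared Euclidean cost, the step-size schedule, and the function names `semi_discrete_ot_asgd` and `assign` are all assumptions made for illustration.

```python
import random


def assign(x, targets, v):
    """Index of the target whose weighted cell contains x (squared Euclidean cost)."""
    return min(range(len(targets)), key=lambda j: (x - targets[j]) ** 2 - v[j])


def semi_discrete_ot_asgd(sample_source, targets, weights,
                          n_iter=20000, step=0.1, seed=0):
    """Averaged stochastic gradient ascent on the semi-dual of semi-discrete OT.

    sample_source: callable taking a random.Random and returning one source sample.
    targets, weights: positions y_j and masses nu_j of the discrete target.
    Returns the averaged dual weights v (defined up to an additive constant).
    """
    rng = random.Random(seed)
    n = len(targets)
    v = [0.0] * n       # current dual iterate
    v_avg = [0.0] * n   # running average of the iterates
    for k in range(1, n_iter + 1):
        x = sample_source(rng)
        j_star = assign(x, targets, v)
        lr = step / k ** 0.5  # decreasing step size (an assumed schedule)
        for j in range(n):
            # Stochastic gradient of the semi-dual: nu_j - 1[j == j_star].
            grad_j = weights[j] - (1.0 if j == j_star else 0.0)
            v[j] += lr * grad_j
        for j in range(n):
            v_avg[j] += (v[j] - v_avg[j]) / k
    return v_avg


if __name__ == "__main__":
    # Toy problem: uniform source on [0, 1], three weighted target points.
    targets = [0.2, 0.5, 0.8]
    weights = [0.5, 0.3, 0.2]
    v = semi_discrete_ot_asgd(lambda rng: rng.random(), targets, weights)
    print("dual weights:", v)
    print("x = 0.45 is mapped to target", targets[assign(0.45, targets, v)])
```

Once the dual weights are estimated, transporting a source sample reduces to a weighted nearest-neighbor search over the target points. Each such query scans all targets, so its cost grows with the size of the target point cloud.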