Network embedding techniques, which provide low-dimensional representations of the nodes in a network, are commonly applied to machine learning problems in computational biology. In most of these applications, multiple networks are available (e.g., different types of interactions or associations, or semantically identical networks that come from different sources). Multiplex network embedding aims to draw strength from these data sources by integrating multiple networks that share a common set of nodes. Existing approaches to this problem treat all layers of the multiplex network equally during integration, ignoring differences in the topology and sparsity patterns of the networks. Here, we formulate an optimization problem that accounts for intra-network smoothness, inter-network smoothness, and the topological similarity of networks to compute diffusion states for each network. To quantify the topological similarity of pairs of networks, we use the Gromov-Wasserstein discrepancy. Finally, we integrate the resulting diffusion states and apply dimensionality reduction (singular value decomposition after log-transformation) to compute node embeddings. Our experimental results on drug repositioning and drug-target prediction show that the embeddings computed by the resulting algorithm, Hattusha, consistently improve predictive accuracy over algorithms that do not take the topological similarity of different networks into account.
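
The last stage of the pipeline described above (integrating per-network diffusion states, log-transforming, and applying singular value decomposition) can be illustrated with a minimal sketch. This is not the Hattusha optimization: as a simplifying assumption, each network's diffusion states are computed independently by a standard random walk with restart, and the layers are combined with fixed weights rather than with weights informed by Gromov-Wasserstein discrepancy. The function names, the `restart` and `weights` parameters, and the `1/n` smoothing constant are all hypothetical choices made for illustration.

```python
import numpy as np


def rwr_diffusion(adj, restart=0.5):
    """Closed-form random-walk-with-restart diffusion states for one network.

    Column i of the result is the stationary distribution of a walk that
    restarts at node i with probability `restart` (assumed value).
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # isolated nodes keep an all-zero column
    transition = adj / col_sums    # column-normalized transition matrix
    return restart * np.linalg.inv(np.eye(n) - (1.0 - restart) * transition)


def embed_multiplex(adjs, dim=2, restart=0.5, weights=None):
    """Combine diffusion states across layers, log-transform, and apply SVD.

    `weights` stands in for any layer weighting; uniform weights are an
    assumption of this sketch, not the method described in the abstract.
    """
    n = np.asarray(adjs[0]).shape[0]
    if weights is None:
        weights = np.full(len(adjs), 1.0 / len(adjs))
    combined = sum(w * rwr_diffusion(a, restart) for w, a in zip(weights, adjs))
    log_states = np.log(combined + 1.0 / n)  # smoothing avoids log(0)
    u, s, _ = np.linalg.svd(log_states, full_matrices=False)
    return u[:, :dim] * np.sqrt(s[:dim])     # one embedding row per node


# Two toy layers over the same four nodes.
layer_a = np.array([[0, 1, 1, 0],
                    [1, 0, 1, 0],
                    [1, 1, 0, 1],
                    [0, 0, 1, 0]], dtype=float)
layer_b = np.array([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]], dtype=float)

embeddings = embed_multiplex([layer_a, layer_b], dim=2)
```

Passing non-uniform `weights` is the place where a topology-aware scheme would plug in; the sketch leaves that choice to the caller.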