Information on social media spreads through an underlying diffusion network that connects people with common interests and opinions. This diffusion network often comprises multiple layers, each capturing the spreading dynamics of a certain type of information characterized by, for example, topic, attitude, or language. Researchers have previously proposed methods to infer these underlying multilayer diffusion networks from observed spreading patterns, but little is known about how well these methods perform across the range of realistic spreading data. In this paper, we first introduce an effective implementation of the inference method that can achieve higher accuracy than existing implementations in comparable runtime. Then, we conduct an extensive series of synthetic data experiments to systematically analyze the performance of the method under varied network structures (e.g., density, number of layers) and information diffusion settings (e.g., cascade size, layer mixing) that are designed to mimic real-world spreading on social media. We find that the inference accuracy varies strongly with network density, and that the method fails to decompose the diffusion network correctly when most cascades in the data reach only a limited audience. By demonstrating the conditions under which the inference accuracy is extremely low, our paper highlights the need to carefully evaluate the applicability of the method before running the inference on real data. Practically, our results serve as a reference for this evaluation, and our publicly available implementation supports further testing under user-defined settings.