Collective user behavior in social media applications often drives important online and offline phenomena linked to the spread of opinions and information. Several studies have analyzed such phenomena using networks to model user interactions, which are represented by edges. However, only a fraction of the edges contribute to the actual investigation. Worse, the often large number of irrelevant edges may obscure the salient interactions, blurring the underlying structures and user communities that capture the collective behavior patterns driving the target phenomenon. To address this issue, researchers have proposed several network backbone extraction techniques to obtain a reduced and representative version of the network that better explains the phenomenon of interest. Each technique has its own assumptions and its own procedure for extracting the backbone. However, the literature lacks a clear methodology to highlight such assumptions, discuss how they affect the choice of a method, and offer validation strategies in scenarios where no ground truth exists. In this work, we fill this gap by proposing a principled methodology for comparing and selecting the most appropriate backbone extraction method given a phenomenon of interest. We characterize ten state-of-the-art techniques in terms of their assumptions, requirements, and other aspects that one must consider to apply them in practice. We present four steps to apply, evaluate, and select the best method(s) for a given target phenomenon. We validate our approach using two case studies with different requirements: online discussions on Instagram and coordinated behavior in WhatsApp groups. We show that different methods can produce very different backbones, underscoring that the choice of an adequate method is of utmost importance to reveal valuable knowledge about the particular phenomenon under investigation.