Automated, reliable, and objective microstructure inference from micrographs is an essential milestone towards a comprehensive understanding of process-microstructure-property relations and tailored materials development. However, with the increasing complexity of microstructures, such inference requires advanced segmentation methodologies. While deep learning (DL), in principle, offers new opportunities for this task, an intuition about the required data quality and quantity and an extensive methodological DL guideline for microstructure quantification and classification are still missing. This, along with a lack of open-access data sets and the seemingly opaque decision-making process of DL models, hampers DL's breakthrough in this field. We address all of these obstacles with a multidisciplinary DL approach, devoting equal attention to specimen preparation, contrasting, and imaging. To this end, we train distinct U-Net architectures with 30–50 micrographs of different imaging modalities and corresponding electron backscatter diffraction (EBSD)-informed annotations. On the challenging task of lath-bainite segmentation in complex-phase steel, we achieve accuracies of 90%, rivaling expert segmentations. Further, we discuss the impact of image context, pre-training with domain-extrinsic data, and data augmentation. Network visualization techniques demonstrate plausible model decisions based on grain boundary morphology and triple points. As a result, we resolve preconceptions about required data amounts and interpretability to pave the way for DL's day-to-day application for microstructure quantification.
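To illustrate the kind of data augmentation mentioned above, the sketch below enumerates the eight symmetries of the square (four 90° rotations, each with and without a mirror) applied identically to a micrograph and its annotation mask, so that pixel-wise labels stay aligned. This is a minimal, generic example, not the authors' pipeline; all function and parameter names are our own.

```python
import numpy as np

def augment_pair(image, mask, k_rot=0, flip=False):
    """Apply the same geometric transform to a micrograph and its
    annotation mask so that pixel-wise labels remain aligned.
    k_rot: number of 90-degree rotations; flip: horizontal mirror.
    (Illustrative only; not the paper's augmentation code.)"""
    img, msk = np.rot90(image, k_rot), np.rot90(mask, k_rot)
    if flip:
        img, msk = np.fliplr(img), np.fliplr(msk)
    return img, msk

def expand_dataset(images, masks):
    """Turn each annotated micrograph into 8 training pairs by
    enumerating the symmetries of the square."""
    out = []
    for img, msk in zip(images, masks):
        for k in range(4):
            for flip in (False, True):
                out.append(augment_pair(img, msk, k, flip))
    return out
```

Augmentations of this purely geometric kind are attractive for micrographs because phase labels are invariant under rotation and mirroring, so the annotation transforms exactly with the image.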