Existing works about fashion outfit compatibility focus on predicting the overall compatibility of a set of fashion items with their information from different modalities. However, there are few works explore how to explain the prediction, which limits the persuasiveness and effectiveness of the model. In this work, we propose an approach to not only predict but also diagnose the outfit compatibility. We introduce an end-to-end framework for this goal, which features for: (1) The overall compatibility is learned from all type-specified pairwise similarities between items, and the backpropagation gradients are used to diagnose the incompatible factors.(2) We leverage the hierarchy of CNN and compare the features at different layers to take into account the compatibilities of different aspects from the low level (such as color, texture) to the high level (such as style). To support the proposed method, we build a new type-specified outfit dataset named Polyvore-T based on Polyvore dataset. We compare our method with the prior state-of-the-art in two tasks: outfit compatibility prediction and fillin-the-blank. Experiments show that our approach has advantages in both prediction performance and diagnosis ability.