Pavement distress detection is a crucial task in assessing pavement performance. This paper proposes ISTD-DisNet, a novel transformer-based deep-learning method for semantic segmentation of multiple pavement distress types. A mix transformer (MiT) with a hierarchical transformer structure serves as the backbone to extract multi-scale distress features, and a mixed attention module (MAM) is introduced at the decoding stage to capture distress features across channels and spatial locations. A learnable transposed convolution upsampling module (TCUM) enhances the model’s ability to restore multi-scale distress details. Based on the segmentation results, a novel parameter, the distress pixel density ratio (PDR), is introduced. By analyzing the intrinsic correlation between the PDR and the pavement condition index (PCI), a new pavement damage index prediction model is proposed. Experimental results show that the proposed method achieves an F1 score of 95.51% and an mIoU of 91.67%, outperforming seven other mainstream segmentation models. Validation experiments on the PCI prediction model further indicate that the PDR enables quantitative evaluation of pavement damage conditions for each assessment unit, showing promising potential for engineering applications.
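
To make the quantities named in the abstract concrete, the following is a minimal sketch of how a pixel density ratio and the reported metrics could be computed from a label mask. It rests on assumptions that the abstract does not confirm: the PDR is taken to be the fraction of pixels in an assessment unit assigned to each distress class, class index 0 is taken to be background, and F1 and mIoU are taken to be unweighted per-class averages. The function names `pixel_density_ratio` and `mean_iou_and_f1` are hypothetical, and the mapping from PDR to PCI is left out because the abstract does not specify it.

```python
import numpy as np


def pixel_density_ratio(mask: np.ndarray, num_classes: int) -> np.ndarray:
    """Fraction of pixels assigned to each class in an integer label mask.

    Assumption: PDR = class pixel count / total pixel count of the unit.
    """
    counts = np.bincount(mask.ravel(), minlength=num_classes)
    return counts / mask.size


def mean_iou_and_f1(pred: np.ndarray, target: np.ndarray, num_classes: int):
    """Macro-averaged IoU and F1 over classes present in pred or target."""
    # Confusion matrix: rows index the true class, columns the predicted class.
    idx = target.ravel() * num_classes + pred.ravel()
    cm = np.bincount(idx, minlength=num_classes ** 2)
    cm = cm.reshape(num_classes, num_classes)

    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp

    # Skip classes that appear in neither mask to avoid division by zero.
    present = (tp + fp + fn) > 0
    iou = tp[present] / (tp + fp + fn)[present]
    f1 = 2 * tp[present] / (2 * tp + fp + fn)[present]
    return iou.mean(), f1.mean()


# Toy 4x4 assessment unit: 0 = background, 1 and 2 = two distress types
# (the class meanings are illustrative only).
target = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [2, 2, 0, 0],
])
pred = target.copy()
pred[0, 2] = 0  # one distress pixel missed by the hypothetical model

pdr = pixel_density_ratio(target, num_classes=3)  # [0.625, 0.25, 0.125]
total_distress_pdr = pdr[1:].sum()                # 0.375
miou, f1 = mean_iou_and_f1(pred, target, num_classes=3)
```

In this toy unit, 6 of 16 pixels are distress pixels, so the combined ratio is 0.375. Whether the paper aggregates the PDR per class or over all distress pixels is not stated in the abstract, so both are exposed here.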