To accelerate inter coding in versatile video coding (VVC), existing deep-learning-based methods utilize a single convolutional neural network (CNN) to directly predict the quadtree plus multi-type tree (QTMT)-based partition of the whole coding tree unit (CTU). However, these methods adopt one prediction network for unevenly distributed CTUs and ignore that different CTUs vary in partition prediction difficulty, leading to performance degradation and wasted computation. To overcome these limitations, this letter proposes a classification-prediction joint framework to accelerate VVC inter coding, which combines classification and prediction so that different CTUs are processed by networks with appropriate capacities. To enable effective partition prediction for the whole CTU, the QTMT-based partition is first modeled as a partition homogeneity map (PHM), a value map reflecting the partition of each 8×8 unit. Second, a classification module classifies CTUs into different classes according to their partition prediction difficulty, and prediction sub-networks with appropriate capacities predict the PHM for the corresponding CTU class. Finally, a decision tree (DT) determines the optimal split modes from the predicted PHM. Experimental results show that the proposed approach achieves a 44.5% encoding time saving with a 1.94% BD-BR increase, outperforming state-of-the-art approaches.
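
The routing idea behind the classification-prediction joint framework can be illustrated with a minimal sketch: a lightweight classifier assigns each CTU a difficulty class, and the CTU is then forwarded to a prediction sub-network whose capacity matches that class. The class names, the two-way easy/hard split, the layer widths, and the assumption of a 128×128 luma CTU mapped to a 16×16 PHM (one value per 8×8 unit) are illustrative assumptions, not the letter's actual network architectures.

```python
# Illustrative sketch only; names, shapes, and capacities are assumptions.
import torch
import torch.nn as nn


class DifficultyClassifier(nn.Module):
    """Assigns each CTU a partition-prediction difficulty class (here: easy vs. hard)."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, num_classes)

    def forward(self, ctu: torch.Tensor) -> torch.Tensor:
        return self.fc(self.features(ctu).flatten(1))


def make_phm_net(width: int) -> nn.Module:
    """Builds a PHM prediction sub-network; `width` controls its capacity."""
    return nn.Sequential(
        nn.Conv2d(1, width, 3, stride=2, padding=1), nn.ReLU(),      # 128 -> 64
        nn.Conv2d(width, width, 3, stride=2, padding=1), nn.ReLU(),  # 64 -> 32
        nn.Conv2d(width, width, 3, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
        nn.Conv2d(width, 1, 1),                                      # one value per 8x8 unit
    )


class JointFramework(nn.Module):
    """Routes each CTU to a prediction sub-network matched to its difficulty class."""

    def __init__(self):
        super().__init__()
        self.classifier = DifficultyClassifier(num_classes=2)
        self.sub_nets = nn.ModuleList([make_phm_net(16), make_phm_net(64)])  # easy, hard

    def forward(self, ctu: torch.Tensor) -> torch.Tensor:
        cls = self.classifier(ctu).argmax(dim=1)          # difficulty class per CTU
        phm = torch.zeros(ctu.size(0), 1, 16, 16)
        for k, net in enumerate(self.sub_nets):
            mask = cls == k
            if mask.any():
                phm[mask] = net(ctu[mask])                # appropriate-capacity prediction
        return phm                                        # 16x16 PHM for a 128x128 CTU


if __name__ == "__main__":
    luma_ctus = torch.rand(4, 1, 128, 128)                # a batch of 128x128 luma CTUs
    print(JointFramework()(luma_ctus).shape)              # torch.Size([4, 1, 16, 16])
```

The point of the routing is that the larger-capacity sub-network is only invoked for CTUs classified as hard to predict, so the extra computation is spent where it is needed; the subsequent decision-tree step that maps the predicted PHM to split modes is not part of this sketch.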