Crack identification is essential for the preventive maintenance of asphalt pavement. Periodic inspection yields the characteristics of existing and emerging cracks, from which the pavement condition index can be calculated to assess pavement health. The most common way to estimate the crack area in an image is to count the grid cells or boxes that cover the cracks. Accurate and efficient crack identification is the premise of crack assessment. However, current patch-based classification methods are limited by their receptive field and cannot directly classify cracks, and the postprocessing algorithm in anchor-based detection is not explicitly optimized for crack topology. In this paper, a new model that fuses grid-based classification and box-based detection, built on You Only Look Once version 5 (YOLO v5), is developed and described for the first time. The model achieves high accuracy and efficiency, partly owing to a shared backbone network, multi-task learning, and joint training. For the postprocessing stage, a non-maximum suppression and area-reduction suppression (NMS-ARS) algorithm is presented that filters redundant, overlapping prediction boxes according to crack topology, and a mapping and matching algorithm is proposed to combine the advantages of the grid-based and box-based models. A double-labeled dataset containing tens of thousands of asphalt pavement images is assembled, and the proposed method is verified on its test set. The fusion model outperforms the individual classification and detection models, and the proposed NMS-ARS algorithm further improves detection accuracy. Experimental results demonstrate that the presented method effectively realizes automatic crack identification for asphalt pavement.
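The abstract does not specify the grid size, the box format, or the ARS rule, so the following is only a minimal sketch of two standard building blocks it refers to: estimating crack area by counting grid cells that contain crack pixels, and conventional greedy NMS with an intersection-over-union (IoU) threshold. The ARS step is deliberately not reproduced. The function names `grid_crack_area`, `iou`, and `greedy_nms`, the `cell` size, the `iou_thr` value, and the `(x1, y1, x2, y2, score)` box layout are all hypothetical assumptions for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical box layout: (x1, y1, x2, y2, confidence score).
Box = Tuple[float, float, float, float, float]


def grid_crack_area(mask: Sequence[Sequence[int]], cell: int) -> Tuple[int, int]:
    """Estimate crack area by counting grid cells that contain any crack pixel.

    mask: 2D binary crack mask (1 = crack pixel), rows of equal length.
    cell: side length of a square grid cell in pixels (assumed parameter).
    Returns (number of cracked cells, estimated crack area in pixels).
    Edge cells that extend past the image border are counted at full size.
    """
    h = len(mask)
    w = len(mask[0]) if h else 0
    cracked = 0
    for top in range(0, h, cell):
        for left in range(0, w, cell):
            # A cell is "cracked" if at least one pixel inside it is a crack pixel.
            if any(mask[r][c]
                   for r in range(top, min(top + cell, h))
                   for c in range(left, min(left + cell, w))):
                cracked += 1
    return cracked, cracked * cell * cell


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], iou_thr: float = 0.5) -> List[Box]:
    """Standard greedy NMS: visit boxes by descending score and keep a box
    only if its IoU with every already-kept box is at most iou_thr."""
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) <= iou_thr for k in kept):
            kept.append(box)
    return kept


if __name__ == "__main__":
    mask = [[1, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 1]]
    print(grid_crack_area(mask, cell=2))  # two of the four 2x2 cells contain cracks
    boxes = [(0, 0, 10, 10, 0.9), (1, 1, 11, 11, 0.8), (20, 20, 30, 30, 0.7)]
    print(greedy_nms(boxes, iou_thr=0.5))  # the 0.8 box overlaps the 0.9 box and is dropped
```

Plain IoU-based NMS treats every overlap the same way, which is one reason it may suit elongated, branching crack boxes poorly; the abstract's NMS-ARS adds an area-reduction criterion on top of NMS, but its exact rule is not given here and is not implemented above.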