Deep learning has achieved good results in crack detection for roads and bridges. However, the timber structures of ancient architecture exhibit strong orthotropic anisotropy and complex microstructures, and the patterns of crack development are extremely complex. In the image data, a large proportion of pixels differ markedly from the background gray value, and timber grain introduces noise; as a result, existing methods cannot accurately extract the complex texture contour features of cracks. In previous studies, we verified that YOLO v5s is effective for crack detection in the timber structures of ancient architecture. However, the YOLO series includes many different model versions. To find a better algorithm, this paper adopts three models, YOLO v3, YOLO v4s-mish, and YOLO v5s, to detect cracks in the timber structures of ancient architecture, and compares and analyzes their advantages and disadvantages. The comparison focuses on four indicators: training time, loss function, recall rate, and mAP value. Based on the experiments, we summarize the strengths and weaknesses of each model for crack detection in the timber structures of ancient architecture and draw conclusions from the comparison. We also publish the first image dataset of cracks in the timber structures of ancient architecture and, for the first time, apply YOLO models to the intelligent identification of such cracks, opening up a new approach to the intelligent operation and maintenance of the timber structures of ancient architecture.