Computer vision has become a fundamental area of interest in recent decades. Each application area has its own data, which object detection methods can analyse. However, it is important to find the most suitable parameters for a model that must detect different object groups. This research investigates how pre-trained YOLOv5 models (nano (n), small (s), medium (m), large (l), extra-large (x)), hyperparameters (learning rate, momentum, and weight decay), and image augmentation parameters (hsv_h, degrees, translate, flipud, mosaic, mixup, shear, perspective) influence the efficiency of detecting similar construction details. A new dataset with twenty-two labelled categories of construction details was collected and prepared. A total of 270 models were trained and evaluated. Every model was evaluated on 3,300 test images whose backgrounds were mixed, neutral, or white. The most accurate model was YOLOv5l with a learning rate of 0.001, a momentum of 0.950, and a weight decay of 0.0001. This model achieved an accuracy of 0.5015 (50.15%).
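The kind of search described above, over model sizes and training hyperparameters, can be illustrated with a minimal sketch. It only enumerates candidate configurations and does not train anything. The candidate value lists are hypothetical placeholders chosen for illustration; only the best-reported combination (YOLOv5l, 0.001, 0.950, 0.0001) comes from the text. The keys `lr0`, `momentum`, and `weight_decay` follow the naming used in YOLOv5 hyperparameter files, and mapping "learning rate" to `lr0` is an assumption. The grid size below is not meant to reproduce the 270 trained models.

```python
from itertools import product

# The five pre-trained YOLOv5 sizes named in the text.
MODEL_VARIANTS = ["yolov5n", "yolov5s", "yolov5m", "yolov5l", "yolov5x"]

# Hypothetical candidate values (assumption): the study's actual grid is not
# given here. Each list merely includes the best-reported value.
LEARNING_RATES = [0.01, 0.001]
MOMENTA = [0.937, 0.950]
WEIGHT_DECAYS = [0.0005, 0.0001]


def build_grid(models, learning_rates, momenta, weight_decays):
    """Return one configuration dict per combination of the inputs."""
    return [
        {"model": m, "lr0": lr, "momentum": mo, "weight_decay": wd}
        for m, lr, mo, wd in product(models, learning_rates, momenta, weight_decays)
    ]


grid = build_grid(MODEL_VARIANTS, LEARNING_RATES, MOMENTA, WEIGHT_DECAYS)

# The most accurate configuration as reported in the text.
best_reported = {
    "model": "yolov5l",
    "lr0": 0.001,
    "momentum": 0.950,
    "weight_decay": 0.0001,
}

if __name__ == "__main__":
    print(f"{len(grid)} candidate configurations")
    print("best reported configuration is in the grid:", best_reported in grid)
```

Each dictionary in `grid` could then be written to a hyperparameter file and passed to a training run; the augmentation parameters (hsv_h, degrees, translate, and so on) could be added to the product in the same way.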