With the rapid development of artificial intelligence, computer vision techniques have been successfully applied to concrete defect diagnosis in bridge structural health monitoring. To improve the accuracy of identifying the location and type of concrete defects (cracks, exposed bars, spalling, efflorescence and voids), this paper proposes three improvements to the existing Mask Region-based Convolutional Neural Network (Mask R-CNN). (i) The backbone of Mask R-CNN, the residual network ResNet101, which has a large number of convolution layers, is replaced by the lightweight network MobileNetV2. This addresses the slow training caused by the large number of parameters and improves the extraction of features from smaller targets. (ii) Attention mechanism modules are embedded in the Feature Pyramid Network (FPN) to better extract target features. (iii) A path aggregation network (PANet) is added to compensate for the limited ability of Mask R-CNN to extract shallow-layer feature information. To validate the improved Mask R-CNN, a multi-class concrete defect image dataset was constructed, and the K-means clustering algorithm was used to determine the most suitable aspect ratios of the prior bounding boxes for the dataset. Next, the identification results of the improved Mask R-CNN, the original Mask R-CNN and other mainstream deep learning networks were compared on the five types of concrete defects in the dataset. Finally, an intelligent identification system for concrete defects was established by combining images taken by unmanned aerial vehicles (UAVs) with the improved defect identification model. Images of reinforced concrete bridge defects collected by UAVs were used as the test set.
The results show that the improved Mask R-CNN achieves higher identification accuracy than the original Mask R-CNN and the other deep learning networks. The improved Mask R-CNN can also identify defects in new concrete images taken by UAVs that were not used in training, with an identification accuracy that meets the requirements of bridge structural health monitoring.
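
The K-means step used to choose prior bounding box aspect ratios can be illustrated with a minimal sketch. It is not the implementation used in this work: the box sizes, the number of clusters `k = 3`, the clustering in log space and the function name `kmeans_aspect_ratios` are all assumptions made for illustration only.

```python
import numpy as np

def kmeans_aspect_ratios(widths, heights, k=3, iters=100, seed=0):
    """Cluster box aspect ratios (w/h) with plain 1-D k-means (Lloyd's algorithm).

    Illustrative sketch only; k, the log-space distance and the random
    initialisation are assumptions, not settings taken from the paper.
    """
    w = np.asarray(widths, dtype=float)
    h = np.asarray(heights, dtype=float)
    # Work in log space so that 2:1 and 1:2 are equally far from 1:1.
    log_r = np.log(w / h)
    rng = np.random.default_rng(seed)
    # Initialise the centres with k distinct samples from the data.
    centers = rng.choice(log_r, size=k, replace=False)
    for _ in range(iters):
        # Assignment step: each box goes to its nearest centre.
        labels = np.argmin(np.abs(log_r[:, None] - centers[None, :]), axis=1)
        # Update step: each centre moves to the mean of its members
        # (an empty cluster keeps its previous centre).
        new_centers = np.array([
            log_r[labels == j].mean() if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Return the aspect ratios in ascending order.
    return np.sort(np.exp(centers))

# Hypothetical annotated box sizes in pixels (wide, tall and square defects).
widths = np.array([200, 210, 190, 50, 55, 45, 100, 105, 95])
heights = np.array([50, 52, 48, 200, 210, 190, 100, 100, 100])
ratios = kmeans_aspect_ratios(widths, heights, k=3)
print(ratios)
```

The returned ratios would then be supplied as the anchor aspect ratios of the region proposal network. K-means depends on its initialisation, so in practice it is usually run from several seeds and the best clustering is kept.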