In the context of Industry 4.0, the most popular way to identify and track objects is to attach tags, and most companies currently still use cheap quick response (QR) tags, which can be located by computer vision (CV) techniques. In CV, instance segmentation (IS) can detect the position of tags while also segmenting each instance. At present, the mask region-based convolutional neural network (Mask R-CNN) is commonly used for IS, but the completeness of the instance mask cannot be guaranteed. Furthermore, because QR tags are richly textured, low-quality images can lower the intersection-over-union (IoU) significantly, preventing it from accurately measuring the completeness of the instance mask. To optimize the IoU of the instance mask, a QR tag IS method named the mask UNet region-based convolutional neural network (MU R-CNN) is proposed. We use a UNet branch to reduce the impact of low image quality on IoU through texture segmentation. The UNet branch does not depend on the features of the Mask R-CNN branch, so it can be trained independently. The pre-trained optimal UNet model ensures that the loss of MU R-CNN is accurate from the start of end-to-end training. Experimental results show that the proposed MU R-CNN is applicable to both high- and low-quality images and is therefore better suited to Industry 4.0.
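For reference, the mask-level IoU mentioned above is the standard overlap-to-union ratio between a predicted instance mask and its ground-truth mask; the symbols $M_p$ and $M_g$ below are notation introduced here for illustration and are not taken from the paper:

\[
\mathrm{IoU}(M_p, M_g) = \frac{\lvert M_p \cap M_g \rvert}{\lvert M_p \cup M_g \rvert},
\]

where $M_p$ and $M_g$ denote the sets of pixels in the predicted and ground-truth masks, respectively, and $\lvert \cdot \rvert$ counts pixels.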