In this article we propose an effective algorithm for detecting small objects in high-resolution images. The image is analyzed at several scales and processed block by block with a convolutional neural network. The number of pyramid layers is determined by the input image resolution and the size of the convolutional layer. On every pyramid layer except the highest, the image is split into overlapping blocks to improve small-object detection accuracy. Detected areas are merged into one if they belong to the same class and overlap strongly. Experimental results obtained with YOLOv4 on 4K and 8K images are presented. The algorithm detects small objects in high-definition video more accurately than YOLOv4.
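The block splitting and merging steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the block size, the overlap, the IoU threshold of 0.5, and the choice to merge two boxes into their enclosing rectangle are all assumptions made for illustration, and the names `tile_origins`, `split_blocks` and `merge_detections` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def tile_origins(length: int, block: int, overlap: int) -> List[int]:
    """Start offsets of overlapping blocks that cover [0, length)."""
    if not 0 <= overlap < block:
        raise ValueError("overlap must satisfy 0 <= overlap < block")
    if length <= block:
        return [0]
    step = block - overlap
    origins = list(range(0, length - block, step))
    origins.append(length - block)  # last block sits flush with the border
    return origins


def split_blocks(width: int, height: int, block: int, overlap: int) -> List[Box]:
    """Overlapping square blocks covering an image of the given size."""
    return [
        (x, y, min(x + block, width), min(y + block, height))
        for y in tile_origins(height, block, overlap)
        for x in tile_origins(width, block, overlap)
    ]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_detections(dets: List[Tuple[int, Box]], thr: float = 0.5) -> List[Tuple[int, Box]]:
    """Greedily merge same-class boxes whose IoU is at least `thr`.

    Assumption: merged boxes are replaced by their enclosing rectangle.
    """
    merged: List[Tuple[int, Box]] = []
    for cls, box in dets:
        for i, (mcls, mbox) in enumerate(merged):
            if mcls == cls and iou(box, mbox) >= thr:
                merged[i] = (cls, (min(box[0], mbox[0]), min(box[1], mbox[1]),
                                   max(box[2], mbox[2]), max(box[3], mbox[3])))
                break
        else:
            merged.append((cls, box))
    return merged
```

In such a scheme, detections found inside a block would first be shifted by the block's origin into full-image coordinates, so that duplicates produced by neighbouring overlapping blocks can be merged.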
The paper aims to improve person re-identification accuracy in distributed video surveillance systems by constructing a large joint image dataset of people for training convolutional neural networks (CNNs). To this end, existing datasets are analyzed. A new large joint dataset for the person re-identification task is then constructed from the existing public datasets CUHK02, CUHK03, Market, Duke, MSMT17 and PolReID. Re-identification is tested with the frequently cited CNNs ResNet-50, DenseNet-121 and PCB. Accuracy is evaluated with the main metrics Rank, mAP and mINP. Using the new large joint dataset improves Rank1, mAP and mINP on all test sets. Re-ranking is used to further increase re-identification accuracy. The presented results confirm the effectiveness of the proposed approach.
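For readers unfamiliar with the three metrics named above, the following sketch computes them from ranked match lists using their commonly used definitions. It assumes that each query is represented by a list of booleans ordered from the most to the least similar gallery image, with `True` marking a correct identity; the function names and the toy data are hypothetical, and the abstract does not describe the authors' evaluation code.

```python
from typing import List, Sequence


def rank_k(matches: Sequence[bool], k: int) -> float:
    """1.0 if a correct match appears among the top-k results, else 0.0."""
    return 1.0 if any(matches[:k]) else 0.0


def average_precision(matches: Sequence[bool]) -> float:
    """Mean of the precision values taken at each correct match."""
    hits, total = 0, 0.0
    for position, is_match in enumerate(matches, start=1):
        if is_match:
            hits += 1
            total += hits / position
    return total / hits if hits else 0.0


def inverse_negative_penalty(matches: Sequence[bool]) -> float:
    """Number of correct matches divided by the rank of the hardest (last) one."""
    positions = [p for p, is_match in enumerate(matches, start=1) if is_match]
    return len(positions) / positions[-1] if positions else 0.0


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


# Toy ranked lists for two queries (hypothetical data).
queries = [
    [True, False, True, False],
    [False, True],
]
rank1 = mean([rank_k(q, 1) for q in queries])
m_ap = mean([average_precision(q) for q in queries])
m_inp = mean([inverse_negative_penalty(q) for q in queries])
```

Rank1 rewards only the first correct hit, mAP rewards placing all correct matches early, and mINP penalizes the query for its hardest correct match.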
Objectives. The main goal is to improve person re-identification accuracy in distributed video surveillance systems. Methods. Machine learning methods are applied. Result. A technology for two-stage training of convolutional neural networks (CNNs) is presented, characterized by the use of image augmentation at the preliminary stage and fine-tuning of the weight coefficients on the original set of training images. At the first stage, training is carried out on augmented data; at the second stage, the CNN is fine-tuned on the original images, which helps minimize the loss and increases model efficiency. Using different data at different training stages keeps the CNN from memorizing training examples, thereby preventing overfitting. The proposed method of expanding the training sample is distinctive in that it combines a cyclic shift of image pixels, color exclusion, and replacement of a fragment with a reduced copy of another image. This augmentation method yields a wide variety of training data, which increases the CNN's robustness to occlusions, illumination changes, low image resolution, and dependence on the location of features. Conclusion. The two-stage training technology and the proposed data augmentation method increased person re-identification accuracy for different CNNs and datasets: by 4–21 % in the Rank1 metric, by 10–31 % in mAP, and by 39–60 % in mINP.
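The three augmentation operations named in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: images are plain nested lists of RGB tuples so the example needs no third-party library, "color exclusion" is interpreted here as conversion to gray levels, the reduced copy is produced by nearest-neighbour sampling, and all function names and parameters are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of RGB pixels


def cyclic_shift(img: Image, dx: int, dy: int) -> Image:
    """Shift pixels right by dx and down by dy, wrapping around the borders."""
    h, w = len(img), len(img[0])
    return [[img[(y - dy) % h][(x - dx) % w] for x in range(w)] for y in range(h)]


def drop_color(img: Image) -> Image:
    """Replace each pixel by its channel average (assumed meaning of color exclusion)."""
    out: Image = []
    for row in img:
        new_row = []
        for r, g, b in row:
            v = (r + g + b) // 3
            new_row.append((v, v, v))
        out.append(new_row)
    return out


def shrink(img: Image, new_h: int, new_w: int) -> Image:
    """Nearest-neighbour downscaling to new_h x new_w."""
    h, w = len(img), len(img[0])
    return [[img[y * h // new_h][x * w // new_w] for x in range(new_w)]
            for y in range(new_h)]


def paste_reduced(img: Image, other: Image, top: int, left: int,
                  patch_h: int, patch_w: int) -> Image:
    """Replace a fragment of img with a reduced copy of another image."""
    small = shrink(other, patch_h, patch_w)
    out = [list(row) for row in img]  # copy so the input stays unchanged
    for y in range(patch_h):
        for x in range(patch_w):
            out[top + y][left + x] = small[y][x]
    return out
```

In a two-stage schedule like the one described, transforms of this kind would be applied only while training the first stage, and the second stage would fine-tune on the untouched originals.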
To improve the accuracy of person re-identification systems, an integrated approach to forming the training sample for convolutional neural networks is proposed. It involves a new image dataset, an increase in the number of training examples through the use of existing datasets, and a number of transformations that increase their diversity. The created dataset, PolReID1077, contains images of people obtained in all seasons, which should improve the operation of re-identification systems when the seasons change. Another advantage of PolReID1077 is its use of video data from outdoor and indoor surveillance at a large number of different filming locations. As a result, the images of people in the created set vary in background, brightness and color characteristics. Joining the created dataset with the existing CUHK02, CUHK03, Market-1501, DukeMTMC-ReID and MSMT17 sets yielded 109 772 images for training. The variety of generated examples is increased by applying a cyclic shift to them, eliminating color, and replacing a fragment with a reduced copy of another image. Results of estimating re-identification accuracy for the ResNet-50 and DenseNet-121 convolutional neural networks trained with the proposed approach to forming the training sample are presented.
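Joining several re-identification datasets raises a practical point: each source numbers its identities independently, so labels have to be remapped before training. The sketch below shows one way this could be done. It is an assumption about the mechanics, not a description of how the authors built their joint set, and the function name, dataset keys and file names are hypothetical.

```python
from typing import Dict, List, Tuple

Sample = Tuple[str, int]  # (image path, person id within its source dataset)


def join_datasets(datasets: Dict[str, List[Sample]]) -> Tuple[List[Sample], int]:
    """Concatenate datasets and give every (dataset, person) pair a unique global id."""
    id_map: Dict[Tuple[str, int], int] = {}
    joined: List[Sample] = []
    for name, samples in datasets.items():
        for path, person_id in samples:
            key = (name, person_id)
            if key not in id_map:
                id_map[key] = len(id_map)  # next free global identity label
            joined.append((path, id_map[key]))
    return joined, len(id_map)


# Toy input: person id 1 occurs in both sources but denotes different people.
toy = {
    "set_a": [("a1.jpg", 1), ("a2.jpg", 1), ("a3.jpg", 2)],
    "set_b": [("b1.jpg", 1)],
}
joined, num_identities = join_datasets(toy)
```

Keying the map on the pair of dataset name and local id keeps identities from different sources apart even when their numeric labels collide.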