Augmentation for small object detection

Kisantal, Máté; Wojna, Zbigniew; Murawski, Jakub; Naruniec, Jacek; Cho, Kyunghyun

doi:10.5121/csit.2019.91713

Cited by 472 publications

(127 citation statements)

References 30 publications

Supporting

Mentioning: 104

Contrasting

“…Also, in the case of fog, snow, and rain, the image quality of EO is blurred and not easily identified due to being limited by camera parameters [7]. Fortunately, a few methods to reduce label noise [41], [42], a small object detection method [43], increasing the brightness of night image [44], dehazing [45], rain and snow removal [46] are proposed, which have strengthened the application of EO camera for image processing and computer vision. So, the new ship classifier will be integrated with not only the image captured from the EO camera but also the information of multi-sensor data fusion to improve image classification and recognition.…”

Section: Conclusion and Discussion (mentioning)

confidence: 99%

Maritime Visible Image Classification Based on Double Transfer Method

et al. 2020

Image classification using deep transfer learning has received significant attention, benefiting from pre-trained with the large-scale annotation dataset and continuous improvement of neural network structure. In contrast to universal image classification, however, few publicly available datasets of maritime environments utilize deep transfer learning. Due to data-gathering effort and computational cost, the maritime datasets are deficient in the method of merging datasets and the benchmark of few-shot dataset classifiers. In this paper, we proposed the double transfer method, consisting of the merging datasets network and the backbone network, to address the problem. The merging datasets network measuring image similarity separates classes of known and unknown samples to reorganize a dataset, and the backbone network is constructed from the model EfficientNet-b5 by network-based deep transfer learning. Using the merging datasets network, we introduce the visible maritime image dataset, which has 3,750 images and twenty-five classes, including multitudinous maritime objects. The backbone networks evaluated and analyzed the dataset based on accuracy, precision, recall, and F-measure metrics. Using the double transfer method, we can achieve an accuracy of 91.39% in the visible maritime image dataset.

“…We removed the infrared channel and used only the RGB channels to train the model, in order to use the pretrained model verified on the existing PASCAL VOC [55] and MS COCO datasets [62]. We adopt the same data augmentation scheme in this paper [63] which is intended for small object detection. In addition, we apply several strategies, including "strict in" and "strict out," so that Mask-RCNN focuses on small object detection.…”

Section: Methods (mentioning)

confidence: 99%

Mask-R-FCN: A Deep Fusion Network for Semantic Segmentation

Zhang, Chi. 2020. IEEE Access

Remote sensing image classification plays a significant role in urban applications, precision agriculture, water resource management. The task of classification in the field of remote sensing is to map raw images to semantic maps. Typically, fully convolutional network (FCN) is one of the most effective deep neural networks for semantic segmentation. However, small objects in remote sensing images can be easily overlooked and misclassified as the majority label, which is often the background of the image. Although many works have attempted to deal with this problem, making a trade-off between background semantics and edge details is still a problem. This is mainly because they are based on a single neural network model. To deal with this problem, a convolutional deep network with regions (R-CNN), which is highly effective for object detection is leveraged as a complementary component in our work. A learning-based and decision-level strategy is applied to fuse both semantic maps from a semantic model and an object detection model. The proposed network is referred to as Mask-R-FCN. Experimental results on real remote sensing images from the Zurich dataset, Gaofen Image Dataset (GID), and DataFountain2017 show that the proposed network can obtain higher accuracy than single deep neural networks and other machine learning algorithms. The proposed network achieved better average accuracies, which are approximately 2% higher than those of any other single deep neural networks on the Zurich, GID, and DataFountain2017 datasets.

“…At the site, it's difficult to acquire high-quality raw images, which bring some troubles to model training. Inspired by the works of Kisantal et al [24], we adopt an improved data augmentation method for the small object detection, namely SWDA. The procedure is described in algorithm1.…”

Section: Principle Of Methods a Segmentation Framework Of Steel (mentioning)

confidence: 99%

End-Face Localization and Segmentation of Steel Bar Based on Convolution Neural Network

et al. 2020

Both number manually-counting method and traditional Machine-Vision (MV) number counting strategy are laborious and very time-consuming (sometimes several hours). Thus a new deep learning (DL) fusion model is proposed, which includes object detection and semantic segmentation. It can solve the problems of end-face localization and segmentation of steel bars at the same time. In this fusion model, firstly, an improved data augmentation method namely, Sliding Window Data Augmentation (SWDA) is adopted to compensate less training data concerning object detection, based on which a new object-detection architecture, Inception-RFB-FPN is presented to improve the accuracy and inference time. Secondly, a novel AI labeling method, Fibonacci-incremental mask labeling method (FIMLM) is introduced to accelerate the generation of annotation mask. Furthermore, by contrast, three FCN (Fully Convolutional Networks) architectures of data segmentation, namely, VGG16-FCN, ResNet18-FCN, and ResNet34-FCN are used to conduct the end-face segmentations of steel bars separately. Finally, a series of experiments show that the proposed Inception-RFB-FPN model can reach 98.17% in F1 score (harmonic mean value of precision and recall) with respect to object detection, and its inference time only needs 0.0306 seconds, much faster than some related reports. In addition, the FIMLM-based ResNet34-FCN model can reach 97.47% in mean Intersection-Over-Union (mIOU) with respect to semantic segmentation, higher than both VGG16-FCN and ResNet18-FCN.
