Most recent work on object detection exploits the bottom-up feature pyramids of convolutional neural networks (ConvNets) to improve detection accuracy, but it seldom addresses the correlation among feature channels or the fusion of low-level and high-level features. In this paper, an Attention Pyramid Network (APN) is proposed, which mainly consists of an adaptive transformation module and a feature attention block. The adaptive transformation module performs multiscale feature fusion, making full use of the accurate target localization information in low-level features and the semantic information in high-level features. The feature attention block then learns to strengthen the features of important channels and suppress those of unimportant ones. By integrating the APN into a basic Mask R-CNN system, our method achieves state-of-the-art results on the MS COCO dataset and the 2018 WAD database without bells and whistles. In addition, the APN adds few network parameters and runs in 4 ms on average, which is negligible compared with the inference time of the ConvNet backbone.
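The abstract does not spell out the exact design of the two modules, so the following is only a minimal PyTorch sketch of the general idea: a high-level pyramid feature is upsampled and fused with a low-level one, and a learned channel-attention weighting (squeeze-and-excitation style) then emphasizes important channels and suppresses unimportant ones. The class names `ChannelAttention` and `FusionWithAttention`, the reduction ratio, and the nearest-neighbor upsampling are illustrative assumptions, not the APN implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelAttention(nn.Module):
    """Learn a per-channel weight in (0, 1) and rescale the feature map with it
    (an assumed squeeze-and-excitation style stand-in for the feature attention block)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # global spatial context per channel
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),  # channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # strengthen important channels, weaken unimportant ones


class FusionWithAttention(nn.Module):
    """Fuse a low-level map (fine localization) with an upsampled high-level map
    (strong semantics), then apply channel attention to the result."""

    def __init__(self, channels: int = 256):
        super().__init__()
        self.attend = ChannelAttention(channels)

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        high_up = F.interpolate(high, size=low.shape[-2:], mode="nearest")
        return self.attend(low + high_up)


if __name__ == "__main__":
    low = torch.randn(1, 256, 64, 64)   # low-level pyramid feature
    high = torch.randn(1, 256, 32, 32)  # high-level pyramid feature
    out = FusionWithAttention(256)(low, high)
    print(out.shape)  # torch.Size([1, 256, 64, 64])
```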