VisDrone-DET2018: The Vision Meets Drone Object Detection in Image Challenge Results

Zhu, Pengfei; Wen, Longyin; Du, Dawei; Bian, Xiao; Ling, Haibin; Hu, Qinghua; Nie, Qinqin; Cheng, Hao; Liu, Chenfeng; Liu, Xiaoyu; Ma, Wenya; Wu, Haotian; Wang, Lianjie; Schumann, Arne; Brown, Chase R.; Qian, Chen; Li, Chengzheng; Li, Dongdong; Michail, Emmanouil; Zhang, Fan; Feng, Nan; Zhu, Feng; Wang, Guanghui; Zhang, Haipeng; Deng, Han; Liu, Hao; Wang, Haoran; Qiu, Heqian; Qi, Honggang; Shi, Honghui; Li, Hongliang; Xu, Hongzhou; Hu, Lin; Kompatsiaris, Ioannis; Cheng, Jian; Wang, Jianqiang; Yang, Jianxiu; Zhou, Jingkai; Zhao, Jiewen; Joseph, K J; Duan, Kailiang; Suresh, K.S.; Ke, Bo; Wang, Ke; Avgerinakis, Konstantinos; Sommer, Lars; Zhang, Lei; Yang, Li; Cheng, Lin; Ma, Lin; Lu, Ling; Ding, Lu; Huang, Min-Yu; Vedurupaka, Naveen Kumar; Mamgain, Nehal; Bansal, Nitin; Acatay, Oliver; Giannakeris, Panagiotis; Wang, Qian; Zhao, Qijie; Huang, Qingming; Liu, Qiong; Cheng, Qishang; Sun, Qiuchen; Laganière, Robert; Jiang, Sheng; Wang, Shengjin; Wei, Shubo; Wang, Siwei; Vrochidis, Stefanos; Wang, Sujuan; Lee, Tiaojio; Sajid, Usman; Balasubramanian, Vineeth N; Li, Wei; Zhang, Wei; Wu, Weikun; Ma, Wenchi; He, Wenrui; Yang, Wenzhe; Chen, Xiaoyu; Sun, Xin; Luo, Xiaobing; Lian, Xintao; Li, Xiufang; Kuai, Yangliu; Li, Yali; Luo, Yi; Zhang, Yifan; Liu, Yiling; Li, Ying; Wang, Yong; Wang, Yongtao; Wu, Yuanwei; Fan, Yue; Wei, Yunchao; Zhang, Yuqin; Wang, Zexin; Wang, Zhangyang; Xia, Zhaoyue; Cui, Zhen; He, Zhenwei; Deng, Zhipeng; Guo, Zhiyao; Song, Zichen

doi:10.1007/978-3-030-11021-5_27

Cited by 58 publications

(47 citation statements)

References 42 publications

“…According to the leaderboard [4] and workshop report [58], the best-performing single model is DE-FPN, which utilized FPN (removing P6) with a ResNeXt-101 64-4d backbone. We implement DE-FPN by identically following their method description in [58], as our comparison subject.…”

Section: Visdrone2018: Results and Analysis (mentioning)

confidence: 99%

Delving Into Robust Object Detection From Unmanned Aerial Vehicles: A Deep Nuisance Disentanglement Approach

Suresh, Narayanan, et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

Object detection from images captured by Unmanned Aerial Vehicles (UAVs) is becoming increasingly useful. Despite the great success of the generic object detection methods trained on ground-to-ground images, a huge performance drop is observed when they are directly applied to images captured by UAVs. The unsatisfactory performance is owing to many UAV-specific nuisances, such as varying flying altitudes, adverse weather conditions, dynamically changing viewing angles, etc. Those nuisances constitute a large number of fine-grained domains, across which the detection model has to stay robust. Fortunately, UAVs will record meta-data that depict those varying attributes, which are either freely available along with the UAV images, or can be easily obtained. We propose to utilize those free meta-data in conjunction with associated UAV images to learn domain-robust features via an adversarial training framework dubbed Nuisance Disentangled Feature Transform (NDFT), for the specific challenging problem of object detection in UAV images, achieving a substantial gain in robustness to those nuisances. We demonstrate the effectiveness of our proposed algorithm, by showing state-of-the-art performance (single model) on two existing UAV-based object detection benchmarks. The code is available at https://github.com/TAMU-VITA/UAV-NDFT.

“…In our method, the image is cropped based on the clusters information, which is less likely to truncate numerous objects. The performance of detectors on UAVDT [8] is much lower than that on VisDrone [38], which is caused by the extremely unbalanced data.…”

Section: Quantitative Results (mentioning)

confidence: 94%

Clustered Object Detection in Aerial Images

Yang, Fan, Chu, et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

Detecting objects in aerial images is challenging for at least two reasons: (1) target objects like pedestrians are very small in pixels, making them hardly distinguished from surrounding background; and (2) targets are in general sparsely and non-uniformly distributed, making the detection very inefficient. In this paper, we address both issues inspired by observing that these targets are often clustered. In particular, we propose a Clustered Detection (ClusDet) network that unifies object clustering and detection in an end-to-end framework. The key components in ClusDet include a cluster proposal sub-network (CPNet), a scale estimation sub-network (ScaleNet), and a dedicated detection network (DetecNet). Given an input image, CPNet produces object cluster regions and ScaleNet estimates object scales for these regions. Then, each scale-normalized cluster region is fed into DetecNet for object detection. ClusDet has several advantages over previous solutions: (1) it greatly reduces the number of chips for final object detection and hence achieves high running time efficiency, (2) the cluster-based scale estimation is more accurate than previously used single-object based ones, hence effectively improves the detection for small objects, and (3) the final DetecNet is dedicated for clustered regions and implicitly models the prior context information so as to boost detection accuracy. The proposed method is tested on three popular aerial image datasets including VisDrone, UAVDT and DOTA. In all experiments, ClusDet achieves promising performance in comparison with state-of-the-art detectors. Code will be available in https://github.com/fyangneil.

Figure 1: Comparison of grid-based uniform partition and the proposed cluster-based partition. (Labels recovered from the figure: Aerial Image; Cluster-wise; Evenly; Object Coverage per Chip (ratio of objects in chip to whole image); Total Chips; Sparse; Common; Clustered.) For the narrative purpose, we intentionally classify a chip into three types: sparse, common, and clustered. We observe that, for grid-based uniform partition, more than 73% chips are sparse (including 23% chips with zero objects), around 25% chips are common, and about 2% chips are clustered. By contrast, for cluster-based partition, around 50% chips are sparse, 35% are common, and about 15% belong to clustered chips, which is 7× more than that of grid-based partition.

…images in MS COCO [22]) in recent years. Despite the promising results for general object detection, the performance of these detectors on the aerial images (e.g., 2,000×1,500 pixels in VisDrone [37]) are far from satisfactory in both accuracy and efficiency, which are caused by two challenges: (1) targets typically have small scales relative to the images; and (2) targets are generally sparsely and non-uniformly distributed in the whole image.
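The chip statistics quoted in the Figure 1 caption rest on an "object coverage per chip" measure, described there as the ratio of objects in a chip to the whole image. A minimal Python sketch of that measure for a grid-based uniform partition might look as follows. The function names, the use of object centres to assign objects to chips, and the sparse/clustered thresholds are all illustrative assumptions; the excerpt does not state how the paper defines the three chip types.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # object centre (x, y) in pixels


def grid_chip_coverage(centres: List[Point], width: int, height: int,
                       rows: int, cols: int) -> List[float]:
    """Per chip of a rows x cols uniform grid, return the ratio of objects
    whose centre falls in that chip to all objects in the image."""
    counts = [0] * (rows * cols)
    for x, y in centres:
        # Clamp so points on the right or bottom edge land in the last chip.
        c = min(int(x * cols / width), cols - 1)
        r = min(int(y * rows / height), rows - 1)
        counts[r * cols + c] += 1
    total = len(centres)
    return [n / total if total else 0.0 for n in counts]


def label_chip(coverage: float, sparse_max: float = 0.05,
               clustered_min: float = 0.25) -> str:
    """Label a chip by its coverage.

    The thresholds are hypothetical, chosen only for illustration;
    they are not values taken from the paper.
    """
    if coverage < sparse_max:
        return "sparse"
    if coverage >= clustered_min:
        return "clustered"
    return "common"


# Toy image at the 2,000 x 1,500 resolution mentioned for VisDrone:
# three objects close together and one far away, split into a 2 x 2 grid.
centres = [(100, 100), (120, 110), (130, 90), (1900, 1400)]
coverage = grid_chip_coverage(centres, width=2000, height=1500, rows=2, cols=2)
labels = [label_chip(c) for c in coverage]
```

In this toy case two of the four grid chips contain no objects at all, which is the kind of waste the caption attributes to uniform partitioning.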

“…The PASCAL VOC [25] dataset is one of the pioneering works in generic object detection, which is designed to provide a standardized testbed for object detection, image classification, object segmentation, person layout, and action classification [62]. The latest version is PASCAL VOC 2012.…”

Section: Pascal Voc (mentioning)

confidence: 99%

“…The images features a diverse real-world scenarios. The dataset was collected using various drone platforms (i.e., drones of different models), in different scenarios (across 14 different cities spanned over thousands of kilometres), and under various weather and lighting conditions [62]. This dataset is challenging since most of the objects are small and densely populated as shown in Figure 1.6.…”

Section: Visdrone-det2018 (mentioning)

confidence: 99%