Multi-class Multi-object Tracking Using Changing Point Detection

Lee, Byungjae; Erdenee, Enkhbayar; Jin, Shiwei; Nam, Mi Young; Jung, Young Giu; Rhee, Phill Kyu

doi:10.1007/978-3-319-48881-3_6

Cited by 123 publications

(78 citation statements)

References 45 publications


“…Note that the baseline accuracy is higher than that reported in [8] mainly because of a better ImageNet pretrained model and the introduction of RoIAlign in object detection. For object detection on ImageNet VID, we mainly follow the protocol in [27,43,42] for the training and inference settings. The details are presented at the end of this section.…”

Section: Fine-tuning For Specific Tasks (mentioning)

confidence: 99%

Deformable ConvNets V2: More Deformable, Better Results

Zhu, Lin, et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

1,922

987


The superior performance of Deformable Convolutional Networks arises from its ability to adapt to the geometric variations of objects. Through an examination of its adaptive behavior, we observe that while the spatial support for its neural features conforms more closely than regular ConvNets to object structure, this support may nevertheless extend well beyond the region of interest, causing features to be influenced by irrelevant image content. To address this problem, we present a reformulation of Deformable ConvNets that improves its ability to focus on pertinent image regions, through increased modeling power and stronger training. The modeling power is enhanced through a more comprehensive integration of deformable convolution within the network, and by introducing a modulation mechanism that expands the scope of deformation modeling. To effectively harness this enriched modeling capability, we guide network training via a proposed feature mimicking scheme that helps the network to learn features that reflect the object focus and classification power of R-CNN features. With the proposed contributions, this new version of Deformable ConvNets yields significant performance gains over the original model and produces leading results on the COCO benchmark for object detection and instance segmentation. * This work is done when Xizhou Zhu is an intern at Microsoft Research Asia.


“…First, an algorithm detects objects of interests and second, identical objects in different frames are associated. A widespread approach is using global information about the detections [17,7]. In contrast to this, online approaches don't have any knowledge of future frames.…”

Section: Multi Target Tracking (mentioning)

confidence: 99%

Complexer-YOLO: Real-Time 3D Object Detection and Tracking on Semantic Point Clouds

Simón, Amende, Kraus, et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

240


Accurate detection of 3D objects is a fundamental problem in computer vision and has an enormous impact on autonomous cars, augmented/virtual reality and many applications in robotics. In this work we present a novel fusion of neural network based state-of-the-art 3D detector and visual semantic segmentation in the context of autonomous driving. Additionally, we introduce Scale-Rotation-Translation score (SRTs), a fast and highly parameterizable evaluation metric for comparison of object detections, which speeds up our inference time up to 20% and halves training time. On top, we apply state-of-the-art online multi target feature tracking on the object measurements to further increase accuracy and robustness utilizing temporal information. Our experiments on KITTI show that we achieve same results as state-of-the-art in all related categories, while maintaining the performance and accuracy trade-off and still run in real-time. Furthermore, our model is the first one that fuses visual semantic with 3D object detection.


“…Experiments are performed on ImageNet VID [47], a large-scale benchmark for video object detection. Following the practice in [48,49], model training and evaluation are performed on the 3,862 training video snippets and the 555 validation video snippets, respectively. The snippets are at frame rates of 25 or 30 fps in general.…”

Section: Methods (mentioning)

confidence: 99%

“…In training, following [48,49], both the ImageNet VID training set and the ImageNet DET training set are utilized. In each mini-batch of SGD, either n + 1 nearby video frames from ImageNet VID, or a single image from ImageNet DET, are sampled at 1:1 ratio.…”

Section: Methods (mentioning)

confidence: 99%
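
The sampling scheme quoted above can be sketched roughly as follows. This is an illustrative reading only, not the citing paper's code: the function name `sample_minibatch`, the representation of snippets as plain lists of frames, and the interpretation of "nearby" as a window of consecutive frames are all assumptions.

```python
import random


def sample_minibatch(vid_snippets, det_images, n, rng):
    """Draw one mini-batch source at a 1:1 ratio (hypothetical helper).

    vid_snippets: list of snippets, each a list of frames (assumed layout).
    det_images:   list of still images.
    n:            number of extra nearby frames, so a video sample has n + 1.
    rng:          a random.Random instance, passed in for reproducibility.
    """
    if rng.random() < 0.5:
        # Video branch: keep only snippets long enough to hold n + 1 frames.
        candidates = [s for s in vid_snippets if len(s) > n]
        snippet = rng.choice(candidates)
        # Assumption: "nearby" means a window of n + 1 consecutive frames.
        start = rng.randrange(len(snippet) - n)
        return "vid", snippet[start:start + n + 1]
    # Still-image branch: a single image from the detection set.
    return "det", [rng.choice(det_images)]


if __name__ == "__main__":
    rng = random.Random(0)
    snippets = [list(range(10)), list(range(100, 120))]
    images = ["img_a", "img_b"]
    for _ in range(4):
        print(sample_minibatch(snippets, images, n=2, rng=rng))
```

Choosing the branch with a fair coin gives the 1:1 ratio in expectation; a strictly alternating schedule would be an equally plausible reading of the quoted sentence.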

Towards High Performance Video Object Detection

Zhu, Dai, Liu, et al. 2018

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

280

188


Despite the recent success of video object detection on Desktop GPUs, its architecture is still far too heavy for mobiles. It is also unclear whether the key principles of sparse feature propagation and multi-frame feature aggregation apply at very limited computational resources. In this paper, we present a light weight network architecture for video object detection on mobiles. Light weight image object detector is applied on sparse key frames. A very small network, Light Flow, is designed for establishing correspondence across frames. A flow-guided GRU module is designed to effectively aggregate features on key frames. For non-key frames, sparse feature propagation is performed. The whole network can be trained end-to-end. The proposed system achieves 60.2% mAP score on ImageNet VID validation at speed of 25.6 fps on mobiles (e.g., HuaWei Mate 8).
