2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2019.00813

MOTS: Multi-Object Tracking and Segmentation

Abstract: This paper extends the popular task of multi-object tracking to multi-object tracking and segmentation (MOTS). Towards this goal, we create dense pixel-level annotations for two existing tracking datasets using a semi-automatic annotation procedure. Our new annotations comprise 65,213 pixel masks for 977 distinct objects (cars and pedestrians) in 10,870 video frames. For evaluation, we extend existing multi-object tracking metrics to this new task. Moreover, we propose a new baseline method which jointly addre…
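The abstract mentions extending multi-object tracking metrics to pixel masks. As an illustrative sketch (not the paper's reference implementation), the mask-based evaluation can be thought of as replacing box overlap with mask IoU and crediting each true-positive match by its IoU, in the style of the soft-MOTSA score; the function names and the coordinate-set mask representation here are assumptions for clarity:

```python
def mask_iou(mask_a, mask_b):
    """IoU of two segmentation masks, each given as a set of
    (row, col) pixel coordinates."""
    inter = len(mask_a & mask_b)
    union = len(mask_a | mask_b)
    return inter / union if union else 0.0

def soft_motsa(tp_ious, num_fp, num_idsw, num_gt):
    """Soft MOTSA-style score: the summed IoU of true-positive
    mask matches, penalized by false positives and ID switches,
    normalized by the number of ground-truth masks."""
    return (sum(tp_ious) - num_fp - num_idsw) / num_gt
```

For example, two matches with IoUs 0.9 and 0.8, one false positive, no ID switches, and two ground-truth masks give a score of (1.7 - 1) / 2 = 0.35.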


Cited by 557 publications (537 citation statements). References 59 publications.
“…Accurate detection is still essential to cell tracking performance, and deep learning-based segmentation methods still make impactful errors as cells become more crowded. We expect this to be mitigated as more expansive sets of data are annotated, and as segmentation methods that use spatiotemporal information to inform segmentation decisions come online [42]. This method, and all supervised machine learning methods, is limited by the training data that empower it.…”
Section: Discussion (mentioning)
confidence: 99%
“…We use the KITTI tracking benchmark to evaluate our tracking results for both cars and pedestrians in real-world driving scenes. The annotations for this have been extended with pixel-level mask annotations to evaluate the MOTS task (multi-object tracking and segmentation) [38]. We use the official KITTI test server as well as the validation split from [38] for evaluation.…”
Section: Methods (mentioning)
confidence: 99%
“…Segmentation Tracking. Recently, segmentation masks have been exploited for improving tracking and for creating pixel-accurate tracking results [38], [28], [20]. These methods often use optical flow [16] or scene flow [41] to model the motion of each pixel.…”
Section: Related Work (mentioning)
confidence: 99%
“…In this context, several existing methods can be used, either LiDAR-based [11, 42] or RGB-based [44]. As we mainly target (automotive) real-time tracking scenarios, we constrain the pose transform prediction to translation on the ground plane (equivalent to the xy-plane in all experiments) and relative rotation around the z-axis.…”
Section: Methods (mentioning)
confidence: 99%
“…The sequences are divided into training and validation sequences, as in [44]. Table 3: Registration results on Synth20, containing simulated scans of meshes of 20 different classes of ModelNet40 [46].…”
Section: Datasets (mentioning)
confidence: 99%