GCNNMatch: Graph Convolutional Neural Networks for Multi-Object Tracking via Sinkhorn Normalization

Papakis, Ioannis; Sarkar, Abhijit; Karpatne, Anuj

doi:10.48550/arxiv.2010.00067

Cited by 14 publications

(31 citation statements)

References 35 publications

“….545 ( 65) 3010 (71) .618 (33) .659 (22) .698 (50) .555 (49) .411 (32) .348 (30) .498 (18) UNS20regress…”

Section: A Extended Results: Mot17 (mentioning)

confidence: 99%

Local Metrics for Multi-Object Tracking

Valmadre, Bewley, Huang et al. 2021

Preprint

This paper introduces temporally local metrics for Multi-Object Tracking. These metrics are obtained by restricting existing metrics based on track matching to a finite temporal horizon, and provide new insight into the ability of trackers to maintain identity over time. Moreover, the horizon parameter offers a novel, meaningful mechanism by which to define the relative importance of detection and association, a common dilemma in applications where imperfect association is tolerable. It is shown that the historical Average Tracking Accuracy (ATA) metric exhibits superior sensitivity to association, enabling its proposed local variant, ALTA, to capture a wide range of characteristics. In particular, ALTA is better equipped to identify advances in association independent of detection. The paper further presents an error decomposition for ATA that reveals the impact of four distinct error types and is equally applicable to ALTA. The diagnostic capabilities of ALTA are demonstrated on the MOT 2017 and Waymo Open Dataset benchmarks.

“….568 (51) 1320 (10) .573 (50) .583 (50) .681 (55) .562 (43) .410 (35) .341 (31) .501 (16) EMT .556 (58) 1361 (11) .558 (57) .571 (59) .667 (73) .539 (55) .406 (40) .333 (32) .499 (17) ALBOD .569 (50) 2011 (31) .572 (51) .587 (47) .721 (37) .576 (36) .407 (37) .328 (33) .456 (39) ISE MOT17R…”

Section: A Extended Results: Mot17 (mentioning)

confidence: 99%

“…GNNs have been applied in point feature matching [22], [37], gesture learning [45], video moment retrieval [46], visual question answering [47] or single-camera single-object tracking [48]. Regarding single-camera multi-object tracking, [49] proposes the use of GNN to extract node and edge embeddings, but computing similarity using the cosine distance and perform data association by using a linear assignment, i.e., Hungarian Algorithm. The first approach of performing feature and similarity learning jointly for associating detections was introduced in [34] by proposing a time-aware MPN variation: detecting associations across time to perform batch-based/offline single-camera multi-object tracking.…”

Section: Graph Neural Network (mentioning)

confidence: 99%

Graph Neural Networks for Cross-Camera Data Association

Luna, SanMiguel, Martínez et al. 2022

Preprint

Cross-camera image data association is essential for many multi-camera computer vision tasks, such as multi-camera pedestrian detection, multi-camera multi-target tracking, 3D pose estimation, etc. This association task is typically stated as a bipartite graph matching problem and often solved by applying minimum-cost flow techniques, which may be computationally inefficient with large data. Furthermore, cameras are usually treated by pairs, obtaining local solutions, rather than finding a global solution at once. Other key issue is that of the affinity measurement: the widespread usage of non-learnable pre-defined distances, such as the Euclidean and Cosine ones. This paper proposes an efficient approach for cross-cameras data-association focused on a global solution, instead of processing cameras by pairs. To avoid the usage of fixed distances, we leverage the connectivity of Graph Neural Networks, previously unused in this scope, using a Message Passing Network to jointly learn features and similarity. We validate the proposal for pedestrian multiview association, showing results over the EPFL multi-camera pedestrian dataset. Our approach considerably outperforms the literature data association techniques, without requiring to be trained in the same scenario in which it is tested. Our code is available at http://www-vpu.eps.uam.es/publications/gnn cca.
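
The citation statement and abstract above both refer to a fixed-distance baseline: score pairs with the cosine distance, then solve a one-to-one linear assignment. A minimal sketch of that baseline follows, under stated assumptions. The function names (`cosine_similarity`, `associate`) and the toy feature vectors are hypothetical and do not come from any of the cited papers, and a brute-force permutation search stands in for the Hungarian algorithm. It reaches the same optimum but is only practical for a handful of objects.

```python
from itertools import permutations
from math import sqrt


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def associate(track_feats, det_feats):
    """Return one-to-one (track, detection) index pairs with minimum total cosine distance.

    Brute-force search over permutations, used here as a stand-in for the
    Hungarian algorithm. Assumes len(track_feats) <= len(det_feats).
    """
    # Cost matrix: cosine distance = 1 - cosine similarity.
    cost = [[1.0 - cosine_similarity(t, d) for d in det_feats] for t in track_feats]
    best_cost, best_perm = float("inf"), ()
    for perm in permutations(range(len(det_feats)), len(track_feats)):
        total = sum(cost[i][j] for i, j in enumerate(perm))
        if total < best_cost:
            best_cost, best_perm = total, perm
    return list(enumerate(best_perm))


# Hypothetical 2-D appearance features for two tracks and two detections.
tracks = [[1.0, 0.0], [0.0, 1.0]]
detections = [[0.1, 0.9], [0.9, 0.1]]
matches = associate(tracks, detections)  # track 0 pairs with detection 1, track 1 with detection 0
```

The learned approaches discussed in these statements replace the fixed cosine cost with a similarity predicted by a graph network; the assignment step stays conceptually the same.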

“…MPNTracker [4] formulates sequences as graphs and designs a differentiable message passing network to predict the score for each box link between frames. Li et al [17] and Papakis et al [19] use a graph neural network to model appearance and motion (geometric) features and produce the similarities between tracklets and detections. These parametric association modules are trained based on appearance features and motion features.…”

Section: Related Work (mentioning)

confidence: 99%

“…Human-designed policies are sub-optimal as it is difficult for them to take full advantage of both appearance and motion cues. Beyond human-designed policies, more recent arts [4,17,28,19] attempt to learn association knowledge directly from data with a parametric model, i.e., $s_{ij} = K_\theta(i, j, F_a, F_m)$. As illustrated in Fig.…”

Section: Definition Of Association Knowledge (mentioning)

confidence: 99%

Synthetic Data Are as Good as the Real for Association Knowledge Learning in Multi-object Tracking

Liu, Wang, Zhou et al. 2021

Preprint

Association, aiming to link bounding boxes of the same identity in a video sequence, is a central component in multi-object tracking (MOT). To train association modules, e.g., parametric networks, real video data are usually used. However, annotating person tracks in consecutive video frames is expensive, and such real data, due to its inflexibility, offer us limited opportunities to evaluate the system performance w.r.t changing tracking scenarios. In this paper, we study whether 3D synthetic data can replace real-world videos for association training. Specifically, we introduce a large-scale synthetic data engine named MOTX, where the motion characteristics of cameras and objects are manually configured to be similar to those in real-world datasets. We show that compared with real data, association knowledge obtained from synthetic data can achieve very similar performance on real-world test sets without domain adaption techniques. Our intriguing observation is credited to two factors. First and foremost, 3D engines can well simulate motion factors such as camera movement, camera view and object movement, so that the simulated videos can provide association modules with effective motion features. Second, experimental results show that the appearance domain gap hardly harms the learning of association knowledge. In addition, the strong customization ability of MOTX allows us to quantitatively assess the impact of motion factors on MOT, which brings new insights to the community.
