2019
DOI: 10.1007/s11042-019-07747-2
Grid-based multi-object tracking with Siamese CNN based appearance edge and access region mechanism

citations
Cited by 5 publications
(3 citation statements)
references
References 45 publications
“…Many authors have found inventive and effective solutions to tracking problems using neural network-based models, since they offer the most robust features while being natively designed to solve focus-and-context problems in data sequences. Tracking from features directly learned by simple single-stream convolutional layers [8,16,18]; dual-stream CNNs with data associations performed by additional model components [6,19,28,37]; tracking using responses from convolutional features processed through correlation filters [2–5,22,23]; multistream CNNs that determine similarities between multiple ROIs and target templates [14]; models that determine appearance descriptors or that generate appearance representations from convolutional features [1,13,25,27,32]; convolutional models that account for temporal coherence using a multi-network pipeline [24]; tracking from features learned by fusing responses from dual-stream convolutional layers (Siamese CNNs) [26,30,31,38]; models that generate features from convolutional layers and use attention mechanisms for temporal coherence and matching [15,39–41]; multi-stream convolutional layers used for detecting pedestrian poses [17]; Siamese networks combining convolutional features with complementary features from image processing [33]; methods based on recurrent neural networks…”
Section: Discussion (mentioning)
confidence: 99%
“…Both components produce feature maps that are composed together to form space- and motion-invariant characteristics to be further used for target identification. As such, a common functionality of such models is to feed the similarities learned among different inputs to subsequent network components that carry out the classification/detection task [32,33]. Some authors take this concept further by employing several network components [34], each of which contributes features exhibiting specific and limited correlations.…”
Section: Ensuring Temporal Coherence (mentioning)
confidence: 99%
“…Different from Multi-Target Single-Camera (MTSC) tracking [3,5,22], MTMC tracking entails the analysis of visual signals captured by multiple cameras, considering setups with overlapping fields of view (FOVs), but also scenarios for wide-area monitoring, where cameras may be separated by large distances. Road intersections are well-known targets for monitoring due to the high number of reported accidents and collisions [37].…”
mentioning
confidence: 99%