2021
DOI: 10.1109/access.2021.3085708

Making Sense of Neuromorphic Event Data for Human Action Recognition

Abstract: Neuromorphic vision sensors provide low-power sensing and capture salient spatial-temporal events. The majority of existing neuromorphic sensing work focuses on object detection. However, since these sensors record only events, they provide an efficient signal domain for privacy-aware surveillance tasks. This paper explores how neuromorphic vision sensor data streams can be analysed for human action recognition, which is a challenging application. The proposed method is based on handcrafted features. It consi…

Cited by: 14 publications (21 citation statements)
References: 51 publications (43 reference statements)

“…Thus, for most practical purposes, our proposed framework attains comparable performance to the STDN [56]. The rest of the methods, which include multi-task hierarchical clustering [57], BT-LSTM [58], deep autoencoder [59], two-stream attention LSTM [60], weighted entropy-variance based feature selection [61], dilated CNN+BiLSTM+RB [62], DS-GRU [43], and local-global features + QSVM [63], obtain 89.7%, 85.3%, 96.2%, 96.9%, 94.5%, 89.0%, 97.1%, and 82.6% accuracies, respectively. For the UCF50 dataset, the proposed method dominates the state-of-the-art methods by obtaining the best accuracy of 97.5%, whereas the (LD-BF) + (LD-DF) [64] obtains the second-best accuracy of 96.7%.…”
Section: Comparison With State-of-the-art Methods; citation type: mentioning
confidence: 97%
“…For the UCF50 dataset, the proposed method dominates the state-of-the-art methods by obtaining the best accuracy of 97.5%, whereas the (LD-BF) + (LD-DF) [64] obtains the second-best accuracy of 96.7%. The local-global features + QSVM [63] achieves the lowest accuracy of 69.4%, whereas the rest of the methods including multi-task hierarchical clustering [57], deep autoencoder [59], ensemble model with sward-based optimization [65], and DS-GRU [43] obtain …

[Table rows spilled into the quoted text; columns: method, year, accuracy (%)]
[57], 2017, 89.7
BT-LSTM [58], 2018, 85.3
Deep autoencoder [59], 2019, 96.2
STDN [56], 2020, 98.2
Two-stream attention LSTM [60], 2020, 96.9
Weighted entropy-variances based feature selection [61], 2021, 94.5
Dilated CNN+BiLSTM+RB [62], 2021, 89.0
DS-GRU [43], 2021, 97.1
Local-global features + QSVM [63], 2021, 82.6
DA-CNN+Bi-GRU (Proposed), 2022, 98.0

Finally, for the HMDB51 dataset comprising challenging action videos, our proposed method achieves the best results by obtaining an accuracy of 79.3%, whereas the runner-up method is evidential deep learning [66] that attains an accuracy of 77.0%. The multi-task hierarchical clustering method [57] achieves an accuracy of 51.4%, which is the lowest among all comparative methods on the HMDB51 dataset.…”
Section: Comparison With State-of-the-art Methods; citation type: mentioning
confidence: 99%
“…Thus, for most practical purposes, our proposed framework attains comparable performance to the STDN [56]. The rest of the methods, which include multi-task hierarchical clustering [57], BT-LSTM [58], deep autoencoder [59], two-stream attention LSTM [60], weighted entropy-variance based feature selection [61], dilated CNN+BiLSTM+RB [62], DS-GRU [43], and local-global features + QSVM [63] …

[Table rows spilled into the quoted text; columns: method, year, accuracy (%)]
[57], 2017, 89.7
BT-LSTM [58], 2018, 85.3
Deep autoencoder [59], 2019, 96.2
STDN [56], 2020, 98.2
Two-stream attention LSTM [60], 2020, 96.9
Weighted entropy-variances based feature selection [61], 2021, 94.5
Dilated CNN+BiLSTM+RB [62], 2021, 89.0
DS-GRU [43], 2021

Finally, for the HMDB51 dataset comprising challenging action videos, our proposed method achieves the best results by obtaining an accuracy of 79.3%, whereas the runner-up method is evidential deep learning [66] that attains an accuracy of 77.0%. The multi-task hierarchical clustering method [57] achieves an accuracy of 51.4%, which is the lowest among all comparative methods on the HMDB51 dataset.…”
Section: Comparison With State-of-the-art Methods; citation type: mentioning
confidence: 97%
“…As pointed out by Ref. [5], under certain dynamic scenes, hand-crafted features can result in higher accuracy than deep-learning-based methods.…”
Section: Voxel-based Methods; citation type: mentioning
confidence: 98%
“…They enjoy various advantages such as low latency (in microsecond level), high dynamic range, low power consumption, and low bandwidth cost [2], [3]. Because of the appealing characteristics of event cameras, event-based vision is gaining more attention in many research areas and applications, such as image reconstruction [4], action recognition [5], auto-driving [6], real-time detection [7], motion estimation [8], [9] and tracking [10].…”
Section: Introduction; citation type: mentioning
confidence: 99%