2022
DOI: 10.3390/app12115455

KFSENet: A Key Frame-Based Skeleton Feature Estimation and Action Recognition Network for Improved Robot Vision with Face and Emotion Recognition

Abstract: In this paper, we propose an integrated approach to robot vision: a key frame-based skeleton feature estimation and action recognition network (KFSENet) that incorporates action recognition with face and emotion recognition to enable social robots to engage in more personal interactions. Instead of extracting the human skeleton features from the entire video, we propose a key frame-based approach for their extraction using pose estimation models. We select the key frames using the gradient of a proposed total …

Cited by 9 publications (9 citation statements)
References 34 publications
“…It is concluded that key-frame extraction is highly significant in processing video data. In general, existing key-frame extraction methods consist of shot boundary detection (Fei et al., 2017; Mehmood et al., 2016), frame image clustering (Wu et al., 2017; Gharbi et al., 2017), motion analysis (Le et al., 2022; Anderson and McOwan, 2006) and visual content analysis (Panagiotakis et al., 2009).…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Key-frame extraction aims to extract a set of images from original video, which are expected to be an approximate representation of the visual contents of the entire video (Huang and Wang, 2019). Traditional key-frame extraction methods consist of shot boundary detection (Fei et al., 2017; Mehmood et al., 2016), frame image clustering (Wu et al., 2017; Gharbi et al., 2017), motion analysis (Le et al., 2022; Anderson and McOwan, 2006) and visual content analysis (Panagiotakis et al., 2009). The shot boundary-based methods are simple and computationally efficient, but they can only select a fixed number of images as key-frames without considering the content complexity (Fei et al., 2017; Mehmood et al., 2016).…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…The keyframe is the frame with the highest histogram correlation from a set of consecutive frames. Choosing the number of consecutive frames (shot) affects the keyframe extraction accuracy [74,75], where If the consecutive frames are small sets of frames the variation of frame histograms will be very small to identify one keyframe, and if we use a large set of consecutive frames there could be more than one keyframe in the shot and we only extract one and neglect others. We explore the keyframe extraction method for a variant video clip length: 5 frames per clip and 16 frames per clip.…”
Section: The Image-based Model (R2d-lstm)
Citation type: mentioning
confidence: 99%
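
The statement above describes picking, from each clip of consecutive frames, the frame with the highest histogram correlation. A minimal sketch of that idea follows. It is not the citing paper's implementation: the function names (`frame_histogram`, `select_keyframes`), the use of grayscale intensity histograms, the bin count, and the scoring rule (mean Pearson correlation of a frame's histogram with the other frames in the same clip) are all assumptions made for illustration. The `clip_len` parameter corresponds to the clip lengths mentioned in the statement (5 or 16 frames per clip).

```python
import numpy as np


def frame_histogram(frame, bins=32):
    """Intensity histogram of one grayscale frame (values assumed in 0..255)."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return hist.astype(np.float64)


def select_keyframes(frames, clip_len=5, bins=32):
    """Return one keyframe index per clip of `clip_len` consecutive frames.

    Assumed scoring rule: a frame's score is the mean Pearson correlation
    between its histogram and the histograms of the other frames in the clip.
    """
    keyframes = []
    for start in range(0, len(frames), clip_len):
        clip = frames[start:start + clip_len]
        if len(clip) < 2:
            # A single leftover frame is trivially its own keyframe.
            keyframes.append(start)
            continue
        hists = np.stack([frame_histogram(f, bins) for f in clip])
        # Pairwise correlation between histograms (rows are frames).
        # A perfectly flat histogram yields NaN, which is mapped to 0.
        corr = np.nan_to_num(np.corrcoef(hists))
        np.fill_diagonal(corr, 0.0)  # ignore self-correlation
        scores = corr.sum(axis=1) / (len(clip) - 1)
        keyframes.append(start + int(np.argmax(scores)))
    return keyframes


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video = rng.integers(0, 256, size=(12, 8, 8))  # 12 synthetic 8x8 frames
    print(select_keyframes(video, clip_len=5))
```

With short clips the histograms barely differ, so the scores are close together and the choice is less meaningful; with long clips a single index per clip can miss other representative frames, which is the trade-off the statement points out.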
“…However, emotion classification is still a challenging task. Convolutional neural networks (CNNs) perform face normalization, facial expressions, and emotional classification using real images as their main functions and are frequently adopted and used in computer vision applications [3][4][5][6]. The accuracy of the CNN-based emotion classification system has been improved through pre- or post-processing [2,5,7] and the development of new algorithms in the architecture.…”
Section: Introduction
Citation type: mentioning
confidence: 99%