An efficient algorithm is proposed to reduce the computational cost of block matching algorithms for motion estimation in video coding. Based on a new insight into block matching, we extend the successive elimination algorithm to the multilevel case. Using the sum norms of blocks and their subblocks, progressively tighter decision boundaries are obtained for eliminating search positions. Simulation results verify the efficiency of the proposed algorithm when combined with the full search algorithm and several fast search algorithms.
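The multilevel elimination idea can be sketched as follows: the absolute difference of block sums is a lower bound on the true sum of absolute differences (SAD), and partitioning each block into finer subblocks yields a non-decreasing sequence of tighter bounds; a candidate position can be rejected as soon as any bound exceeds the best SAD found so far. This is a minimal NumPy sketch, not the authors' implementation; the function names and the power-of-two partitioning scheme are illustrative assumptions.

```python
import numpy as np

def sad(ref, cand):
    """True sum of absolute differences between two blocks."""
    return int(np.abs(ref.astype(int) - cand.astype(int)).sum())

def multilevel_lower_bounds(ref, cand, levels=2):
    """Lower bounds on SAD from block/subblock sums (illustrative sketch).

    At level k the block is split into 2**k x 2**k subblocks; summing the
    absolute differences of subblock sums gives a lower bound on the SAD
    that tightens as k grows (by the triangle inequality).
    """
    h, w = ref.shape
    bounds = []
    for k in range(levels + 1):
        n = 2 ** k
        bh, bw = h // n, w // n
        total = 0
        for i in range(n):
            for j in range(n):
                rs = int(ref[i*bh:(i+1)*bh, j*bw:(j+1)*bw].sum())
                cs = int(cand[i*bh:(i+1)*bh, j*bw:(j+1)*bw].sum())
                total += abs(rs - cs)
        bounds.append(total)
    return bounds  # non-decreasing, and each entry <= sad(ref, cand)
```

In a search loop, a candidate is skipped as soon as the cheapest (level-0) bound already exceeds the current best SAD, so the expensive pixel-wise SAD is only computed for surviving positions.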
Automatic speech emotion recognition has been an active research topic in human-computer interaction over the past decade. However, because the inherent temporal structure of the speech waveform has received little attention, current recognition accuracy still needs improvement. To make full use of the differences in emotional saturation between time frames, a novel method is proposed for speech emotion recognition using frame-level speech features combined with attention-based long short-term memory (LSTM) recurrent neural networks. Frame-level speech features are extracted from the waveform to replace traditional statistical features, preserving the timing relations of the original speech through the sequence of frames. To distinguish emotional saturation across frames, two improvements to LSTM based on the attention mechanism are proposed: first, the computational complexity is reduced by modifying the forgetting gate of the traditional LSTM without sacrificing performance; second, in the final output of the LSTM, an attention mechanism is applied to both the time and feature dimensions to extract task-relevant information, rather than using the output of the last iteration as in the traditional algorithm. Extensive experiments on the CASIA, eNTERFACE, and GEMEP emotion corpora demonstrate that the proposed approach outperforms the state-of-the-art algorithms reported to date.
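The second improvement above, attention pooling over both the time and feature dimensions of the LSTM output, can be sketched as below. This is a simplified NumPy illustration under stated assumptions: `h` stands in for the LSTM hidden-state sequence, and the learnable weight vectors `w_time` and `w_feat` are hypothetical placeholders for trained parameters, not the paper's architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(h, w_time, w_feat):
    """Pool a (T, D) hidden-state sequence with time and feature attention.

    Illustrative sketch: frames are scored and softmax-weighted (time
    attention), then the pooled vector's dimensions are reweighted
    (feature attention), instead of keeping only the last time step.
    """
    time_scores = h @ w_time          # (T,) one score per frame
    alpha = softmax(time_scores)      # frame weights, sum to 1
    pooled = alpha @ h                # (D,) weighted average over time
    beta = softmax(pooled * w_feat)   # (D,) weights over feature dims
    return beta * pooled              # (D,) task-oriented summary vector
```

The pooled vector would then feed a classifier over the emotion categories; the point of the sketch is that every frame contributes in proportion to its learned relevance, rather than the last frame dominating.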
In this correspondence, we address the facial expression recognition problem using kernel canonical correlation analysis (KCCA). Following the method proposed by Lyons et al. and Zhang et al., we manually locate 34 landmark points on each facial image and convert these geometric points into a labeled graph (LG) vector using the Gabor wavelet transform to represent the facial features. In addition, for each training facial image, the semantic ratings describing the basic expressions are combined into a six-dimensional semantic expression vector. KCCA learns the correlation between the LG vector and the semantic expression vector. Using this correlation, we estimate the semantic expression vector associated with a given test image and then classify the expression according to this estimate. We also propose an improved KCCA algorithm to tackle the singularity problem of the Gram matrix. Experimental results on the Japanese female facial expression database and Ekman's "Pictures of Facial Affect" database illustrate the effectiveness of the proposed method.
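One standard way to handle the Gram-matrix singularity that KCCA runs into is ridge regularization, solving a regularized eigenproblem over the two kernel matrices. The sketch below is a generic regularized-KCCA illustration, not the improved algorithm of this correspondence; the RBF kernel choice and the `reg` parameter are assumptions for the example.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gram matrix of the RBF kernel over the rows of X."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def center(K):
    """Center a Gram matrix in feature space."""
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return J @ K @ J

def kcca_first_pair(Kx, Ky, reg=1e-1):
    """First canonical correlation via regularized KCCA (sketch).

    The ridge term reg*I keeps the centered Gram matrices invertible,
    which is exactly the singularity issue regularization addresses.
    Solves (Kx + reg*I)^-1 Ky (Ky + reg*I)^-1 Kx a = lambda^2 a.
    """
    n = Kx.shape[0]
    Rx = Kx + reg * np.eye(n)
    Ry = Ky + reg * np.eye(n)
    M = np.linalg.solve(Rx, Ky) @ np.linalg.solve(Ry, Kx)
    vals, vecs = np.linalg.eig(M)
    i = np.argmax(vals.real)
    return vals.real[i], vecs[:, i].real
```

Given the learned canonical directions, a test image's LG vector can be projected into the correlated subspace to estimate its semantic expression vector, which then drives the classification step described above.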