Audio is an important cue for situation awareness and provides information complementary to video. In home care, elder care, and security applications, screaming is one of the events that people (family members, caregivers, and security guards) are especially interested in. We present an approach to scream detection that uses both analytic and statistical features for classification. Among audio features, sound energy is useful for detecting scream-like audio. We use log energy to measure the energy continuity of the audio, since screaming often lasts longer than many other sounds. Further, a robust high-pitch detection method based on autocorrelation extracts the highest pitch for each frame, followed by a pitch analysis over a time window containing multiple frames. Finally, to validate the scream-like sound, an SVM-based classifier is applied to a feature vector generated from the MFCCs across a window of frames. Scream detection experiments are discussed, with promising results. The algorithm has been ported to and runs on a Linux-based set-top box connected to a microphone array that captures audio for live scream detection.
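The first two stages described above (log energy continuity and autocorrelation-based pitch) can be illustrated with a minimal sketch. This is not the authors' robust detector: it simply picks the lag with the strongest normalized autocorrelation inside an assumed search band. The function names, the 300 to 3000 Hz band, and the energy floor `eps` are illustrative assumptions, not values from the paper.

```python
import math

def log_energy(frame, eps=1e-10):
    """Log10 of the mean squared amplitude of one frame (eps avoids log(0))."""
    return math.log10(sum(x * x for x in frame) / len(frame) + eps)

def longest_run_above(values, threshold):
    """Length of the longest run of consecutive values above threshold.
    Used here as a simple stand-in for 'energy continuity'."""
    best = run = 0
    for v in values:
        run = run + 1 if v > threshold else 0
        best = max(best, run)
    return best

def autocorr_pitch(frame, sample_rate, fmin=300.0, fmax=3000.0):
    """Estimate pitch in Hz as sample_rate / lag, where lag maximizes the
    normalized autocorrelation within [fmin, fmax] (assumed band).
    Returns 0.0 when no positive peak is found."""
    n = len(frame)
    mean = sum(frame) / n
    x = [v - mean for v in frame]
    r0 = sum(v * v for v in x)
    if r0 == 0.0:
        return 0.0
    lag_min = max(1, int(sample_rate / fmax))
    lag_max = min(n - 1, int(sample_rate / fmin))
    best_lag, best_r = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Biased estimate: fewer terms at larger lags, which favors the
        # shortest period among equally periodic candidates.
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / r0
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag if best_lag else 0.0
```

A scream candidate would then be a window in which the run of high-energy frames is long and the per-frame pitch stays high; the thresholds for both would need to be tuned on real data.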
One of the most frequently used coding modes in H.264 is skip mode. In the conventional approach, the encoder switches to skip mode only after the best RD mode has been computed and the resulting prediction-error coefficient block has been entirely quantized to zero. This wastes computational resources, because skip mode does not require a forward transform or quantization. In this paper, the skip mode condition is checked for the macroblock prior to multi-block motion estimation. If the condition is satisfied, motion estimation is not performed, which drastically reduces computation. The condition considers the zero-block property after the 4x4 block transform and quantization, and accounts for the noise inherent in natural video images. In addition, color components are taken into consideration in the skip mode decision. Experimental results show that the approach greatly improves encoder speed with negligible bit rate increase or PSNR degradation.
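The idea of testing for skip mode before motion estimation can be sketched as follows. The abstract does not give the actual condition, so this example substitutes a simple stand-in: a macroblock is treated as skippable when every 4x4 luma block of the residual against the skip prediction has a sum of absolute differences (SAD) below a threshold. The function names and `sad_threshold` are hypothetical, and the chroma check and noise handling mentioned in the abstract are not modeled.

```python
def block_sad(cur, pred, top, left, size=4):
    """Sum of absolute differences over one size x size block."""
    return sum(abs(cur[top + r][left + c] - pred[top + r][left + c])
               for r in range(size) for c in range(size))

def is_early_skip(cur_mb, pred_mb, sad_threshold):
    """Return True if all sixteen 4x4 blocks of a 16x16 macroblock have
    SAD below sad_threshold (assumed proxy for 'quantizes to all zeros').
    cur_mb and pred_mb are 16x16 lists of luma samples; pred_mb is the
    prediction that skip mode would use."""
    for top in range(0, 16, 4):
        for left in range(0, 16, 4):
            if block_sad(cur_mb, pred_mb, top, left) >= sad_threshold:
                return False
    return True
```

In an encoder loop, a True result would bypass multi-block motion estimation, transform, and quantization for that macroblock. In practice the threshold would depend on the quantization parameter.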
Interpolation of half and quarter pixels is an important component of the H.264 video encoder and is also computationally intensive. Compared with integer-pixel motion estimation, "finer" interpolation provides a better block match. However, this improved motion compensation performance comes at the expense of increased complexity. Building on our previous work, this paper presents an improved fast and adaptive interpolation method that further reduces the complexity of the video encoding process. Experimental results on typical video sequences demonstrate that the proposed method increases encoder speed by 10% to 22% compared with our previous work, without any PSNR loss or bit rate increase.
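To show where the interpolation cost comes from, here is a minimal sketch of the standard H.264 luma sub-pixel computation in one dimension: half-pixel samples use the 6-tap filter (1, -5, 20, 20, -5, 1) with rounding and a shift by 5, and quarter-pixel samples are the rounded average of two neighbors. This is the baseline operation only, not the fast adaptive method the abstract proposes; the function names and the edge-replication policy are assumptions made for illustration.

```python
def clip255(v):
    """Clamp a value to the 8-bit sample range."""
    return max(0, min(255, v))

def half_pel(row, i):
    """Half-pixel sample between row[i] and row[i + 1], computed with the
    6-tap filter (1, -5, 20, 20, -5, 1), rounded and divided by 32.
    Samples outside the row are replaced by the nearest edge sample."""
    taps = (1, -5, 20, 20, -5, 1)
    n = len(row)
    acc = 0
    for k, t in enumerate(taps):
        j = min(max(i - 2 + k, 0), n - 1)
        acc += t * row[j]
    return clip255((acc + 16) >> 5)

def quarter_pel(a, b):
    """Quarter-pixel sample as the rounded average of two neighboring
    integer or half-pixel samples."""
    return (a + b + 1) >> 1
```

Each half-pixel sample costs six multiply-accumulates, and quarter-pixel samples depend on half-pixel ones, which is why skipping interpolation for positions that will not be searched can save encoder time.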