Dehai Zhao scite author profile

Emotion Recognition from Chinese Speech for Smart Affective Services Using a Combination of SVM and DBN

Accurate emotion recognition from speech is important for applications like smart health care, smart entertainment, and other smart services. High-accuracy emotion recognition from Chinese speech is challenging due to the complexities of the Chinese language. In this paper, we explore how to improve the accuracy of speech emotion recognition, covering both speech signal feature extraction and emotion classification methods. Five types of features are extracted from a speech sample: mel-frequency cepstral coefficients (MFCC), pitch, formant, short-term zero-crossing rate, and short-term energy. By comparing statistical features with deep features extracted by a Deep Belief Network (DBN), we attempt to find the best features for identifying the emotional state of speech. We propose a novel classification method that combines a DBN and a support vector machine (SVM) instead of using only one of them. In addition, a conjugate gradient method is applied to train the DBN in order to speed up the training process. Gender-dependent experiments are conducted using an emotional speech database created by the Chinese Academy of Sciences. The results show that DBN features reflect emotional state better than artificial features, and our new classification approach achieves an accuracy of 95.8%, which is higher than using either DBN or SVM separately. The results also show that a DBN can work very well for small training databases if it is properly designed.
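Two of the five features named in the abstract, short-term energy and short-term zero-crossing rate, are simple enough to illustrate directly. The following is a minimal sketch rather than the paper's implementation: the helper names, the 400-sample frame length, the 200-sample hop, and the synthetic sine input are all assumptions chosen for illustration, since the abstract does not give the framing parameters.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames (hypothetical helper)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def short_term_energy(frame):
    """Sum of squared amplitudes within one frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


# Assumed input: one second of a 100 Hz sine sampled at 8 kHz,
# standing in for a real speech recording.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 100 * n / sample_rate)
          for n in range(sample_rate)]

# Assumed framing: 400-sample frames (50 ms) with a 200-sample hop.
frames = frame_signal(signal, frame_len=400, hop=200)
energies = [short_term_energy(f) for f in frames]
zcrs = [zero_crossing_rate(f) for f in frames]
```

Per-frame values like these are typically summarized with statistics (mean, variance, and so on) before classification; which statistics the authors used is not stated in the abstract.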

A Distributed Video Management Cloud Platform Using Hadoop

Liu, Zhao, et al. 2015, IEEE Access

Due to the complexities of big video data management, such as the massive processing of large amounts of video data needed to produce a video summary, it is challenging to store and process these video data effectively and efficiently in a user-friendly way. Based on the parallel processing and flexible storage capabilities of cloud computing, in this paper, we propose a practical massive video management platform using Hadoop, which can achieve fast video processing (such as video summary, encoding, and decoding) using MapReduce, with good usability, performance, and availability. The Red5 streaming media server is used to get video streams from the Hadoop Distributed File System, and Flex is used to play video in browsers. A user-friendly interface is designed for managing the whole platform in a browser-server style using J2EE. In addition, we share our experience of fine-tuning Hadoop to get optimized performance for different video processing tasks. The evaluations show that the proposed platform can satisfy the requirements of massive video data management.

Multi-source data fusion using deep learning for smart refrigerators

Zhang, Zhai, et al. 2018, Computers in Industry

ActionNet: Vision-Based Workflow Action Recognition From Programming Screencasts

Zhao, Xing, Chen, et al. 2019

Programming screencasts (e.g., video tutorials on YouTube or live coding streams on Twitch) are an important knowledge source for developers to learn programming knowledge, especially the workflow of completing a programming task. Nonetheless, the image nature of programming screencasts limits the accessibility of screencast content and the workflow embedded in it, resulting in a gap in accessing and interacting with the content and workflow in programming screencasts. Existing non-intrusive methods are limited to extracting either primitive human-computer interaction (HCI) actions or coarse-grained video fragments. In this work, we leverage Computer Vision (CV) techniques to build a programming screencast analysis tool that can automatically extract code-line editing steps (enter text, delete text, edit text, and select text) from screencasts. Given a programming screencast, our approach outputs a sequence of coding steps and the code snippets involved in each step, which we refer to as the programming workflow. The proposed method is evaluated on 41 hours of tutorial videos and live coding screencasts with diverse programming environments. The results demonstrate that our tool can extract code-line editing steps accurately and that the extracted workflow steps can be intuitively understood by developers.

Workload Prediction for Cloud Cluster Using a Recurrent Neural Network

Zhang, Zhao, et al. 2016

Dehai Zhao

Emotion Recognition from Chinese Speech for Smart Affective Services Using a Combination of SVM and DBN

A Distributed Video Management Cloud Platform Using Hadoop

Multi-source data fusion using deep learning for smart refrigerators

ActionNet: Vision-Based Workflow Action Recognition From Programming Screencasts

Workload Prediction for Cloud Cluster Using a Recurrent Neural Network
