Dynamic Adaptive Streaming over HTTP (DASH), a unified multimedia streaming standard, selects the video bitrate according to network conditions, client status, and other factors in order to improve the user’s Quality of Experience (QoE). Because quantifying the user’s QoE is itself difficult, this paper studies the distortion introduced by video compression, network transmission, and other sources, and proposes a video QoE metric for dynamic adaptive streaming services. A Three-Dimensional Convolutional Neural Network (3D CNN) and a Long Short-Term Memory (LSTM) network are used together to extract deep spatio-temporal features that represent the content characteristics of the video. To account for the effect on QoE of fluctuations in video quality caused by bitrate switching, this factor is combined with video content characteristics, video quality, and video fluency to form the input feature vector. Ridge regression is then used to establish a QoE metric that dynamically describes the relationship between the input feature vector and the Mean Opinion Score (MOS). Experimental results on different datasets show that the proposed method achieves higher prediction accuracy than state-of-the-art methods, indicating that the proposed QoE model can effectively guide the client’s bitrate selection in dynamic adaptive streaming services.
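The final regression step can be illustrated with a minimal sketch of closed-form ridge regression that maps a feature vector to a MOS-like score. This is not the paper's implementation: the feature dimension (8), the regularization strength `alpha`, the helper names `fit_ridge` and `predict_mos`, and the synthetic data are all assumptions chosen for illustration. In practice the columns of `X` would hold the deep spatio-temporal features together with the quality, fluency, and bitrate-switching descriptors.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept.

    Solves (Xc^T Xc + alpha * I) w = Xc^T yc on mean-centered data,
    then recovers the intercept from the feature and target means.
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    n_features = X.shape[1]
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def predict_mos(X, w, b):
    """Predict a MOS-like score for each row of X."""
    return X @ w + b


# Synthetic stand-in data (assumption): 200 clips, 8 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
true_w = rng.normal(size=8)
y = X @ true_w + 3.0 + 0.1 * rng.normal(size=200)

w, b = fit_ridge(X, y, alpha=1.0)
pred = predict_mos(X, w, b)
```

The penalty term `alpha` shrinks the weights toward zero, which tends to stabilize the fit when the input features are correlated, as learned deep features and hand-crafted quality descriptors may well be.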