“…With the rapid development of deep learning for recognition [11], [12], [13], [14], [15], [16], [17], [18], we develop a hierarchical approach that automatically interprets depression from the SDS assessment together with its associated FE and action video recordings. Specifically, we extract temporal information from each question-wise video using 3D convolutional neural networks (3D-CNNs) [19] adapted to the particular question.…”
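The sketch below shows how a 3D convolution captures temporal information from a question-wise clip. Its kernel spans consecutive frames as well as pixels, so it responds to change over time. It is not the authors' network, and it rests on several assumptions. It uses a single channel, stride 1, "valid" padding, and a hand-set kernel instead of learned weights. It also interprets "adapted to the particular question" as one kernel per question. The names `conv3d_valid`, `relu_global_avg`, and `question_wise_features` are hypothetical.

```python
from typing import Dict, List

# Hypothetical types: a clip is indexed [frame][row][col]; a kernel is [dt][dy][dx].
Video = List[List[List[float]]]
Kernel = List[List[List[float]]]


def conv3d_valid(video: Video, kernel: Kernel) -> Video:
    """Single-channel 3D cross-correlation (what deep-learning libraries call
    convolution) with stride 1 and 'valid' padding."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out: Video = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                s = 0.0
                # The kernel window covers kt consecutive frames, so the
                # response depends on how pixels change over time.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            s += video[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out


def relu_global_avg(feature_map: Video) -> float:
    """ReLU followed by global average pooling over time and space."""
    vals = [max(0.0, v) for plane in feature_map for row in plane for v in row]
    return sum(vals) / len(vals)


def question_wise_features(videos: Dict[str, Video],
                           kernels: Dict[str, Kernel]) -> Dict[str, float]:
    """Assumption: each question gets its own kernel (question-specific 3D-CNN)."""
    return {q: relu_global_avg(conv3d_valid(clip, kernels[q]))
            for q, clip in videos.items()}


# Usage: a 2x1x1 temporal-difference kernel (frame t+1 minus frame t).
diff_kernel: Kernel = [[[-1.0]], [[1.0]]]
moving: Video = [[[0, 0], [0, 0]], [[1, 1], [1, 1]], [[1, 1], [1, 1]]]
static: Video = [[[1, 1], [1, 1]]] * 3
features = question_wise_features({"q1": moving, "q2": static},
                                  {"q1": diff_kernel, "q2": diff_kernel})
print(features)  # {'q1': 0.5, 'q2': 0.0}
```

The clip whose frames change produces a positive response, while the static clip produces zero. A purely spatial 2D kernel applied frame by frame could not make this distinction.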