As a common mental disorder, depression has drawn many researchers in the affective computing field to the task of estimating depression severity. However, existing Deep Learning (DL) approaches mainly focus on single facial images and do not consider sequence information when predicting the depression scale. In this paper, an integrated framework, termed DepNet, is proposed for the automatic diagnosis of depression from sequences of facial images extracted from videos. Specifically, several pretrained models are adopted to represent the low-level features, and a Feature Aggregation Module is proposed to capture the high-level characteristic information for depression analysis. More importantly, the discriminative characteristics of depression on faces can be mined to assist the