Introduction
Depression is an affective disorder that contributes substantially to the global burden of disease. Measurement-Based Care (MBC) is advocated throughout the full course of treatment, with symptom assessment as a key component. Rating scales are widely used as convenient and powerful assessment tools, but their results are affected by the subjectivity and consistency of the raters. The assessment of depressive symptoms is usually conducted with a clear purpose and restricted content, as in clinical interviews based on the Hamilton Depression Rating Scale (HAMD), so the results are easy to obtain and quantify. Artificial Intelligence (AI) techniques offer objective, stable, and consistent performance, making them suitable for assessing depressive symptoms. Therefore, this study applied Deep Learning (DL)-based Natural Language Processing (NLP) techniques to assess depressive symptoms during clinical interviews: we proposed an algorithmic model, explored the feasibility of the techniques, and evaluated their performance.

Methods
The study included 329 patients with a Major Depressive Episode. Clinical interviews based on the HAMD-17 were conducted by trained psychiatrists, and the speech was recorded simultaneously. A total of 387 audio recordings were included in the final analysis. We propose a deep time-series semantic model for the assessment of depressive symptoms based on multi-granularity and multi-task joint training (MGMT).

Results
MGMT performs acceptably in assessing depressive symptoms, with an F1 score (a metric of model performance, the harmonic mean of precision and recall) of 0.719 for classifying the four-level severity of depression and an F1 score of 0.890 for identifying the presence of depressive symptoms.

Discussion
This study demonstrates the feasibility of applying DL-based NLP techniques to clinical interviews for the assessment of depressive symptoms. However, this study has limitations, including the limited sample size and the fact that assessing depressive symptoms from speech content alone discards information gained through observation. A multi-dimensional model combining semantics with vocal features, facial expressions, and other valuable information, while also taking personalized information into account, is a possible direction for future work.
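For reference, the F1 score cited in the Results is the harmonic mean of precision and recall:

$$
F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
$$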
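To illustrate the multi-task joint training idea named in the Methods, the sketch below pairs a shared time-series text encoder with two classification heads, one for the four-level severity task and one for the binary presence task reported in the Results. This is a minimal sketch under assumed components (an LSTM encoder, equal task weighting, and illustrative hyperparameters such as the vocabulary size), not the authors' MGMT implementation.

```python
# Minimal multi-task joint training sketch for depressive-symptom assessment.
# Encoder choice, dimensions, and loss weighting are illustrative assumptions,
# not the MGMT architecture reported in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiTaskSeverityModel(nn.Module):
    def __init__(self, vocab_size=5000, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Time-series semantic encoder over interview transcript tokens.
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        # Task 1: four-level depression severity.
        self.severity_head = nn.Linear(hidden_dim, 4)
        # Task 2: presence/absence of depressive symptoms.
        self.presence_head = nn.Linear(hidden_dim, 2)

    def forward(self, token_ids):
        emb = self.embed(token_ids)
        _, (h_n, _) = self.encoder(emb)
        pooled = h_n[-1]  # final hidden state of the last LSTM layer
        return self.severity_head(pooled), self.presence_head(pooled)


def joint_loss(severity_logits, presence_logits, severity_y, presence_y, w=1.0):
    """Joint objective: sum of the two task losses (equal weighting assumed)."""
    return (F.cross_entropy(severity_logits, severity_y)
            + w * F.cross_entropy(presence_logits, presence_y))


if __name__ == "__main__":
    model = MultiTaskSeverityModel()
    tokens = torch.randint(0, 5000, (8, 200))   # batch of 8 tokenized transcripts
    severity_y = torch.randint(0, 4, (8,))      # four-level severity labels
    presence_y = torch.randint(0, 2, (8,))      # binary presence labels
    sev_logits, pres_logits = model(tokens)
    loss = joint_loss(sev_logits, pres_logits, severity_y, presence_y)
    loss.backward()
    print(loss.item())
```

Training both heads against a shared encoder is one common way to realize joint training: the severity and presence objectives regularize each other through the shared representation.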