Objectives
This study aimed to develop a classification model that detects and distinguishes apathy and depression based on text, audio, and video features, and to use the Shapley additive explanations (SHAP) toolkit to improve model interpretability.
Methods
Subjective scales and objective experiments were administered to 319 patients with mild cognitive impairment (MCI) to measure apathy and depression. The patients were classified into four groups: depression only, apathy only, depressed-apathetic, and normal. Speech, facial, and text features were extracted using open-source data analysis toolkits. A multiclass classification model was developed, and the SHAP toolkit was used to explain the contribution of specific features.
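The sketch below illustrates this modelling step, assuming a scikit-learn tree ensemble explained with the Python shap package; the feature matrix, label coding, and choice of classifier are illustrative placeholders, not the study's actual pipeline.

```python
# Minimal sketch: multiclass classification of apathy/depression groups
# with per-class SHAP feature attribution. All data below are synthetic
# stand-ins for the extracted speech, facial, and text features.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# X: one row per patient; columns would hold speech (e.g. MFCCs, F0,
# shimmer), facial action-unit, and text features. y: 0 = normal,
# 1 = apathy only, 2 = depression only, 3 = depressed-apathetic.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(319, 20)),
                 columns=[f"feat_{i}" for i in range(20)])
y = rng.integers(0, 4, size=319)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

model = RandomForestClassifier(n_estimators=300, random_state=0)
model.fit(X_train, y_train)

pred = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred))
print("macro-F1:", f1_score(y_test, pred, average="macro"))

# For a multiclass tree model, TreeExplainer yields one attribution
# matrix per class (a list in older shap versions, a single
# (samples, features, classes) array in newer ones); normalise to
# the array form, then rank features by mean |SHAP| per class.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
sv = (np.stack(shap_values, axis=-1)
      if isinstance(shap_values, list) else np.asarray(shap_values))
mean_abs = np.abs(sv).mean(axis=0)  # shape: (features, classes)
for cls in range(mean_abs.shape[1]):
    top = np.argsort(mean_abs[:, cls])[::-1][:5]
    print(f"class {cls}: top features", list(X.columns[top]))
```

Ranking features by mean absolute SHAP value per class mirrors how the study attributes specific speech, facial, and text features to each diagnostic group.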
Results
The macro-averaged F1 score and accuracy of the overall model were 0.91 and 0.90, respectively. The accuracies for the apathetic, depressed, depressed-apathetic, and normal groups were 0.98, 0.88, 0.93, and 0.82, respectively. The SHAP toolkit identified speech features (Mel-frequency cepstral coefficient (MFCC) 4, spectral slopes, F0, F1), facial features (action units (AUs) 14, 26, 28, and 45), and a text feature (text 6 semantic) associated with apathy. Speech features (spectral slopes, shimmer, F0) and facial features (AUs 2, 6, 7, 10, 14, 26, and 45) were associated with depression. Beyond these shared features, additional speech (MFCC 2, loudness) and facial (AU 9) features were observed in the depressed-apathetic group.
Conclusions
Apathy and depression shared some verbal and facial features while also exhibiting distinct ones. A combination of text, audio, and video features could improve the early detection and differential diagnosis of apathy and depression in patients with MCI.