“…Speech technology offers promise because speaking is natural, can be used at a distance, requires no special training, and carries information about a speaker's state. A growing line of AI research has shown that depression can be detected from speech signals using natural language processing (NLP), acoustic models, and multimodal models [3], [4], [5], [6], [7], [8], [9], [10]. Common evaluations with shared data sets, features, and tools have recently led to progress, especially in modeling methods [11], [12], [13], [14], [15].…”