Early detection and treatment of Social Anxiety Disorder (SAD) are crucial. However, current diagnostic methods have several drawbacks: clinical interviews are time-consuming, self-reports are susceptible to emotional bias, and physiological measures are inconclusive. Our research focuses on a digital approach that diagnoses SAD using acoustic and linguistic features extracted from participants’ speech. Our methodology involves identifying correlations between the extracted features and SAD severity, selecting the most effective features, and comparing classical machine learning and deep learning methods for predicting SAD. Our results demonstrate that classical machine learning models using acoustic or linguistic features, each considered individually, outperform deep learning approaches. Logistic Regression proves effective for acoustic features, while Random Forest excels with linguistic features, achieving the highest accuracy of 85.71%. Our findings pave the way for non-intrusive SAD diagnosis that can be used conveniently anywhere, facilitating early detection.
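
As an illustration of the feature-selection step described above, the following is a minimal sketch, not the authors' implementation. It assumes Pearson correlation as the correlation measure and an absolute-correlation cutoff of 0.3; the abstract specifies neither. The feature names (`pitch_variability`, `pause_ratio`, `noise_feature`), the helper names `pearson` and `select_features`, and the synthetic severity scores are all hypothetical and stand in for real acoustic or linguistic measurements.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant sequence carries no linear information; treat as uncorrelated.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(features, severity, threshold=0.3):
    """Keep features whose |r| with severity meets the (assumed) threshold.

    Returns the kept feature names, strongest correlation first, and all scores.
    """
    scores = {name: pearson(values, severity) for name, values in features.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    kept.sort(key=lambda name: abs(scores[name]), reverse=True)
    return kept, scores


# Synthetic data for illustration only: 50 hypothetical participants.
rng = random.Random(0)
severity = [rng.uniform(0, 100) for _ in range(50)]
features = {
    # Hypothetical feature that rises with severity.
    "pitch_variability": [0.5 * s + rng.gauss(0, 5) for s in severity],
    # Hypothetical feature that falls with severity.
    "pause_ratio": [-0.3 * s + rng.gauss(0, 5) for s in severity],
    # Hypothetical feature unrelated to severity.
    "noise_feature": [rng.gauss(0, 1) for _ in severity],
}

selected, scores = select_features(features, severity)
for name in selected:
    print(f"{name}: r = {scores[name]:.2f}")
```

In a pipeline of this kind, the selected features would then be passed to the classifiers being compared, such as Logistic Regression or Random Forest. A rank-based measure such as Spearman correlation could be substituted where the relationship with severity is monotonic but not linear.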