The global pandemic caused by COVID-19 has had a severe impact on human health. The virus has wreaked havoc since its declaration as a worldwide pandemic and has affected a growing number of countries around the world. Doctors, scientists, and many others working on the frontlines have recently done a substantial amount of work to combat the effects of the spreading virus. The integration of artificial intelligence, specifically deep- and machine-learning applications, in the health sector has contributed substantially to the fight against COVID-19 by providing modern, innovative approaches for detecting, diagnosing, treating, and preventing the virus. In this work, we focus mainly on the role of speech signal and/or image processing in detecting the presence of COVID-19. Three types of experiments have been conducted, utilizing speech-based, image-based, and combined speech–image-based models. A long short-term memory (LSTM) network has been used to classify the patient’s cough, voice, and breathing, obtaining an accuracy that exceeds 98%. Moreover, seven CNN models (VGG16, VGG19, DenseNet201, ResNet50, InceptionV3, InceptionResNetV2, and Xception) have been benchmarked for the classification of chest X-ray images. The VGG16 model outperforms all other CNN models, achieving an accuracy of 85.25% without fine-tuning and 89.64% after fine-tuning. Furthermore, the speech–image-based model has been evaluated using the same seven CNN models, with InceptionResNetV2 attaining the highest accuracy of 82.22%. Accordingly, the combined speech–image-based model is unnecessary for diagnosis purposes, since the speech-based and image-based models each achieve higher accuracy than the combined model.