The sudden increase in coronavirus disease 2019 (COVID-19) cases has placed enormous pressure on healthcare services worldwide. At this stage, fast, accurate, and early clinical assessment of disease severity is vital. In general, there are two issues to overcome: (1) current deep learning-based works suffer from a shortage of adequate multi-modal data; (2) in this scenario, multi-modal information (e.g., text and images) must be considered jointly to make accurate inferences. To address these challenges, we propose a multi-modal knowledge graph attention embedding for COVID-19 diagnosis. Our method not only learns relational embeddings of the nodes in a constructed knowledge graph but also incorporates medical knowledge, aiming to improve the classifier's performance through a medical knowledge attention mechanism. The experimental results show that our approach significantly improves classification performance compared to other state-of-the-art techniques and remains robust across the individual modalities of the multi-modal data. Moreover, we construct a new COVID-19 multi-modal dataset based on text mining, consisting of 1393 doctor-patient dialogues and their 3706 images (347 X-ray, 2598 CT, 761 ultrasound) about COVID-19 patients, 607 non-COVID-19 patient dialogues and their 10754 images (9658 X-ray, 494 CT, 761 ultrasound), and fine-grained labels for all of them. We hope this work provides insights for researchers in this area and encourages a shift of attention from medical images alone to doctor-patient dialogues together with their corresponding medical images.
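As a rough illustration of the kind of graph attention described above, the following is a minimal single-head sketch in PyTorch. It is not the proposed model: the class name, the adjacency-matrix input, and the toy node features are illustrative assumptions, with node features assumed to be pre-extracted embeddings of dialogue text and medical images.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KnowledgeGraphAttention(nn.Module):
    """Minimal single-head graph attention layer (GAT-style), for illustration only.

    node_feats are assumed to be pre-extracted embeddings of dialogue text and
    medical images; adj is the adjacency matrix of a constructed knowledge graph.
    """

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)
        # Attention score from the concatenated (source, target) projections.
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, node_feats: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # node_feats: (N, in_dim); adj: (N, N) with 1 where an edge exists.
        h = self.proj(node_feats)                      # (N, out_dim)
        n = h.size(0)
        src = h.unsqueeze(1).expand(n, n, -1)          # h_i broadcast over targets
        dst = h.unsqueeze(0).expand(n, n, -1)          # h_j broadcast over sources
        scores = F.leaky_relu(self.attn(torch.cat([src, dst], dim=-1))).squeeze(-1)
        # Restrict attention to graph neighbours by masking non-edges.
        scores = scores.masked_fill(adj == 0, float("-inf"))
        alpha = torch.softmax(scores, dim=-1)          # (N, N) attention weights
        return alpha @ h                               # aggregated node embeddings

# Toy usage: 4 nodes (e.g., one dialogue node linked to three image nodes).
if __name__ == "__main__":
    feats = torch.randn(4, 16)
    adj = torch.tensor([[1, 1, 1, 1],
                        [1, 1, 0, 0],
                        [1, 0, 1, 0],
                        [1, 0, 0, 1]], dtype=torch.float)
    layer = KnowledgeGraphAttention(16, 8)
    print(layer(feats, adj).shape)  # torch.Size([4, 8])
```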