The motor imagery brain-computer interface (MI-BCI) has the ability to use electroencephalogram (EEG) signals to control and communicate with external devices. By leveraging the unique characteristics of task-related brain signals, this system facilitates enhanced communication with these devices. Such capabilities hold significant potential for advancing rehabilitation and the development of assistive technologies. In recent years, deep learning has received considerable attention in the MI-BCI field due to its powerful feature extraction and classification capabilities. However, two factors significantly impact the performance of deep-learning models. The size of the EEG datasets influences how effectively these models can learn. Similarly, the ability of classification models to extract features directly affects their accuracy in recognizing patterns. In this paper, we propose a Multi-Scale Spatio-Temporal and Dynamic Graph Convolution Fusion Network (MST-DGCN) to address these issues. In the data-preprocessing stage, we employ two strategies, data augmentation and transfer learning, to alleviate the problem of an insufficient data volume in deep learning. By using multi-scale convolution, spatial attention mechanisms, and dynamic graph neural networks, our model effectively extracts discriminative features. The MST-DGCN mainly consists of three parts: the multi-scale spatio-temporal module, which extracts multi-scale information and refines spatial attention; the dynamic graph convolution module, which extracts key connectivity information; and the classification module. We conduct experiments on real EEG datasets and achieve an accuracy of 77.89% and a Kappa value of 0.7052, demonstrating the effectiveness of the MST-DGCN in MI-BCI tasks. Our research provides new ideas and methods for the further development of MI-BCI systems.