Functional near-infrared spectroscopy (fNIRS) is a low-cost, noninvasive method for measuring the hemodynamic responses associated with cortical brain activity, and it has received considerable attention in brain-computer interface (BCI) applications. In this paper, we present a method based on deep learning and the time-frequency map (TFM) of fNIRS signals to classify three motor execution tasks: right-hand tapping, left-hand tapping, and foot tapping. To obtain the TFM while also accounting for the correlation among channels, we propose using the two-dimensional discrete orthonormal Stockwell transform (2D-DOST). TFMs are computed for oxygenated hemoglobin (HbO), reduced hemoglobin (HbR), and two linear combinations of them, and we propose three fusion schemes for combining the deep features extracted from these TFMs by a convolutional neural network (CNN). Two CNNs, LeNet and MobileNet, are considered, and their structures are modified to maximize accuracy. Because too few signals are available to train the CNNs, data augmentation based on the Wasserstein generative adversarial network (WGAN) is performed. Several simulations are performed to assess the performance of the proposed method in three-class and binary scenarios. The results demonstrate the effectiveness of the proposed method in these scenarios, and the proposed method also outperforms recently introduced methods.
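As an illustration of the four signal types mentioned above, the following minimal sketch builds HbO, HbR, and two linear combinations of them from multichannel recordings. The abstract does not state which two combinations are used; the sum (HbO + HbR) and the difference (HbO - HbR) chosen here are assumptions, and the names `linear_combination` and `build_inputs` are hypothetical helpers introduced only for this example. Each resulting signal would then be passed to the time-frequency transform, which is not shown.

```python
from typing import Dict, List

# A multichannel recording: signal[channel][time_sample]
Signal = List[List[float]]


def linear_combination(hbo: Signal, hbr: Signal, a: float, b: float) -> Signal:
    """Return a*HbO + b*HbR, computed per channel and per time sample."""
    same_shape = len(hbo) == len(hbr) and all(
        len(ch_o) == len(ch_r) for ch_o, ch_r in zip(hbo, hbr)
    )
    if not same_shape:
        raise ValueError("HbO and HbR must have the same shape")
    return [
        [a * o + b * r for o, r in zip(ch_o, ch_r)]
        for ch_o, ch_r in zip(hbo, hbr)
    ]


def build_inputs(hbo: Signal, hbr: Signal) -> Dict[str, Signal]:
    """Collect the four signal types that would each yield one TFM.

    ASSUMPTION: the two linear combinations are the sum and the
    difference; the source text does not specify the coefficients.
    """
    return {
        "HbO": hbo,
        "HbR": hbr,
        "HbO+HbR": linear_combination(hbo, hbr, 1.0, 1.0),
        "HbO-HbR": linear_combination(hbo, hbr, 1.0, -1.0),
    }
```

Keeping the coefficients as parameters of `linear_combination` makes it straightforward to substitute whichever combinations a particular study actually uses.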