Wi-Fi-based human activity recognition (HAR) has gained considerable attention recently due to its ease of use and the availability of its infrastructures and sensors. Channel state information (CSI) captures how Wi-Fi signals are transmitted through the environment. Using channel state information of the received signals transmitted from Wi-Fi access points, human activity can be recognized with more accuracy compared with the received signal strength indicator (RSSI). However, in many scenarios and applications, there is a serious limit in the volume of training data because of cost, time, or resource constraints. In this study, multiple deep learning models have been trained for HAR to achieve an acceptable accuracy level while using less training data compared to other machine learning techniques. To do so, a pretrained encoder which is trained using only a limited number of data samples, is utilized for feature extraction. Then, by using fine-tuning, this encoder is utilized in the classifier, which is trained by a fraction of the rest of the data, and the training is continued alongside the rest of the classifier’s layers. Simulation results show that by using only 50% of the training data, there is a 20% improvement compared with the case where the encoder is not used. We also showed that by using an untrainable encoder, an accuracy improvement of 11% using 50% of the training data is achievable with a lower complexity level.