Electricity load forecasting is a crucial task in deregulated electricity markets worldwide. The transition from conventional electricity grids to Smart Grids is currently the subject of extensive research on a global scale. Among the open research challenges, the investigation of Deep Transfer Learning (DTL) for electricity load forecasting is a fundamental effort that lends generality to Artificial Intelligence applications through new capabilities such as knowledge transfer and reduced computational cost. In this paper, a comprehensive study of day-ahead electricity load forecasting is conducted. For this purpose, three Sequence-to-Sequence (Seq2seq) Deep Learning (DL) models are used, namely the Multilayer Perceptron (MLP), the Convolutional Neural Network (CNN) and the Ensemble Learning Model (ELM), which consists of a weighted combination of the outputs of the MLP and CNN models. The study also focuses on the development of different forecasting strategies based on DTL, with emphasis on how the models are trained and fine-tuned on the datasets to achieve higher forecasting accuracy. To implement the forecasting strategies with the DL models, load datasets from three Greek islands, Rhodes, Lesvos, and Chios, are used. The main purpose is to apply DTL to day-ahead predictions (1-24 hours ahead) for each month of the year on the Chios dataset, after training and fine-tuning the models on the datasets of the three islands in various combinations. After several trials, four strategies are presented. In the first strategy (DTL Case 1), each of the three DL models is trained using only the Lesvos dataset, and fine-tuning is performed on the Chios dataset in order to produce day-ahead predictions of the Chios load. In the second strategy (DTL Case 2), data from both Lesvos and Rhodes are used together to train the DL models, and fine-tuning is performed on the data from Chios.
In the third strategy (DTL Case 3), the DL models are trained on the Lesvos dataset and tested directly on the Chios dataset without fine-tuning. The fourth strategy is a Multi-task Deep Learning (MTDL) approach, which has been extensively studied in recent years. In MTDL, the three DL models are trained simultaneously on all three datasets, and the final predictions are made on the unseen part of the Chios dataset. In this paper, we explore the performance of DTL and compare its results with those obtained with MTDL. The results demonstrate that DTL can be applied effectively to day-ahead load forecasting. Specifically, the two cases with fine-tuning (DTL Case 1 and 2) outperform MTDL in terms of load prediction accuracy. All three DL models exhibit very high prediction accuracy, especially in the two cases with fine-tuning, and the ELM outperforms the single models. More specifically, for day-ahead predictions in DTL Case 1 and DTL Case 2 respectively, the best monthly forecasts reach a Mean Absolute Percentage Error (MAPE) of 6.24% and 6.01% for the MLP, 5.57% and 5.60% for the CNN, and 5.29% and 5.31% for the ELM, indicating the very high accuracy that the ensemble can achieve.
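
As an illustration of the two quantities this abstract relies on, the following minimal sketch computes the MAPE of a forecast and forms a weighted combination of two model outputs, in the spirit of the ELM. It is not the implementation used in this study: the function names, the weight `w_mlp`, and the toy load values are assumptions introduced only for this example, and the way the ensemble weights are actually chosen is not specified here.

```python
from typing import List, Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Percentage Error in percent: 100/n * sum(|a - p| / |a|)."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def weighted_ensemble(
    mlp_pred: Sequence[float],
    cnn_pred: Sequence[float],
    w_mlp: float = 0.5,  # hypothetical weight, not taken from the study
) -> List[float]:
    """Convex combination of two forecasts: w * MLP + (1 - w) * CNN."""
    if len(mlp_pred) != len(cnn_pred):
        raise ValueError("forecasts must have equal length")
    if not 0.0 <= w_mlp <= 1.0:
        raise ValueError("w_mlp must lie in [0, 1]")
    return [w_mlp * m + (1.0 - w_mlp) * c for m, c in zip(mlp_pred, cnn_pred)]


# Toy hourly loads (arbitrary units), made up purely for illustration.
actual_load = [100.0, 200.0, 400.0]
mlp_forecast = [110.0, 190.0, 420.0]
cnn_forecast = [90.0, 210.0, 380.0]

elm_forecast = weighted_ensemble(mlp_forecast, cnn_forecast, w_mlp=0.5)

mlp_mape = mape(actual_load, mlp_forecast)
cnn_mape = mape(actual_load, cnn_forecast)
elm_mape = mape(actual_load, elm_forecast)
```

In this toy example the errors of the two single models have opposite signs, so averaging cancels them and the combined forecast has a lower MAPE than either model. Real forecasts will generally not cancel this cleanly; the example only shows the mechanism by which a weighted combination can improve on its components.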