Over the last decade, data-driven algorithms have outperformed traditional optimization-based algorithms in many research areas, such as computer vision and natural language processing. However, extensive data usage brings a new challenge, or even a threat, to deep learning algorithms: privacy preservation. Distributed training strategies have recently emerged as a promising approach to ensuring data privacy when training deep models. This paper extends conventional serverless platforms with a serverless edge learning architecture and provides an efficient distributed training framework from a networking perspective. The framework dynamically orchestrates available resources among heterogeneous physical units to fulfill deep learning objectives efficiently. Its design jointly considers learning task requests and the heterogeneity of the underlying infrastructure, including last-mile transmissions, the computation capabilities of mobile devices, edge and cloud computing centers, and device battery status. Furthermore, to significantly reduce distributed training overheads, small-scale data training is proposed by integrating a general, simple data classifier. This low-load enhancement works seamlessly with various distributed deep models to improve communication and computation efficiency during the training phase. Finally, open challenges and future research directions are discussed to encourage the research community to develop efficient distributed deep learning techniques.