In recent years, deep learning (DL) models have demonstrated remarkable achievements on non-trivial tasks such as speech recognition, image processing, and natural language understanding. One of the significant contributors to this success is the proliferation of end devices, which act as a catalyst by providing data for data-hungry DL models. However, the computation required for DL training and inference remains a major challenge. Usually, central cloud servers are used for this computation, but this opens up other significant challenges, such as high latency, increased communication cost, and privacy concerns. To mitigate these drawbacks, considerable effort has been made to push the processing of DL models to edge servers (a mesh of computing devices near end devices). Moreover, the confluence of DL and edge computing has given rise to edge intelligence (EI). The International Electrotechnical Commission (IEC) defines EI as the concept whereby data is acquired, stored, and processed using edge computing with DL and advanced networking capabilities. Broadly, EI is categorized into six levels based on where DL training and inference take place, e.g., on cloud servers, edge servers, or end devices. This survey focuses primarily on the fifth level of EI, called the all in-edge level, where DL training and inference (deployment) are performed solely by edge servers. The all in-edge level is suitable when end devices have low computing resources (e.g., Internet-of-Things devices) and when requirements such as latency and communication cost are important, as in mission-critical applications (e.g., healthcare). Moreover, 5G/6G networks are envisioned to use the all in-edge paradigm. Firstly, this paper presents all in-edge computing architectures, including centralized, decentralized, and distributed ones. Secondly, it presents enabling technologies, such as model parallelism, data parallelism, and split learning, which facilitate DL training and deployment at edge servers. Thirdly, model adaptation techniques based on model compression and conditional computation are described, because standard cloud-based DL deployment cannot be applied directly at the all in-edge level owing to its limited computational resources. Fourthly, this paper discusses eleven key performance metrics for efficiently evaluating the performance of DL at the all in-edge level. Finally, several open research challenges in the area of all in-edge computing are presented.

INDEX TERMS Artificial intelligence, all in-edge, deep learning, distributed systems, decentralized systems, edge intelligence

I. INTRODUCTION

The global community is increasingly becoming a data-driven environment in which end devices generate vast quantities of data outside of traditional data centers. The International Telecommunication Union anticipates that global internet traffic per month will reach 607 exabytes (EB) in 2025 and 5,016 EB in 2030 [1]. This enormous amount of data has a positive impact on artificial intelligence (AI) applications. In particular, deep learning (DL) relies on the availability of large quantities of data for its d...