Workload orchestration at the edge of the network has become increasingly challenging with the growing penetration of resource-demanding, mobile, and heterogeneous devices offering low-latency services. The literature has addressed this challenge by assuming the availability of multi-access Mobile Edge Computing (MEC) servers and placing the computing tasks related to such services on those servers. However, to develop a more sustainable and energy-efficient computing paradigm for applications operating in stochastic environments with unpredictable workloads, it is essential to minimize the usage of MEC servers and to utilize the available resource-constrained edge devices, keeping the resourceful servers idle so that they can handle any unpredictably larger workload. In this paper, we propose DEWOrch, a deep reinforcement learning algorithm for efficient workload orchestration. DEWOrch aims to increase the utilization of resource-constrained edge devices and to minimize resource waste, yielding a more sustainable and energy-efficient computing solution. The model is evaluated in an Extreme Edge Computing environment, where no MEC servers are available and only edge devices with constrained capacity are used to perform tasks. The results show that DEWOrch outperforms state-of-the-art methods, reducing resource waste by around 50% while improving the task success rate and decreasing energy consumption per task in most scenarios.