Energy efficiency is key in many embedded systems, which must reach the best possible performance on a limited power budget. In addition, new applications based on neural networks combine various processing requirements, leading to the use of dedicated hardware functions to optimize energy efficiency. Heterogeneous systems-on-chip (SoCs), such as the Nvidia Jetson AGX Orin, bring together different computing capabilities. This type of SoC includes a CPU for general-purpose processing, a GPU for intensive data parallelism, and a Deep Learning Accelerator (DLA) dedicated to neural network processing. Together, these three components enable new latency and energy-consumption trade-offs for Deep-Learning-based applications. However, finding the right configuration to reach the best energy efficiency is difficult and sometimes counterintuitive. To address this, this paper studies deep neural network design and inference options for each accelerator. Altogether, the study forms guidelines to make the best use of the computing and energy-efficiency capabilities published by manufacturers with the default TensorRT mapping.
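As a concrete illustration of the kind of mapping choice studied here, the sketch below shows how a TensorRT engine build can be steered onto the DLA rather than the GPU. This is a minimal sketch, not code from the paper: the model file name is a placeholder, and it assumes the TensorRT Python API as shipped with JetPack on the AGX Orin.

```python
import tensorrt as trt

# Minimal sketch: build a TensorRT engine that prefers the DLA over the GPU.
# "model.onnx" is a placeholder; FP16 is enabled because the DLA does not run FP32 layers.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError("ONNX parsing failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)              # DLA supports FP16/INT8 only
config.default_device_type = trt.DeviceType.DLA    # prefer the DLA for supported layers
config.DLA_core = 0                                # the AGX Orin exposes two DLA cores (0 and 1)
config.set_flag(trt.BuilderFlag.GPU_FALLBACK)      # layers the DLA cannot run fall back to the GPU

engine_bytes = builder.build_serialized_network(network, config)
with open("model_dla.engine", "wb") as f:
    f.write(engine_bytes)
```

Leaving `default_device_type` at its default places the whole network on the GPU, i.e. the default TensorRT mapping mentioned above; switching it per model or per layer is precisely the kind of latency/energy trade-off the paper examines.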