Learning and inference at the edge revolve around distilling, exchanging, and processing data in a cooperative and distributed fashion, to strike challenging trade-offs among energy, delay, and accuracy. This calls for a joint orchestration of radio and computing resources. We propose an online adaptive resource allocation algorithm that chooses where to compute and how to offload computations, exploiting the concept of Deep Neural Network (DNN) splitting. The latter allows a device to locally execute part of the inference-related processing and delegate the remaining portion to a nearby Mobile Edge Host (MEH), which receives intermediate results from the device over a time-varying wireless communication channel. Our method copes with dynamic parameters involving wireless channels, data arrivals, and the MEH's CPU availability by taking online control actions, including the best splitting point and the uplink data rate used to transfer raw data or intermediate results (e.g., extracted features). Decisions are taken based only on instantaneous observations of the context parameters, with the goal of minimizing the long-term device energy consumption while guaranteeing that the end-to-end delay does not exceed a predefined threshold, both on average and in a probabilistic sense. Besides a theoretical analysis, numerical simulations show the effectiveness of our adaptive method in selecting the best partial offloading decision (DNN splitting) under different network conditions. Differently from previous works on edge inference, we exploit recently developed empirical models of the energy consumption of NVIDIA® edge boards to evaluate the performance of DNN splitting at the edge, when exploring the typical offloading trade-off between energy and delay, both of which entail communication and computing.
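
As a rough illustration of how such a per-slot controller can be structured, the sketch below uses a generic Lyapunov drift-plus-penalty rule, a common tool for minimizing a long-term objective under average constraints; this is an assumption about the control framework, not necessarily the paper's exact formulation. The split profile, energy and delay models, transmit power, and all numerical values are hypothetical placeholders (in particular, they do not reproduce the NVIDIA energy measurements mentioned above).

```python
# Illustrative sketch only: a Lyapunov-style drift-plus-penalty controller for
# per-slot DNN-splitting decisions. All models and constants are simplified
# placeholders, not the paper's formulation or measured energy profiles.
import random

# Hypothetical per-layer profile of a DNN: device-side energy spent up to each
# candidate split point, and the size of the intermediate tensor to transmit.
SPLITS = [
    # (split_index, device_energy_J, intermediate_bits)
    (0, 0.00, 1_000_000),  # offload the raw input
    (1, 0.05,   300_000),  # split after an early layer (features shrink)
    (2, 0.12,    60_000),
    (3, 0.30,         0),  # fully local inference, nothing to transmit
]

TX_POWER_W = 0.8   # assumed uplink transmit power
DELAY_MAX = 0.05   # assumed 50 ms end-to-end delay target
V = 50.0           # energy/delay trade-off knob (larger favors energy saving)

def slot_decision(rate_bps, meh_speed, Q):
    """Pick the (split, energy, delay) minimizing V*energy + Q*delay this slot."""
    best, best_cost = None, float("inf")
    for split, e_dev, bits in SPLITS:
        t_tx = bits / rate_bps                                  # uplink time
        t_meh = (len(SPLITS) - 1 - split) * 0.004 / meh_speed   # toy remote time
        e_tot = e_dev + TX_POWER_W * t_tx                       # device energy
        d_tot = 0.01 * split + t_tx + t_meh                     # toy e2e delay
        cost = V * e_tot + Q * d_tot                            # drift + penalty
        if cost < best_cost:
            best, best_cost = (split, e_tot, d_tot), cost
    return best

Q = 0.0  # virtual queue enforcing the average delay constraint
for t in range(5):
    rate = random.uniform(5e6, 50e6)    # observed channel rate (bit/s)
    speed = random.uniform(0.5, 2.0)    # observed MEH CPU availability
    split, energy, delay = slot_decision(rate, speed, Q)
    Q = max(Q + delay - DELAY_MAX, 0.0)  # grows whenever the target is missed
    print(f"t={t} split={split} energy={energy:.3f}J "
          f"delay={delay*1e3:.1f}ms Q={Q:.3f}")
```

The virtual queue Q captures the intuition in the abstract: when recent delays exceed the threshold, Q grows and the controller shifts toward faster (typically more energy-hungry) splitting choices; when the constraint has slack, Q shrinks and energy saving dominates, all from instantaneous observations of the channel rate and MEH availability.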