Deep learning has dominated the last decade as the go-to technology for data processing. Beyond that, deep learning is also a promising candidate for replacing traditional compression algorithms. Uniting these two capabilities, the study of image compression in scenarios where the data will later be consumed by a deep neural network presents a unique frontier of exploration, offering insights into how neural networks become efficient in both computation and information usage during inference. In this dissertation, we present a collection of three works that explore this frontier in the realm of embedded devices. We first introduce the notion of splitting neural networks as a form of compression for image classification, exploring how the compressibility of representations evolves through the layers of a model. We then study how this approach can be leveraged for object detection, and present a methodology for building flexible models that accommodate fluctuating operational requirements of computation and bandwidth. Finally, we turn special attention to the role of this technology in augmented reality, providing a further improved design, likewise flexible and with strong hardware/software synergy, based on an ensemble of encoders that can be scaled in size at run time. All designs are evaluated on a target device, with extensive comparison to the literature.