Convolutional neural networks have pushed forward image analysis research and computer vision over the last decade, constituting a state-of-the-art approach in object detection today. The design of increasingly deeper and wider architectures has made it possible to achieve unprecedented levels of detection accuracy, albeit at the cost of both a dramatic computational burden and a large memory footprint. In such a context, cloud systems have become a mainstream technological solution due to their tremendous scalability, providing researchers and practitioners with virtually unlimited resources. However, these resources are typically made available as remote services, requiring communication over the network to be accessed, thus compromising the speed of response, availability, and security of the implemented solution. In view of these limitations, the on-device paradigm has emerged as a recent yet widely explored alternative, pursuing more compact and efficient networks to ultimately enable the execution of the derived models directly on resource-constrained client devices. This study provides an up-to-date review of the more relevant scientific research carried out in this vein, circumscribed to the object detection problem. In particular, the paper contributes to the field with a comprehensive architectural overview of both the existing lightweight object detection frameworks targeted to mobile and embedded devices, and the underlying convolutional neural networks that make up their internal structure. More specifically, it addresses the main structural-level strategies used for conceiving the various components of a detection pipeline (i.e., backbone, neck, and head), as well as the most salient techniques proposed for adapting such structures and the resulting architectures to more austere deployment environments. Finally, the study concludes with a discussion of the specific challenges and next steps to be taken to move toward a more convenient accuracy–speed trade-off.