Embedded systems technology is undergoing a transformation driven by advances in computer architecture and breakthroughs in machine learning applications. Application areas of embedded machine learning (EML) include accurate computer vision, reliable speech recognition, healthcare, robotics, and more. However, implementing ML algorithms efficiently on embedded targets faces a critical obstacle: machine learning algorithms are generally compute- and memory-intensive, which makes them ill-suited to resource-constrained environments such as embedded and mobile devices. Implementing these algorithms efficiently within the embedded and mobile computing space requires optimization techniques at both the algorithm and hardware levels. To this end, this survey explores current research trends in this area. First, we present a brief overview of compute-intensive machine learning algorithms such as hidden Markov models (HMMs), k-nearest neighbors (k-NN), support vector machines (SVMs), Gaussian mixture models (GMMs), and deep neural networks (DNNs). Next, we consider the optimization techniques currently adopted to fit these compute- and memory-intensive algorithms within resource-limited embedded and mobile environments. We then discuss the implementation of these algorithms on microcontroller units, mobile devices, and hardware accelerators. Finally, we give a comprehensive overview of key application areas of EML technology, point out key research directions, and highlight key take-away lessons for future research in the embedded machine learning domain.