Advances in machine learning (ML) have opened new opportunities to bring intelligence to low-end Internet-of-Things (IoT) nodes such as microcontrollers. Conventional ML deployment has a high memory and compute footprint, hindering direct deployment on ultra-resource-constrained microcontrollers. This article highlights the unique requirements of enabling onboard ML for microcontroller-class devices. Researchers use a specialized model development workflow for resource-limited applications to ensure that the compute and latency budgets remain within the device limits while still maintaining the desired performance. We characterize a closed-loop, widely applicable workflow of ML model development for microcontroller-class devices and show that several classes of applications adopt a specific instance of it. We present both qualitative and numerical insights into the different stages of model development by showcasing several use cases. Finally, we identify the open research challenges and unresolved questions that demand careful consideration moving forward.