Tiny machine learning (TinyML) is a promising approach to enabling intelligent applications based on Human Activity Recognition (HAR) on resource-limited, low-power Internet of Things (IoT) edge devices. However, designing efficient TinyML models for these devices remains challenging because of tight computational resource constraints and the need for customisation to unique use cases. To address this, we propose a novel approach that applies transfer learning (TL) on edge microcontroller units (MCUs) to accelerate TinyML development. Our strategy is to pre-train generalised ML models on large, varied datasets and then fine-tune them on-device for specific applications using TL. We demonstrate the effectiveness of this approach for HAR by experimenting with a convolutional neural network and long short-term memory transfer-learning (CNN-LSTM-TL) model architecture, together with visualisation techniques such as t-distributed stochastic neighbour embedding (t-SNE) for dimensionality reduction. To further validate the model's proficiency and adaptability, we conducted extensive testing on two distinct datasets, MotionSense and UCI. This dual-dataset evaluation allowed us to assess the robustness of the model across different data domains, showcasing its versatility and effectiveness in varied HAR scenarios. Our results show high model accuracy and reduced training time while maintaining high inference rates and a low MCU memory footprint. We also provide insights into best practices for implementing TL on edge MCUs and evaluate classification performance using metrics such as accuracy, precision, recall, F1 score, and categorical cross-entropy loss. Our work lays a solid foundation for faster and more efficient TinyML deployment via a TL framework across different application domains and types of edge IoT devices.
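The on-device fine-tuning step described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the "pretrained" CNN-LSTM backbone is stood in for by a fixed random projection, the HAR windows (128 timesteps of 3-axis accelerometer data, in the spirit of MotionSense/UCI) are synthetic, and only a small softmax head is trained while the backbone stays frozen, which is the essence of the TL strategy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic HAR-like data: 200 flattened windows of 128 timesteps x 3 axes,
# 4 activity classes. Placeholder for real MotionSense/UCI windows.
n, d, classes = 200, 128 * 3, 4
X = rng.normal(size=(n, d)).astype(np.float32)

# "Pretrained" frozen backbone: a fixed projection to a 16-dim feature space
# (a stand-in for the CNN-LSTM feature extractor; it is never updated).
W_frozen = rng.normal(size=(d, 16)).astype(np.float32) / np.sqrt(d)
features = np.maximum(X @ W_frozen, 0.0)  # ReLU features from the frozen backbone

# Synthetic labels made linearly separable in feature space, so the head can learn.
W_true = rng.normal(size=(16, classes)).astype(np.float32)
y = np.argmax(features @ W_true, axis=1)

# Trainable head: softmax classifier fine-tuned with plain gradient descent.
# On an MCU this is the only part whose weights change during on-device TL.
W_head = np.zeros((16, classes), dtype=np.float32)
onehot = np.eye(classes, dtype=np.float32)[y]
for _ in range(300):
    logits = features @ W_head
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    grad = features.T @ (p - onehot) / n  # categorical cross-entropy gradient
    W_head -= 0.5 * grad                  # only the head moves; backbone is frozen

train_acc = float((np.argmax(features @ W_head, axis=1) == y).mean())
```

Freezing the backbone is what keeps the memory footprint and training time small enough for an MCU: only the head's 16 x 4 weight matrix needs gradients and updates, while the feature extractor runs in inference mode.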