Advances in wearable consumer devices have given them a predominant role in healthcare, and there is a growing demand for robust recognition of heterogeneous human activities in complex IoT environments. The knowledge obtained from such recognition models can then be integrated into healthcare applications. To this end, this paper proposes a novel deep learning framework for recognizing heterogeneous human activities from multimodal sensor data. The framework comprises four phases: dataset acquisition and preprocessing, deep learning model implementation, performance analysis, and application development. The recent KU-HAR dataset, covering eighteen different activities performed by 90 individuals, is employed. After preprocessing, a hybrid model integrating an Extreme Learning Machine (ELM) with a Gated Recurrent Unit (GRU) architecture is applied, and an attention mechanism is incorporated to further enhance the robustness of activity recognition in the IoT environment. The performance of the proposed model is evaluated and compared against conventional CNN, LSTM, GRU, ELM, Transformer, and ensemble algorithms. Finally, an application is developed using the Qt framework that can be deployed on any consumer device, enabling healthcare professionals to remotely monitor the activities of critical patients. The proposed ELM-GRUaM model outperforms the existing models, achieving an overall accuracy of 96.71% in recognizing multimodal human activities.
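To make the model description concrete, the following is a minimal illustrative sketch, not the authors' implementation: a GRU encodes each multimodal sensor window, an additive attention layer pools the time steps, and a frozen random projection followed by a trainable output layer stands in for the ELM head. The class name mirrors the paper's ELM-GRUaM label, but all layer sizes, the six-channel IMU input, and the 300-sample window length are assumptions for illustration.

```python
# Illustrative sketch of an ELM-GRU hybrid with attention (assumed
# architecture; sizes and input shape are placeholders, not from the paper).
import torch
import torch.nn as nn

class ELMGRUaM(nn.Module):
    def __init__(self, n_channels=6, hidden=128, elm_hidden=512, n_classes=18):
        super().__init__()
        self.gru = nn.GRU(n_channels, hidden, batch_first=True)
        # Additive attention scores over GRU time steps
        self.attn = nn.Linear(hidden, 1)
        # ELM-style hidden layer: random weights kept fixed during training
        self.elm_proj = nn.Linear(hidden, elm_hidden)
        for p in self.elm_proj.parameters():
            p.requires_grad = False
        # Trainable output layer (a classic ELM would instead solve these
        # weights in closed form by least squares)
        self.out = nn.Linear(elm_hidden, n_classes)

    def forward(self, x):                      # x: (batch, time, channels)
        h, _ = self.gru(x)                     # (batch, time, hidden)
        w = torch.softmax(self.attn(h), dim=1) # attention weights over time
        ctx = (w * h).sum(dim=1)               # attention-weighted summary
        return self.out(torch.sigmoid(self.elm_proj(ctx)))

model = ELMGRUaM()
logits = model(torch.randn(4, 300, 6))  # e.g., 3 s windows of 6-axis IMU data
```

The defining ELM ingredient retained in this sketch is the frozen random hidden projection; for simplicity the output layer is trained by gradient descent together with the GRU and attention parameters rather than solved analytically.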