Human Activity Recognition (HAR) systems are devised for continuously observing human behavior - primarily in the fields of environmental compatibility, sports injury detection, senior care, rehabilitation, entertainment, and the surveillance in intelligent home settings. Inertial sensors, e.g., accelerometers, linear acceleration, and gyroscopes are frequently employed for this purpose, which are now compacted into smart devices, e.g., smartphones. Since the use of smartphones is so widespread now-a-days, activity data acquisition for the HAR systems is a pressing need. In this article, we have conducted the smartphone sensor-based raw data collection, namely
H-Activity
, using an Android-OS-based application for accelerometer, gyroscope, and linear acceleration. Furthermore, a hybrid deep learning model is proposed, coupling convolutional neural network and long-short term memory network (CNN-LSTM), empowered by the self-attention algorithm to enhance the predictive capabilities of the system. In addition to our collected dataset (
H-Activity
), the model has been evaluated with some benchmark datasets, e.g., MHEALTH, and UCI-HAR to demonstrate the comparative performance of our model. When compared to other models, the proposed model has an accuracy of 99.93% using our collected
H-Activity
data, and 98.76% and 93.11% using data from MHEALTH and UCI-HAR databases respectively, indicating its efficacy in recognizing human activity recognition. We hope that our developed model could be applicable in the clinical settings and collected data could be useful for further research.