Sign language is the primary means of communication for deaf and mute individuals. However, because only a limited number of people understand sign language, integrating these individuals into society is challenging. Approximately 6.9% of Bangladesh's population and 5% of the world's population live with hearing and speech impairments. Individuals with such impairments cannot hear what others are saying or communicate verbally and must therefore rely on sign language. In recent years, sign language recognition has gained attention because of this need. However, publicly available dynamic gesture datasets for Bangladeshi Sign Language (BdSL) are scarce. Dynamic gestures, which contain both spatial and temporal information, are more useful in real-life applications, but classifying them requires more data than classifying static gestures. This research aims to classify 11 Bengali numeral digit signs. Data were collected from the “SignBD-Word” dataset, which provides human body pose keypoint (skeleton) data extracted from RGB video. Compared with other dynamic sign language datasets and with the amount of data typically required to train deep neural networks on dynamic gestures, the per-class data in this dataset is insufficient. After experimenting with various models, this research proposes a hybrid architecture that uses a lightweight 3DCNN together with bidirectional LSTM layers to recognize dynamic gestures from the motion patterns of body pose keypoint skeleton data. Combining the 3DCNN with a pre-trained DenseNet-201 and the BiLSTM model is observed to increase real-time accuracy by 4.54%. To the best of our knowledge, this is the first approach to combine a 3DCNN with DenseNet-201 for action recognition, and one of the earliest investigations of BdSL digit recognition using dynamic hand gesture data. Additionally, different pre-trained models have been evaluated as base feature extractors.
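The sketch below illustrates, in broad strokes, how a hybrid 3DCNN + BiLSTM classifier of this kind might be assembled. It is not the authors' exact architecture: Keras is assumed, and the clip length, input resolution, layer widths, and the omission of the DenseNet-201 feature-extraction branch are illustrative assumptions made only so the example runs end to end.

```python
# Minimal sketch (assumptions, not the paper's exact model) of a hybrid
# 3DCNN + BiLSTM classifier for clips of pose-keypoint frames.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 11                  # 11 Bengali numeral digit signs (from the abstract)
FRAMES, H, W, C = 30, 64, 64, 3   # assumed clip length and per-frame keypoint-image size

def build_hybrid_model():
    inputs = layers.Input(shape=(FRAMES, H, W, C))

    # Lightweight 3DCNN block: learns short-range spatiotemporal features
    x = layers.Conv3D(16, kernel_size=3, padding="same", activation="relu")(inputs)
    x = layers.MaxPooling3D(pool_size=(1, 2, 2))(x)
    x = layers.Conv3D(32, kernel_size=3, padding="same", activation="relu")(x)
    x = layers.MaxPooling3D(pool_size=(1, 2, 2))(x)

    # Collapse each frame's spatial map into a feature vector,
    # keeping the temporal axis for the recurrent layers
    x = layers.TimeDistributed(layers.GlobalAveragePooling2D())(x)

    # Bidirectional LSTM layers model the motion pattern across frames
    x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)
    x = layers.Bidirectional(layers.LSTM(64))(x)

    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)
    return models.Model(inputs, outputs)

model = build_hybrid_model()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

In this arrangement the 3D convolutions capture short-range spatiotemporal patterns within the keypoint frames, while the bidirectional LSTMs model the longer-range motion across the clip; a pre-trained 2D backbone such as DenseNet-201 could, for instance, be applied per frame as an additional feature extractor, but how the paper wires that branch in is not specified in the abstract.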