People who are deaf or hard of hearing typically communicate primarily through sign language (SL). A complication arises when a deaf or mute person tries to use SL with someone who is not familiar with it. Modern technological advancements can help in this situation. In recent years, the prediction of SLs has been greatly improved by deep learning (DL) techniques. Among these, the spatial-temporal multi-cue (STMC) network combines a spatial multi-cue (SMC) module and a temporal multi-cue (TMC) module to learn spatial and temporal encodings of multi-cue features such as the full frame, hands, face, and pose. While it outperforms single-cue algorithms on large SL datasets, it requires more time to pre-process the dataset, and keypoint annotations are needed to fully train the STMC network. This article presents a spatio-temporal hybrid cue network (STHCN) that uses a dynamic dense spatio-temporal graph convolutional neural network (DDSTGCNN) and a VGG11+1D-CNN+BLSTM feature extractor network to address the aforementioned issues in ISL recognition and translation tasks. First, skeleton and full-frame data are extracted from the input video. The DDSTGCNN, a combination of a dense GCNN (DGCNN) and a dynamic spatial-temporal convolution network module (DSTCNM), learns the spatial and temporal features of the skeleton data, while the VGG11+1D-CNN+BLSTM network extracts features from the full-frame data. The extracted features are then fed into a bidirectional long short-term memory (BLSTM) encoder and into connectionist temporal classification (CTC) and self-attention-based LSTM (SA-LSTM) decoders for sequence learning and inference. Finally, the CTC and SA-LSTM decoders predict ISL signs and sentences from the input videos, for ISL recognition and translation respectively. Experimental results reveal that the STHCN model achieves 93.65% accuracy in ISL recognition.
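
For concreteness, the sketch below outlines how a full-frame branch of the kind described above (per-frame VGG11 features, a 1D temporal convolution, a BLSTM encoder, and a CTC head) might be wired up in PyTorch. This is a minimal illustration, not the authors' implementation: all layer sizes, the gloss vocabulary size, and the class and variable names are assumptions made for the example.

```python
# Minimal sketch (assumed layout, not the paper's code) of a full-frame branch:
# VGG11 frame features -> 1D-CNN over time -> BLSTM encoder -> CTC head.
import torch
import torch.nn as nn
from torchvision.models import vgg11

class FullFrameBranch(nn.Module):
    def __init__(self, num_glosses=100, hidden=512):  # vocabulary size is illustrative
        super().__init__()
        # VGG11 convolutional backbone applied independently to every frame.
        self.backbone = vgg11(weights=None).features
        self.pool = nn.AdaptiveAvgPool2d(1)                 # -> (B*T, 512, 1, 1)
        # 1D convolution over the time axis to capture short-term motion.
        self.tconv = nn.Sequential(
            nn.Conv1d(512, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),                                  # temporal downsampling by 2
        )
        # Bidirectional LSTM encoder for long-range sequence learning.
        self.blstm = nn.LSTM(hidden, hidden // 2, num_layers=2,
                             bidirectional=True, batch_first=True)
        # CTC head: one extra output for the blank symbol.
        self.ctc_head = nn.Linear(hidden, num_glosses + 1)

    def forward(self, frames):
        # frames: (B, T, 3, H, W) full-frame video clip
        B, T, C, H, W = frames.shape
        x = self.backbone(frames.reshape(B * T, C, H, W))
        x = self.pool(x).reshape(B, T, -1)                    # (B, T, 512)
        x = self.tconv(x.transpose(1, 2)).transpose(1, 2)     # (B, T/2, hidden)
        x, _ = self.blstm(x)
        return self.ctc_head(x).log_softmax(-1)               # (B, T/2, num_glosses+1)

# Usage example: CTC loss on a dummy batch of 2 clips with 16 frames each.
model = FullFrameBranch()
frames = torch.randn(2, 16, 3, 112, 112)
log_probs = model(frames).transpose(0, 1)                     # CTCLoss expects (T, B, V)
targets = torch.randint(1, 101, (2, 5))                       # dummy gloss labels
loss = nn.CTCLoss(blank=0)(
    log_probs, targets,
    input_lengths=torch.full((2,), log_probs.size(0)),
    target_lengths=torch.full((2,), 5),
)
```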