The speech- and hearing-impaired community uses sign language as its primary means of communication, yet it is challenging for the general population to interpret or learn sign language completely. A sign language recognition system must therefore be designed and developed to address this communication barrier. Most current sign language recognition systems rely on wearable sensors, which keeps them unaffordable for most individuals. Moreover, existing vision-based sign recognition frameworks do not consider all of the spatial and temporal information required for accurate recognition. This study proposes a novel vision-based hybrid deep neural network methodology for recognizing Indian and Russian sign gestures. The proposed framework aims to establish a single framework for tracking and extracting multi-semantic properties, such as non-manual components and manual co-articulations. Spatial features are extracted from the sign gestures using a 3D deep neural network with atrous convolutions, while temporal and sequential features are extracted with an attention-based Bi-LSTM. In addition, abstract feature extraction is performed using modified autoencoders, and a hybrid attention module extracts the discriminative features that distinguish sign gestures from unwanted transition gestures. The proposed model is evaluated on a novel multi-signer Indo-Russian sign language dataset, where the hybrid neural network framework yields better results than other state-of-the-art approaches.
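
To make the described pipeline concrete, the following is a minimal sketch of the two core stages named above: a 3D convolutional block with atrous (dilated) convolutions for spatial features, followed by an attention-based Bi-LSTM for temporal modelling. It assumes a PyTorch implementation; all layer sizes, the additive attention formulation, and names such as HybridSignNet are illustrative assumptions, not the authors' exact architecture.

```python
# Illustrative sketch only: layer widths, the attention scheme, and the
# overall wiring are assumptions, not the paper's exact design.
import torch
import torch.nn as nn


class HybridSignNet(nn.Module):
    def __init__(self, num_classes: int, hidden: int = 256):
        super().__init__()
        # 3D convolutions with dilation ("atrous") enlarge the spatial
        # receptive field without adding parameters.
        self.spatial = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=2, dilation=2),
            nn.ReLU(),
            nn.Conv3d(32, 64, kernel_size=3, padding=2, dilation=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),  # keep the time axis, pool space
        )
        # Bidirectional LSTM captures temporal / sequential dependencies.
        self.bilstm = nn.LSTM(64, hidden, batch_first=True, bidirectional=True)
        # Simple additive attention over time steps (an assumption).
        self.attn = nn.Linear(2 * hidden, 1)
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, channels=3, frames, height, width)
        feats = self.spatial(clip)                 # (B, 64, T, 1, 1)
        feats = feats.flatten(3).squeeze(-1)       # (B, 64, T)
        feats = feats.transpose(1, 2)              # (B, T, 64)
        seq, _ = self.bilstm(feats)                # (B, T, 2*hidden)
        weights = torch.softmax(self.attn(seq), dim=1)  # attention over time
        pooled = (weights * seq).sum(dim=1)        # attention-weighted summary
        return self.classifier(pooled)             # class logits


if __name__ == "__main__":
    model = HybridSignNet(num_classes=50)
    logits = model(torch.randn(2, 3, 16, 112, 112))  # 2 clips of 16 frames
    print(logits.shape)  # torch.Size([2, 50])
```

The sketch omits the modified-autoencoder branch and the hybrid attention module that filters transition gestures, since their exact formulations are specific to the paper; it is meant only to show how spatial and temporal feature extraction compose in a single network.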