Sign language recognition (SLR) is a challenging, but highly important research field for several computer vision systems that attempt to facilitate the communication among the deaf and hearing impaired people. In this work, we propose an accurate and robust deep learning-based methodology for sign language recognition from video sequences. Our novel method relies on hand and body skeletal features extracted from RGB videos and, therefore, it acquires highly discriminative for gesture recognition skeletal data without the need for any additional equipment, such as data gloves, that may restrict signer's movements. Experimentation on a large publicly available sign language dataset reveals the superiority of our methodology with respect to other state of the art approaches relying solely on RGB features.