Technological advances have made hand gesture recognition an important research field, especially in health care and assistive applications for elderly people, where users interact naturally with the assisting system by making specific gestures in front of a camera. In this study, we proposed three scenarios based on a Microsoft Kinect V2 depth sensor and evaluated the effectiveness of each. The first scenario combined joint tracking with a depth threshold to improve hand segmentation and efficiently count the number of extended fingers. The second scenario used the metadata supplied by the Kinect V2 depth sensor, which provides 11 parameters for the tracked body, including three recognised gestures for each hand. The third scenario applied a simple convolutional neural network to hands located by joint tracking and depth metadata, classifying five hand gesture categories. Deaf-mute elderly participants performed five gestures, each mapped to a specific request: water, a meal, the toilet, help, or medicine. Because these participants cannot carry out such activities independently, the recognised request was then sent as a text message over the Global System for Mobile Communications (GSM) to the care provider's smartphone.
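To make the first scenario concrete, the sketch below shows one plausible reading of joint tracking combined with a depth threshold: the tracked hand joint is mapped into the depth image, and only pixels whose depth lies within a fixed band around the joint's depth are kept as the hand mask. The function and variable names, and the 80 mm band, are illustrative assumptions rather than values from the paper.

```python
# Minimal sketch of scenario 1: hand segmentation from a Kinect V2 depth
# frame using the tracked hand-joint depth plus a depth threshold.
import numpy as np

def segment_hand(depth_frame: np.ndarray, hand_px: int, hand_py: int,
                 band_mm: int = 80) -> np.ndarray:
    """Return a binary mask of pixels within band_mm of the hand-joint depth.

    depth_frame : (424, 512) uint16 array of depths in millimetres,
                  as delivered by the Kinect V2 depth stream.
    hand_px, hand_py : the hand joint mapped into depth-image pixel
                       coordinates (column, row).
    """
    hand_depth = int(depth_frame[hand_py, hand_px])
    if hand_depth == 0:  # 0 marks invalid / unmeasured depth pixels
        return np.zeros(depth_frame.shape, dtype=bool)
    near, far = hand_depth - band_mm, hand_depth + band_mm
    return (depth_frame >= near) & (depth_frame <= far)
```

A common next step for finger counting on such a mask, not specified in the abstract itself, is to extract the hand contour and count convexity defects between extended fingers.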
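For the third scenario, the abstract names only a "simple convolutional neural network" over five gesture classes. The following Keras sketch shows what such a small classifier could look like; the 64x64 grayscale input and the layer sizes are assumptions, since the architecture is not detailed here.

```python
# Minimal sketch of scenario 3: a small CNN classifying segmented hand
# images into the five request gestures.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 5  # water, meal, toilet, help, medicine

def build_gesture_cnn(input_shape=(64, 64, 1)) -> tf.keras.Model:
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(16, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```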
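The final notification step can be realised with a GSM modem driven by standard AT commands over a serial link. The sketch below is one way to do this with pyserial; the port name, phone number, and message wording are placeholders, and the abstract does not specify the modem hardware used.

```python
# Minimal sketch of the GSM notification: send the recognised request as
# an SMS text message using standard AT commands (AT+CMGF, AT+CMGS).
import time
import serial

REQUESTS = {0: "water", 1: "meal", 2: "toilet", 3: "help", 4: "medicine"}

def send_sms(port: str, caregiver_number: str, gesture_class: int) -> None:
    with serial.Serial(port, baudrate=9600, timeout=5) as modem:
        modem.write(b"AT+CMGF=1\r")  # switch modem to SMS text mode
        time.sleep(0.5)
        modem.write(f'AT+CMGS="{caregiver_number}"\r'.encode())
        time.sleep(0.5)
        body = f"Patient request: {REQUESTS[gesture_class]}"
        modem.write(body.encode() + b"\x1a")  # Ctrl-Z terminates the message
        time.sleep(2)

# Example: send_sms("/dev/ttyUSB0", "+1234567890", 3)  # sends "help"
```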