Detection and tracking are vital stages in forming the gesture trajectory for gesture recognition. The task becomes more challenging when variations in illumination, pose, position, occlusion, scale, speed, blurring, and complex environments are introduced. Additionally, background features tend to dominate, degrading the performance of existing deep learning models. To overcome these challenges, a semantic segmentation model is implemented in this work to detect the bare hand. A pre‐trained VGG‐16 network is fine‐tuned on the proposed NITS S‐Net database, and the resulting SegNet model is evaluated on the EgoHands, Oxford, and OUHands databases. To track the bare hand, a SegNet‐based detection and tracking approach is proposed using a Kalman filter and a point tracker. This model achieves 97.01% accuracy (a relative improvement of ~8% over the baseline models) at a computational time of 0.068 s per frame on the NITS hand gesture database VIIIB. The gesticulated characters, that is, alphabets, numbers, operators, and special characters, are performed without any constraints on the pattern or strokes. To recognize these 95 multi‐stroke gestures, a deep convolutional neural network (DCNN) based on AlexNet is presented. The DCNN model achieves 97.60% accuracy (a relative improvement of ~14% over the baseline models) on the NITS hand gesture database VIIIB merged. Evaluation on the handwritten EMNIST merged (balanced) database yields an average recognition accuracy of 91.60%.
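To illustrate the tracking stage, the following is a minimal sketch, not the paper's exact implementation: a constant-velocity Kalman filter that smooths the 2-D hand centroid produced by per-frame segmentation, predicting through frames where detection fails. The class name `CentroidKalman` and the noise settings `Q` and `R` are illustrative assumptions, not values from the source.

```python
import numpy as np

class CentroidKalman:
    """Sketch of Kalman-filter tracking of a detected hand centroid.

    State = [x, y, vx, vy] under a constant-velocity motion model;
    measurements are (x, y) centroids from a detector such as SegNet.
    All noise parameters below are assumed, not taken from the paper.
    """

    def __init__(self, x0, y0, dt=1.0):
        self.x = np.array([x0, y0, 0.0, 0.0], dtype=float)  # state estimate
        self.P = np.eye(4) * 10.0                           # state covariance
        self.F = np.array([[1, 0, dt, 0],                   # constant-velocity transition
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],                    # position is observed,
                           [0, 1, 0, 0]], dtype=float)      # velocity is not
        self.Q = np.eye(4) * 0.01                           # process noise (assumed)
        self.R = np.eye(2) * 1.0                            # measurement noise (assumed)

    def predict(self):
        """Propagate the state one frame; usable when detection fails."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        """Correct the prediction with a detected centroid (x, y)."""
        innov = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)            # Kalman gain
        self.x = self.x + K @ innov
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]

# Usage: track a hand moving right at ~2 px/frame with synthetic detections.
kf = CentroidKalman(0.0, 0.0)
track = []
for t in range(1, 11):
    kf.predict()
    track.append(kf.update([2.0 * t, 0.0]))
```

In the proposed pipeline such a filter would be paired with a point tracker, so the trajectory survives short detection dropouts; the sketch shows only the filtering step.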