Communication has been a vital aspect of human civilization and technological advancement. Hand gesture recognition is a form of communication through which a machine can understand and extract information from images or video frames of hand signs. It may act as a powerful interface between deaf-mute persons and the world around them by enabling communication with other human beings and with computers without conventional input devices. However, developing such human-computer interaction (HCI) systems, particularly for deaf-mute communities, is a non-trivial research problem. Many studies in this direction have been reported in the literature, and a few survey and review articles on the reported methods are also available [1][2][3][4]. Hand gesture recognition methods, as evident in the literature, may be broadly categorized into two groups: (i) sensor-based methods [5-9] and (ii) vision-based methods [10][11][12][13][14][15][16][17]. Sensor-based methods employ different kinds of sensors to acquire hand posture images or videos, the depth sensor being the most popular among them. Depth sensors such as the Microsoft Kinect, Asus Xtion Pro, and Intel RealSense capture depth information for object pixels at various locations, which is often useful for segmenting the region of interest (ROI), i.e., the hand portion containing the palm and the fingers, from acquired images that also contain the surroundings (a minimal sketch of this idea is given at the end of this section). However, such sensors are neither widespread nor economical, and hence their usability is limited in comparison to vision-based systems, which work with RGB images that are pervasively used at population scale.

Vision-based gesture recognition has received significant attention from researchers over the last two decades owing to the wide availability and use of digital cameras. Even cellphones nowadays have built-in high-resolution digital cameras that acquire RGB images. Moreover, web-cams integrated into laptops or attached to desktop computers are another ubiquitous source of RGB images and videos.
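To make the role of the depth channel concrete, the following is a minimal sketch of the depth-window thresholding idea mentioned above, written in Python with NumPy and OpenCV. It assumes that a depth map in millimetres has already been read from the sensor's driver and that the signing hand is the object nearest the sensor; the function name and the 200 mm window are illustrative assumptions, not details of any cited method.

```python
# A minimal sketch of depth-based hand-ROI segmentation, assuming a depth map
# is available as a 2-D array of per-pixel distances in millimetres (e.g.,
# exported from a Kinect, Xtion Pro, or RealSense driver). The function name
# and the 200 mm window are illustrative assumptions.
import numpy as np
import cv2

def segment_hand(depth_mm: np.ndarray, window_mm: int = 200) -> np.ndarray:
    """Return a binary hand mask, assuming the hand is nearest to the sensor."""
    valid = depth_mm > 0                       # zero depth marks missing readings
    if not valid.any():                        # no usable depth data
        return np.zeros(depth_mm.shape, np.uint8)
    nearest = depth_mm[valid].min()            # distance of the closest object
    # Keep pixels lying within a shallow depth window behind the nearest point.
    mask = (valid & (depth_mm <= nearest + window_mm)).astype(np.uint8) * 255
    # Retain only the largest connected component to suppress stray noise.
    n_labels, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    if n_labels > 1:
        largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
        mask = np.where(labels == largest, 255, 0).astype(np.uint8)
    return mask
```

In practice, reported systems typically refine such a mask further, e.g., with morphological smoothing or by tracking the hand across frames, before feature extraction.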