“…These previous works use devices such as RGB cameras [5,21,27,29,33,35,46,54,55], motion sensors (e.g., Leap Motion) [14,41], depth cameras/sensors (e.g., Kinect) [6,10,11,16,38,48,51], or electromyogram (EMG) sensors [53,57] to capture user hand motions, and they combine the sensing results with various machine learning models to infer the word being expressed. More recently, research has considered the contextual meanings of words and their syntactic relationships to generate proper sentences from sign language motions [13,14,21]. However, applying these technologies in everyday situations is not trivial, since they require either additional devices or infrastructure support.…”