“…Random forest-based methods [21,23,39,41,42,43,48] are fast and accurate. However, they rely on hand-crafted features and have been outperformed by recent CNN-based approaches [1,3,4,6,7,10,11,14,15,24,29,30,37,45,50,51], which learn useful features directly from data. Tompson et al. [45] were the first to use a CNN to localize hand keypoints, estimating a 2D heatmap for each hand joint.…”