We have applied single-pixel imaging and deep learning to address the privacy problem in gesture recognition for interactive displays. Silhouette images of hand gestures were acquired using a display panel as the illumination source. Gesture images were reconstructed in numerical experiments on single-pixel imaging while varying the number of illumination mask patterns. For training and image restoration with deep learning, we prepared reconstructed data with 250 and 500 illuminations as datasets. For each illumination count, we prepared 9000 datasets, each pairing an original image with its reconstructed data. Of these, 8500 were used for training a neural network (6800 for training and 1700 for validation), and 500 were used to evaluate the accuracy of image restoration. Our neural network, based on U-Net, restored images close to the originals even from reconstructed data with a greatly reduced number of illuminations: 1/40 of that required by single-pixel imaging without deep learning. Comparing the restoration accuracy between shadowgraphs (black gestures on a white background) and negative-positive-reversed images (white on black) used as silhouette images, the accuracy was lower for negative-positive-reversed images when the number of illuminations was small. Moreover, we found that the restoration accuracy decreased in the order of rock, scissors, and paper. Shadowgraphs are therefore suitable for gesture silhouettes, and when the number of illuminations is reduced further, it will be necessary to prepare training data and construct neural networks that avoid accuracy differences between gestures.
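The abstract does not specify the reconstruction algorithm beyond the mask counts; the sketch below illustrates correlation-based single-pixel reconstruction with random binary masks, a common baseline rather than necessarily the authors' method. The 32x32 resolution, the synthetic "gesture" object, and the function name `spi_reconstruct` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 32x32 shadowgraph-style silhouette:
# dark "hand" region on a white background (values in [0, 1]).
h = w = 32
obj = np.ones((h, w))
obj[10:22, 12:20] = 0.0

def spi_reconstruct(obj, n_masks, rng):
    """Differential correlation reconstruction with random binary masks.

    Each mask plays the role of one illumination pattern shown on the
    display panel; the dot product with the object models the total
    intensity seen by a single-pixel (bucket) detector.
    """
    masks = rng.integers(0, 2, size=(n_masks, obj.size)).astype(float)
    signals = masks @ obj.ravel()  # bucket-detector measurements
    # Subtracting the mean removes the constant background term.
    recon = (signals[:, None] * masks).mean(0) - signals.mean() * masks.mean(0)
    return recon.reshape(obj.shape)

# The two illumination counts used as dataset conditions in the study.
for n in (250, 500):
    rec = spi_reconstruct(obj, n, rng)
    # Correlation with the ground truth as a simple quality proxy.
    corr = np.corrcoef(rec.ravel(), obj.ravel())[0, 1]
    print(f"{n} illuminations: correlation with original = {corr:.2f}")
```

Such undersampled reconstructions (250 or 500 masks for 1024 pixels) are noisy, which is precisely the regime where a U-Net-style restoration network is expected to help.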