Face super-resolution aims to recover high-resolution face images with accurate geometric structures. Most conventional super-resolution methods are trained on paired data, which is difficult to obtain in real-world settings. Moreover, these methods do not fully exploit facial prior knowledge. To tackle these problems, we propose an end-to-end unsupervised face super-resolution network that incorporates a gradient enhancement branch and a semantic guidance mechanism. Specifically, the gradient enhancement branch reconstructs high-resolution gradient maps under the constraint of two proposed gradient losses. The super-resolution network then integrates features from both image and gradient space to super-resolve face images while preserving geometric structure. In addition, the proposed semantic guidance mechanism, which comprises a semantic-adaptive sharpen module and a semantic-guided discriminator, adaptively reconstructs sharp edges and improves local details in different facial regions under the guidance of semantic parsing maps. Qualitative and quantitative experiments demonstrate that the proposed method reconstructs high-resolution face images with sharp edges and photo-realistic details, outperforming state-of-the-art methods.