As an important part of human cultural heritage, many ancient paintings have suffered various forms of deterioration that have led to texture blurring, color fading, and other damage. Single image super-resolution (SISR), which aims to recover a high-resolution (HR) version from a low-resolution (LR) input, plays an active role in the digital preservation of cultural relics. Currently, only traditional super-resolution is widely studied and used in cultural heritage, and it is difficult to apply learning-based SISR to unique historical paintings because both ground truth and training datasets are absent. Fortunately, the recently proposed ZSSR method suggests that it is feasible to generate a small dataset and extract self-supervised information from a single image. However, when applied to the preservation of historical paintings, the performance of ZSSR is highly limited by the lack of image knowledge. To address these issues and unleash the potential of learning-based methods in heritage conservation, we present Ref-ZSSR, the first attempt to combine zero-shot and reference-based methods for SISR. In our model, both global and local multi-scale similarity information is fully exploited from the painting itself. In an end-to-end manner, this information provides image knowledge with a consistent artistic style and helps synthesize SR images with sharp texture details. Compared with ZSSR, our approach shows improvement in both fidelity (approximately 4.68 dB at scale ×2) and perceptual quality. It is worth noting that all image knowledge required by our method is extracted from the painting itself, i.e., no external examples are required. Therefore, this approach can be easily generalized to damaged historical paintings, broken murals, noisy old photos, incomplete artworks, and similar materials.
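
To make the zero-shot idea concrete, the sketch below shows one common way to build self-supervised (LR, HR) training pairs from a single painting by downscaling the image itself, in the spirit of ZSSR-style data generation. This is a minimal illustration under stated assumptions, not the authors' implementation; the function name, parameters, and scale factors are all hypothetical.

```python
import numpy as np
from PIL import Image

def make_self_supervised_pairs(image_path, scale=2,
                               downscale_factors=(1.0, 0.9, 0.8, 0.7, 0.6)):
    """Build (LR, HR) training pairs from a single painting.

    Each HR example is a downscaled copy of the painting itself; its LR
    counterpart is that copy further downscaled by `scale`. Training a
    network on these pairs and then applying it to the original image is
    the zero-shot SR idea the abstract refers to. All names here are
    illustrative assumptions, not the paper's code.
    """
    hr_parent = Image.open(image_path).convert("RGB")
    pairs = []
    for f in downscale_factors:
        # Downscaled copy of the painting serves as the HR target.
        w, h = int(hr_parent.width * f), int(hr_parent.height * f)
        hr = hr_parent.resize((w, h), Image.BICUBIC)
        # Further downscaling by the SR factor yields the LR input.
        lr = hr.resize((w // scale, h // scale), Image.BICUBIC)
        pairs.append((np.asarray(lr, dtype=np.float32) / 255.0,
                      np.asarray(hr, dtype=np.float32) / 255.0))
    return pairs

# Example usage (hypothetical file name):
# pairs = make_self_supervised_pairs("painting.png", scale=2)
```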