In recent years, Convolutional Neural Networks (CNNs) have received widespread attention in the field of machine learning due to their high accuracy in character recognition and image classification. Nevertheless, the compute-intensive and memory-intensive characteristics of CNNs pose significant challenges to general-purpose processors, which must support a wide variety of workloads. As a result, a large number of CNN-specific hardware accelerators have emerged to improve efficiency. Although highly efficient, previous accelerators often lack flexibility. In this study, classical CNN models are analyzed, and a domain-specific instruction set of 10 matrix instructions, called RV-CNN, is designed based on the promising RISC-V architecture. By abstracting CNN computation into instructions, the proposed design provides sufficient flexibility for CNN workloads and achieves higher code density than general-purpose ISAs. On this basis, a code-to-instruction mapping mechanism is proposed. Using RV-CNN to build different CNN models on the Xilinx ZC702, this paper finds that, compared to x86 processors, RV-CNN achieves on average 141 times the energy efficiency and 8.91 times the code density; compared to a GPU, it achieves on average 1.25 times the energy efficiency and 1.95 times the code density. In addition, compared to previous CNN accelerators, the proposed design supports typical CNN models while maintaining high energy efficiency.