Speaker recognition is an important biometric identification technology that is widely used in daily life. With the development of deep learning, convolutional neural networks (CNNs) have been applied to speaker recognition tasks because of their excellent performance. In practice, however, speaker recognition systems are frequently deployed on end devices, so the model is expected to be as simple as possible while maintaining recognition accuracy. Inspired by the 1-max pooling CNN and the Gaussian mixture model-universal background model (GMM-UBM), this study proposes a one-dimensional convolutional neural network (1D CNN) derived from the original 2D CNN. The proposed model reduces the computational complexity of ResNet20 by 64% and the number of parameters by 53%. Compared with the original ResNet20 model, recognition accuracy drops by about one percent on the 15s dataset. On the basis of the 1D CNN, we then propose a pyramid layer-folding pipeline structure and implement it on the Xilinx VC709 platform. Because the data are partitioned along the time dimension, the proposed pyramid pipeline structure can process speech data of various lengths. Moreover, our accelerator is 5.1× faster on the 3s dataset and 6.8× faster on the 15s dataset than the CPU platform.

INDEX TERMS Speaker recognition, 1D convolutional neural networks, pyramid pipeline, folding pipeline, FPGA.
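
As a rough illustration of the two ideas named in the abstract, a convolution that slides only along the time axis and 1-max pooling over time, the following minimal sketch shows why such a model can accept utterances of different lengths: the pooled output has one value per output channel regardless of the number of input frames. This is not the proposed network; the function names, kernel width, feature dimension, channel count, and frame counts are all hypothetical values chosen for illustration.

```python
import random


def conv1d_time(frames, kernels, bias):
    """Valid 1D convolution along the time axis, followed by ReLU.

    frames  : list of T frames, each a list of F feature values
    kernels : list of C kernels, each a K x F nested list of weights
    bias    : list of C bias values
    returns : list of T - K + 1 frames, each with C values
    """
    k = len(kernels[0])
    out = []
    for t in range(len(frames) - k + 1):
        row = []
        for kern, b in zip(kernels, bias):
            acc = b
            for dt in range(k):
                for f, w in enumerate(kern[dt]):
                    acc += w * frames[t + dt][f]
            row.append(max(acc, 0.0))  # ReLU
        out.append(row)
    return out


def one_max_pool(feature_map):
    """1-max pooling: keep the maximum of each channel over all time steps."""
    return [max(channel) for channel in zip(*feature_map)]


def embed(frames, kernels, bias):
    """Map an utterance of any length (>= kernel width) to a C-dim vector."""
    return one_max_pool(conv1d_time(frames, kernels, bias))


# Hypothetical sizes, not taken from the paper.
random.seed(0)
F, K, C = 8, 3, 4
kernels = [[[random.uniform(-1, 1) for _ in range(F)] for _ in range(K)]
           for _ in range(C)]
bias = [0.0] * C

short_utt = [[random.random() for _ in range(F)] for _ in range(30)]
long_utt = [[random.random() for _ in range(F)] for _ in range(150)]

# Both utterances yield a vector with C entries.
print(len(embed(short_utt, kernels, bias)), len(embed(long_utt, kernels, bias)))
```

Because the pooling collapses the time axis entirely, the layers after it never see the utterance length, which is one plausible reason a time-partitioned hardware pipeline can handle inputs of varying duration.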