Vein recognition has received increasing attention in recent years. Deep learning models such as transformers have shown a strong capacity for feature representation and have been successfully applied to vein recognition. Despite these advances, transformers may fail to capture the 2D structure and local spatial information within each patch in image-based vision tasks. In this paper, we propose a novel convolutional neural network based transformer for vein recognition, termed VeinCnnformer, which exploits both convolutional operations and self-attention mechanisms to improve representation learning. VeinCnnformer is built on the Mixed Convolutional Attention Module (MCAM), which fuses local features with global representations. Specifically, we first propose three modules: a convolutional neural network (CNN) module, a convolutional channel attention module, and a transformer module. Second, the convolutional channel attention module and the CNN module are combined with the transformer module to form the MCAM. Third, we stack multiple MCAMs to obtain VeinCnnformer for vein recognition. Finally, experimental results on three public vein databases show that VeinCnnformer outperforms existing vein recognition approaches and achieves state-of-the-art recognition results.
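To make the idea of fusing a local convolutional path with a global self-attention path more concrete, the following is a minimal NumPy sketch of an MCAM-style block and its stacking. It is not the authors' implementation. The abstract does not specify the internals, so several choices here are assumptions: the CNN module is taken to be a depthwise 3x3 convolution with ReLU, the convolutional channel attention is taken to be ECA-style gating (a 1D convolution over the pooled channel descriptor), the transformer module is reduced to single-head self-attention over per-pixel tokens rather than patches, and the two paths are fused by summation with a residual connection. The names `mcam`, `init_params`, and `vein_cnnformer_features` are hypothetical.

```python
import numpy as np

def depthwise_conv3x3(x, k):
    """Per-channel 3x3 convolution with zero padding. x: (C, H, W), k: (C, 3, 3)."""
    C, H, W = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += k[:, i, j][:, None, None] * xp[:, i:i + H, j:j + W]
    return out

def channel_attention(x, w1d):
    """Assumed ECA-style gate: 1D conv (kernel 3) over the pooled channel vector."""
    s = x.mean(axis=(1, 2))                            # (C,) channel descriptor
    z = np.convolve(np.pad(s, 1), w1d, mode="valid")   # (C,) after padding by 1
    gate = 1.0 / (1.0 + np.exp(-z))                   # sigmoid, values in (0, 1)
    return x * gate[:, None, None]

def self_attention(x, wq, wk, wv):
    """Single-head self-attention with one token per pixel (a simplification)."""
    C, H, W = x.shape
    t = x.reshape(C, H * W).T                          # (N, C) tokens
    q, k, v = t @ wq, t @ wk, t @ wv
    a = q @ k.T / np.sqrt(C)
    a = np.exp(a - a.max(axis=1, keepdims=True))       # numerically stable softmax
    a /= a.sum(axis=1, keepdims=True)
    return (a @ v).T.reshape(C, H, W)

def mcam(x, p):
    """Hypothetical MCAM: local (conv + channel attention) plus global (attention)."""
    local_feat = channel_attention(np.maximum(depthwise_conv3x3(x, p["conv"]), 0.0), p["eca"])
    global_feat = self_attention(x, p["wq"], p["wk"], p["wv"])
    return x + local_feat + global_feat                # assumed additive fusion + residual

def init_params(C, rng):
    """Random weights for one block (illustration only, not trained)."""
    return {
        "conv": rng.normal(0.0, 0.1, size=(C, 3, 3)),
        "eca": rng.normal(0.0, 0.1, size=3),
        "wq": rng.normal(0.0, C ** -0.5, size=(C, C)),
        "wk": rng.normal(0.0, C ** -0.5, size=(C, C)),
        "wv": rng.normal(0.0, C ** -0.5, size=(C, C)),
    }

def vein_cnnformer_features(x, blocks):
    """Stack several MCAM blocks, as the abstract describes at a high level."""
    for p in blocks:
        x = mcam(x, p)
    return x

rng = np.random.default_rng(0)
C, H, W = 8, 16, 16
x = rng.normal(size=(C, H, W))
blocks = [init_params(C, rng) for _ in range(3)]
y = vein_cnnformer_features(x, blocks)
print(y.shape)  # (8, 16, 16)
```

Summation is only one plausible fusion rule; concatenation followed by a 1x1 projection would be an equally reasonable assumption, and a real model would add normalization, feed-forward layers, patch embedding, and a classification head.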