End-stage kidney disease (ESKD) presents a significant public health challenge, and hemodialysis (HD) remains one of the most prevalent kidney replacement therapies. Ensuring the longevity and functionality of arteriovenous accesses is challenging for HD patients. Blood flow sound contains valuable diagnostic information but has often been neglected. Machine learning offers a new approach, leveraging non-invasively acquired data and learning autonomously to approach the experience of healthcare professionals. This study aimed to devise a model for detecting arteriovenous graft (AVG) stenosis. A smartphone stethoscope was used to record the sound of AVG blood flow at the arterial and venous sides, with each recording lasting one minute. The recordings were transformed into mel spectrograms, and a 14-layer convolutional neural network (CNN) was employed to detect stenosis. The CNN comprised six convolution blocks with 3x3 kernels, batch normalization, and rectified linear unit (ReLU) activation. We applied contrastive learning to pre-train the audio neural network on unlabeled data through self-supervised learning, followed by fine-tuning on labeled data. In total, 27,406 dialysis-session blood flow sounds were documented, including 180 stenosis blood flow sounds. Our proposed framework demonstrated a significant improvement (p<0.05) over training from scratch and over a popular pre-trained audio neural networks (PANNs) model, achieving an accuracy of 0.9279, precision of 0.8462, and recall of 0.8077, compared with previous values of 0.8649, 0.7391, and 0.6538. This study illustrates how contrastive learning on unlabeled blood flow sound data can enhance convolutional neural networks for detecting AVG stenosis in HD patients.
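The contrastive pre-training step described above optimizes agreement between embeddings of related audio views. The abstract does not specify the exact objective, but a SimCLR-style NT-Xent (normalized temperature-scaled cross-entropy) loss over paired embeddings is a common formulation for this kind of self-supervised audio pre-training; the sketch below (function name and temperature value are illustrative assumptions, not from the study) shows that computation with NumPy:

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.1):
    """NT-Xent contrastive loss (SimCLR-style), a minimal sketch.

    z1, z2: (N, D) embeddings of two views of the same N audio clips;
    row i of z1 and row i of z2 form a positive pair, all other rows
    in the 2N-sample batch act as negatives.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)               # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize rows
    sim = (z @ z.T) / temperature                      # scaled cosine similarity
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity
    # positive partner of sample i is i+N (and of i+N is i)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()
```

When the two views of each clip embed identically, the loss is small; for unrelated embeddings the positive pair is no more similar than the negatives and the loss is larger, which is the gradient signal the pre-training exploits.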