Although existing cardiac diffusion tensor imaging (DTI) denoising methods have achieved promising results, most of them depend on the number of diffusion gradient directions, the noise distribution, and the noise level. To address these issues, we propose Node2Node, a novel self-supervised cardiac DTI denoising network. Node2Node first represents the diffusion-weighted (DW) image volumes acquired along different gradient directions as a graph; a graph framelet transform (GFT) then maps the DW signals into GFT coefficients in different spectral bands, allowing us to accurately match DW image pairs. Using the matched image pairs as input and target, a ResNet-like network is then trained to denoise in a self-supervised manner. In addition, a novel edge-aware loss based on a pooling operation is proposed to preserve edges. In comparisons with several state-of-the-art methods on synthetic, ex vivo porcine, and in vivo human cardiac DTI datasets, Node2Node achieved the lowest root mean square error (RMSE) of the DW images and the lowest average angular error (AAE) of the fiber orientations, with reductions of 47.5% and 23.7%, respectively, on the synthetic dataset, demonstrating that Node2Node is insensitive to the properties of the dataset.
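To make the training scheme concrete, the following is a minimal sketch of the self-supervised step described above: a matched DW image pair serves as input and target for a ResNet-like denoiser, and a pooling-based edge term supplements the data-fidelity loss. The class and function names (Node2NodeNet, edge_aware_loss, training_step), the number of residual blocks, the loss weight lambda_edge, and the exact form of the pooling-based edge term are illustrative assumptions; the abstract does not specify the architecture or loss details, so this is not the authors' implementation.

```python
# Minimal sketch (PyTorch) of the self-supervised training step; all names and
# hyperparameters are assumptions for illustration, not the published method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Node2NodeNet(nn.Module):
    """ResNet-like denoiser: stacked residual conv blocks with a global skip."""
    def __init__(self, channels=1, features=64, num_blocks=4):
        super().__init__()
        self.head = nn.Conv2d(channels, features, 3, padding=1)
        self.blocks = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(features, features, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(features, features, 3, padding=1),
            )
            for _ in range(num_blocks)
        )
        self.tail = nn.Conv2d(features, channels, 3, padding=1)

    def forward(self, x):
        h = self.head(x)
        for block in self.blocks:
            h = h + block(h)      # residual connection within each block
        return x + self.tail(h)   # global residual: predict the denoised image

def edge_aware_loss(pred, target, kernel=3):
    """Illustrative pooling-based edge term (an assumption): compare local-contrast
    maps, computed as max-pool minus min-pool, of the prediction and the target."""
    def edge_map(img):
        # max_pool2d(-img) equals the negated min-pool, so the sum is max - min.
        return F.max_pool2d(img, kernel, stride=1, padding=kernel // 2) + \
               F.max_pool2d(-img, kernel, stride=1, padding=kernel // 2)
    return F.l1_loss(edge_map(pred), edge_map(target))

def training_step(model, optimizer, dw_input, dw_target, lambda_edge=0.1):
    """dw_input / dw_target: a matched DW image pair (selection via GFT coefficients
    is assumed to happen upstream and is not shown here)."""
    optimizer.zero_grad()
    pred = model(dw_input)
    loss = F.mse_loss(pred, dw_target) + lambda_edge * edge_aware_loss(pred, dw_target)
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this reading, the network never sees a clean reference: both sides of the loss are noisy DW images from the matched pair, which is what makes the scheme self-supervised in the Noise2Noise sense, while the edge term is meant to discourage over-smoothing of tissue boundaries.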