As emerging hardware architectures support more precision formats, mixed-precision training has become a popular approach to accelerate deep learning (DL). Applying low-precision formats such as FP16 and BF16 to neural operators saves GPU memory while improving effective bandwidth. However, DL frameworks rely on static black and white lists as their default mixed-precision selection and cannot flexibly adapt to diverse neural networks. In addition, existing work on automatic precision adjustment does not take model convergence into account and incurs a high decision cost for precision selection. To address these problems, this paper proposes CoMP, a non-intrusive framework for Convergence-aware operator-wise Mixed-Precision training. CoMP uses a two-stage precision adjustment, at the epoch and batch levels, to ensure convergence and performance, respectively. CoMP then performs the remaining training with the optimal operator-wise mixed-precision plan found by this search. Experimental results on an A100 GPU show that CoMP achieves a speedup of up to 1.15$\times$ over the PyTorch AMP implementation while saving up to 29.81% of GPU memory.
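
The sketch below illustrates, under stated assumptions, how an operator-wise mixed-precision plan could be applied non-intrusively in PyTorch via forward hooks. The plan format (module name to dtype) and the `apply_precision_plan` helper are hypothetical and are not CoMP's actual implementation.

```python
# Minimal sketch (assumed plan format and hook-based casting), not CoMP's real code.
import torch
import torch.nn as nn

def apply_precision_plan(model: nn.Module, plan: dict) -> None:
    """Cast each listed submodule to its chosen dtype and add hooks that
    convert activations at the operator boundary."""
    for name, module in model.named_modules():
        dtype = plan.get(name)
        if dtype is None:
            continue
        module.to(dtype)  # cast this operator's parameters to the low-precision format

        def cast_in(mod, inputs, dtype=dtype):
            # cast floating-point inputs to this operator's precision
            return tuple(
                x.to(dtype) if torch.is_tensor(x) and x.is_floating_point() else x
                for x in inputs
            )

        def cast_out(mod, inputs, output):
            # restore FP32 at the boundary so downstream FP32 operators still work
            return output.float() if torch.is_tensor(output) else output

        module.register_forward_pre_hook(cast_in)
        module.register_forward_hook(cast_out)

# Hypothetical plan from the search stage: run the first Linear operator in BF16.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
apply_precision_plan(model, {"0": torch.bfloat16})
out = model(torch.randn(8, 512))  # module "0" runs in BF16, the rest in FP32
```

Because the plan is applied through hooks rather than by rewriting the model definition, this style of application stays non-intrusive to user code, which is the property the abstract attributes to CoMP.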