As emerging hardware architectures support more precision formats, mixed-precision training has become a popular approach to accelerate deep learning (DL). Applying low-precision formats such as FP16 and BF16 to neural operators saves GPU memory while improving effective bandwidth. However, DL frameworks rely on static black and white lists as their default mixed-precision selection and cannot flexibly adapt to diverse neural networks. In addition, existing work on automatic precision adjustment does not take model convergence into account and incurs a high decision cost for precision selection. To address these problems, this paper proposes CoMP, a non-intrusive framework for Convergence-aware operator-wise Mixed-Precision training. CoMP uses a two-stage precision adjustment, at the epoch and batch levels, to ensure convergence and performance, respectively. CoMP then performs the remaining training with the optimal operator-wise mixed-precision plan found by this search. Experimental results on an A100 GPU show that CoMP achieves a speedup of up to 1.15$\times$ over the PyTorch AMP implementation while saving up to 29.81% of GPU memory.
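
The sketch below illustrates, under stated assumptions, how an operator-wise mixed-precision plan could be applied non-intrusively in PyTorch via forward hooks. The plan format (module name to dtype) and the `apply_precision_plan` helper are hypothetical and are not CoMP's actual implementation.

```python
# Minimal sketch (assumed plan format and hook-based casting), not CoMP's real code.
import torch
import torch.nn as nn

def apply_precision_plan(model: nn.Module, plan: dict) -> None:
    """Cast each listed submodule to its chosen dtype and add hooks that
    convert activations at the operator boundary."""
    for name, module in model.named_modules():
        dtype = plan.get(name)
        if dtype is None:
            continue
        module.to(dtype)  # cast this operator's parameters to the low-precision format

        def cast_in(mod, inputs, dtype=dtype):
            # cast floating-point inputs to this operator's precision
            return tuple(
                x.to(dtype) if torch.is_tensor(x) and x.is_floating_point() else x
                for x in inputs
            )

        def cast_out(mod, inputs, output):
            # restore FP32 at the boundary so downstream FP32 operators still work
            return output.float() if torch.is_tensor(output) else output

        module.register_forward_pre_hook(cast_in)
        module.register_forward_hook(cast_out)

# Hypothetical plan from the search stage: run the first Linear operator in BF16.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
apply_precision_plan(model, {"0": torch.bfloat16})
out = model(torch.randn(8, 512))  # module "0" runs in BF16, the rest in FP32
```

Because the plan is applied through hooks rather than by rewriting the model definition, this style of application stays non-intrusive to user code, which is the property the abstract attributes to CoMP.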