Convergence-aware operator-wise mixed-precision training
Wenhao Dai,
Ziyi Jia,
Yuesi Bai
et al.
Abstract:With the support of more precision formats in emerging hardware architectures, mixed-precision has become a popular approach to accelerate deep learning (DL) training. Applying low-precision formats such as FP16 and BF16 to neural operators can save GPU memory while improving bandwidth. However, DL frameworks use black and white lists as default mixed-precision selections and cannot flexibly adapt to a variety of neural networks. In addition, existing work on automatic precision adjustment does not consider mo… Show more
Set email alert for when this publication receives citations?
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.