Chinese Spell Check (CSC) aims to detect and correct spelling errors in Chinese text, almost all of which stem from phonetic or visual similarity. Large-scale pre-trained language models (PLMs) have recently made substantial progress on the CSC task. However, when correcting errors, PLMs tend to select words that are semantically plausible or fluent, sometimes ignoring pronunciation similarity. Moreover, these models lack explicit knowledge of pronunciation differences. To address this problem, we propose a multi-task learning model that enhances the CSC task. The auxiliary task estimates, at the granularity of each word, the degree of pronunciation discrepancy between the original input and the corresponding correct text. Specifically, we measure this discrepancy with the edit distance between Pinyin representations, using an edit distance scheme modified to account for the particular structure of Pinyin. Experiments on a publicly available benchmark dataset demonstrate the effectiveness of our strategy.
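The following is a minimal sketch of the kind of per-position signal the auxiliary task could regress on. It uses the textbook Levenshtein distance over toneless Pinyin strings, not the authors' modified scheme, which the abstract does not specify. The character-to-Pinyin table `TOY_PINYIN` and the helpers `pinyin_edit_distance` and `per_char_gap` are hypothetical, and each Chinese character is treated as one unit.

```python
# Sketch only: plain Levenshtein distance on toneless Pinyin strings.
# The paper's modified edit distance (adapted to Pinyin structure) is NOT
# reproduced here; it is not described in the abstract.

# Hypothetical toy lookup table; a real system would use a full Pinyin dictionary.
TOY_PINYIN = {
    "今": "jin", "天": "tian", "夫": "fu",
    "在": "zai", "再": "zai",
    "坐": "zuo", "座": "zuo",
}


def pinyin_edit_distance(a: str, b: str) -> int:
    """Minimum number of letter insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr[j] = min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            )
        prev = curr
    return prev[-1]


def per_char_gap(src: str, tgt: str) -> list[int]:
    """Pinyin edit distance for each aligned (source, target) character pair.

    CSC usually keeps source and target the same length, so positions align 1:1.
    """
    if len(src) != len(tgt):
        raise ValueError("source and target must have equal length")
    return [pinyin_edit_distance(TOY_PINYIN[s], TOY_PINYIN[t]) for s, t in zip(src, tgt)]


if __name__ == "__main__":
    print(per_char_gap("今夫", "今天"))  # visual error: fu vs tian
    print(per_char_gap("再座", "在坐"))  # homophone errors: identical Pinyin
```

In this sketch, a homophone substitution (再 → 在) yields a gap of 0, while a visually similar but differently pronounced character (夫 → 天) yields a large gap. This is the kind of pronunciation-difference information the abstract argues PLMs lack.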