Lyric transcription is similar to speech recognition: both identify linguistic content from audio clips. Speech recognition technology is maturing, and related application systems are widely used in the software industry, but singing content has received far less attention, and there is still little research on identifying words and sentences from the singing voice. More seriously, compared with lyric transcription for English, there are almost no related academic papers for Mandarin. On the one hand, speech recognition has high-quality datasets in multiple languages that are large enough to train large-scale models, whereas the singing field lacks such data resources. On the other hand, compared with speech, singing involves distinctive pronunciation techniques, which are embodied in musical characteristics such as pitch and rhythm. To address these problems, this paper aims to provide a dataset that can be used for Mandarin lyric transcription and to build a transcription model on this dataset. Our model can address some deficiencies of existing models and achieves promising results on our dataset.