Medical data refers to health-related information collected during routine patient care or as part of a clinical trial program. Such data fall into many categories, including clinical imaging data, bio-signal data, electronic health records (EHR), and multi-modality medical data. With the development of deep neural networks over the last decade, the pre-training paradigm has become dominant because it significantly improves the performance of machine learning methods in data-limited scenarios. In recent years, studies of pre-training in the medical domain have made significant progress. To summarize this progress, this work provides a comprehensive survey of recent advances in pre-training on several major types of medical data. We review a large number of related publications and the existing benchmarks in the medical domain. In particular, the survey briefly describes how some pre-training methods are applied to or developed for medical data. From a data-driven perspective, we examine the extensive use of pre-training in many medical scenarios. Moreover, based on this summary of recent pre-training studies, we identify several challenges in this field to provide insights for future studies.