Medical images offer a non‐invasive method to diagnose different diseases, but using them manually produces unreliable results. Modern deep learning architectures and techniques are computed and data‐intensive, making them difficult to use for relatively smaller datasets of medical images. Transfer learning has been used as a remedy for the problem mentioned above. However, the domain difference between the datasets used for pre‐training (e.g., ImageNet) and the target datasets, like medical images, negatively impacts the transfer learning results. Recently, many researchers have used additional pre‐training called domain‐adaptive pre‐training (DAPT) using the data from the target domain (e.g., medical images) before using the model on the target tasks to achieve superior performance. This study proposes a variant of DAPT by performing it on a subset of the architecture. It has achieved state‐of‐the‐art performance for brain tumor grading on the BraTS 2019 while being computationally efficient.