Within‐project defect prediction assumes that sufficient labeled data are available from the same project, while cross‐project defect prediction assumes that plenty of labeled data are available from source projects. In practice, however, we may have only limited labeled data from both the source and target projects. In this paper, we apply multitask learning to investigate this new scenario. To the best of our knowledge, this problem (i.e., both the source project and the target project have limited labeled data) has not been thoroughly investigated, and we are the first to propose a novel multitask defect prediction approach, MASK. MASK consists of a differential evolution optimization phase and a multitask learning phase. The former phase aims to find optimal weights for shared and nonshared information in related projects (i.e., the target project and its related source projects), while the latter phase builds prediction models for all projects simultaneously. To verify the effectiveness of MASK, we perform experimental studies on 18 real‐world software projects and compare our approach with four state‐of‐the‐art baseline approaches: single‐task learning (STL), simple combined learning (SCL), the Peters filter, and the Burak filter. Experimental results show that MASK achieves an F1 of 0.397 and an AUC of 0.608 on average with limited labeled data (i.e., 10% of the data). Across the 18 projects, MASK significantly outperforms the baseline methods in terms of F1 and AUC. Therefore, by utilizing the relatedness among multiple projects, MASK performs significantly better than state‐of‐the‐art methods. These results confirm that MASK is promising for software defect prediction when the source and target projects both have limited training data.
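The two-phase idea described above (differential evolution to weight shared versus project-specific information, then joint model fitting) can be illustrated with a minimal, self-contained sketch. This is not the paper's implementation: the synthetic data, the logistic-regression learner, and the two-dimensional weight vector (one weight per project's contribution to the shared loss) are all simplifying assumptions made here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two tiny synthetic "projects" that share the same underlying defect signal.
# (Stand-ins for a source project and a target project with few labels.)
def make_project(n, noise):
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] + X[:, 1] + rng.normal(scale=noise, size=n) > 0).astype(float)
    return X, y

source = make_project(60, 0.3)
target = make_project(20, 0.3)   # limited labeled target data

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_weighted(Xs, ys, Xt, yt, w_src, w_tgt, steps=200, lr=0.5):
    """Gradient-descent logistic regression on the weighted union of projects.

    w_src / w_tgt play the role of the weights on shared (source) and
    nonshared (target-specific) information.
    """
    X = np.vstack([Xs, Xt])
    y = np.concatenate([ys, yt])
    sw = np.concatenate([np.full(len(ys), w_src), np.full(len(yt), w_tgt)])
    sw = sw / sw.sum()
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ theta)
        theta -= lr * X.T @ (sw * (p - y))
    return theta

def target_loss(theta):
    """Cross-entropy of the fitted model on the target project."""
    Xt, yt = target
    p = np.clip(sigmoid(Xt @ theta), 1e-9, 1 - 1e-9)
    return -np.mean(yt * np.log(p) + (1 - yt) * np.log(1 - p))

# Phase 1: a minimal differential evolution search over the two weights.
def differential_evolution(obj, dim=2, pop_size=10, gens=15, F=0.8, CR=0.9):
    pop = rng.uniform(0.01, 1.0, size=(pop_size, dim))
    fitness = np.array([obj(ind) for ind in pop])
    for _ in range(gens):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, 3, replace=False)]
            mutant = np.clip(a + F * (b - c), 0.01, 1.0)    # mutation
            cross = rng.random(dim) < CR                    # crossover mask
            trial = np.where(cross, mutant, pop[i])
            f = obj(trial)
            if f <= fitness[i]:                             # greedy selection
                pop[i], fitness[i] = trial, f
    best = np.argmin(fitness)
    return pop[best], fitness[best]

def objective(w):
    # Phase 2 (inner step): fit the joint model under candidate weights.
    theta = fit_weighted(*source, *target, w[0], w[1])
    return target_loss(theta)

best_w, best_f = differential_evolution(objective)
theta = fit_weighted(*source, *target, best_w[0], best_w[1])
acc = np.mean((sigmoid(target[0] @ theta) > 0.5) == target[1])
print(f"best weights {best_w.round(2)}, target accuracy {acc:.2f}")
```

In this toy setup the evolution loop searches only two scalar weights; the approach described in the abstract optimizes weights for shared and nonshared information across multiple related projects and builds one model per project, but the mutation/crossover/selection cycle shown here is the standard differential evolution skeleton.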