Gender differential item functioning (DIF) must be addressed if fair and valid test score interpretations are desired. Although measurement inequivalence has been examined previously, there are no firm conclusions about the causes of this bias, nor guidance on how to reduce it. The objective of this mixed-studies systematic review is to describe how measurement invariance has been addressed when studying the gender gap in educational assessments. We searched for quantitative, qualitative, or mixed-methods studies that tested measurement invariance/DIF and/or applied qualitative methods to explore the causes of the gender gap in educational assessments with adolescents. We used the Quality Assessment Tool (QATSDD; Sirriyeh et al., 2012) to assess the risk of bias and proposed a results-based convergent synthesis design. We included 87 studies, comprising 3,458,853 adolescent participants. Multigroup confirmatory factor analysis (CFA) and the Mantel-Haenszel procedure were the most frequently used strategies to test measurement invariance/detect DIF. Certain methods, such as latent class analysis (LCA), the mixture item response theory model (MMixIRTM), and the Simultaneous Item Bias Test (SIBTEST), were most often used by studies that examined sources of DIF. The most common qualitative strategy for examining sources of DIF was content analysis. Limitations due to methodological concerns and missing data are discussed. We provide an important description of invariance-testing/DIF-detection methods that can serve as a guide for future researchers interested in sources of gender DIF.