Diversity and inclusion have been a concern for the physics community for nearly 50 years. Despite significant efforts, including the American Physical Society (APS) Conferences for Undergraduate Women in Physics (CUWiP) and the APS Bridge Program, women, African Americans, and Hispanics continue to be substantially underrepresented in the physics profession. Similar efforts within engineering, whose students make up the majority of those enrolled in introductory calculus-based physics courses, have also met with limited success. With the introduction of research-based instruments such as the Force Concept Inventory (FCI), the Force and Motion Conceptual Evaluation (FMCE), and the Conceptual Survey of Electricity and Magnetism (CSEM), differences in performance by gender began to be reported. Researchers have yet to agree on why these "gender gaps" exist in the conceptual inventories widely used in physics education research, or on how to reduce them. The gender gap has been studied extensively; on average, for the mechanics conceptual inventories, male students outperform female students by 13% on the pretest and by 12% after instruction. While much of the gender gap research has focused on the mechanics conceptual inventories, few studies have explored the gender gap in the electricity and magnetism conceptual inventories. In those studies, male students overall outperform female students by 3.7% on the pretest and by 8.5% on the post-test; however, the results vary much more from study to study, and one study shows female students outperforming male students on the CSEM. Many factors have been proposed that may influence the gender gap, from differences in background and preparation to various psychological and sociocultural effects. A parallel but largely disconnected body of research has identified gender-biased questions within the FCI. This research has produced sporadic results and has been performed only on the FCI.
The work in this manuscript seeks to synthesize these strands, using large datasets and detailed demographic data to understand the persistent differences in performance between male and female students.