Two-tier multiple-choice (TTMC) items are used to assess students' knowledge of a scientific concept in tier 1 and their reasoning about this concept in tier 2. But are the knowledge and reasoning involved in these tiers really distinguishable? Are the tiers equally challenging for students? The answers to these questions influence how we use and interpret TTMC instruments. We apply the Rasch measurement model to TTMC items to see whether the items are distinguishable according to different traits (represented by the tiers), according to different content sub-topics within the instrument, or according to both content and tier. Two TTMC data sets are analyzed: data from Singapore and Korea on the Light Propagation Diagnostic Instrument (LPDI), and data from the United States on the Classroom Test of Scientific Reasoning (CTSR). Findings for the LPDI show that tier-2 reasoning items are more difficult than tier-1 knowledge items across content sub-topics. Findings for the CTSR do not show a consistent pattern by tier or by content sub-topic. We conclude that TTMC items cannot be assumed to have a consistent pattern of difficulty by tier, and that assessment developers and users need to consider how the tiers operate when administering TTMC items and interpreting results. Researchers must check the tiers' difficulties empirically during validation and use. Although findings from the data in Asian contexts were more consistent, further study is needed to rule out differences between the LPDI and CTSR instruments.

Keywords: Science education, Two-tier items, Rasch measurement models, Optics, Scientific reasoning

Assessing student learning, whether of scientific concepts, practices, or habits of mind, is one of the central topics for research and development in science education. Such assessments can serve as formative or diagnostic tools for planning instruction and working with students, or as summative tools for gauging the effectiveness of our instructional practices, curriculum materials, or teacher education efforts. However, we observe an ongoing tension in science education assessment: conventional test items (e.g., multiple-choice questions) can be constructed to be highly reliable, yet they are perceived as incapable of providing richer insights into students' conceptions and ways of thinking. Research addressing this tension includes efforts to make better sense of how students' responses to test items can be understood from a broader view of conceptual understanding, ability, or skill (Neumann et al. 2011).

One proposed solution that can address this tension is two-tier items (Treagust 1988). The two tiers act together to uncover students' understanding of core concepts: the student must first choose a seemingly "factual" knowledge response for the first tier (Taber and Tan 2011), and then choose, for the second tier, the reasoning about the concept they used to arrive at the first-tier response. A large body of research across contexts has applied two-tier items to uncover students' understanding of scientific ...