Conceptual flaws can undermine even rigorous test development efforts, especially in the broad domains of empathy and social cognition, which are characterized by a proliferation of measures and inconsistent use of construct terms. We discuss these issues using a new measure of “mentalizing” as a case study. Across several studies, Clutterbuck et al. (2021a) developed the Four-Item Mentalising Index (FIMI). They described it as the first self-report measure of mentalizing ability and suggested that it offers substantial advances for research and assessment. As we demonstrate with conceptual arguments and empirical data, the FIMI embodies several major problems that are common in this area of research. Drawing on this case, we underline three points for test developers: that discriminant validity analyses are nonnegotiable, that choosing appropriate convergent validity measures is challenging, and that the jingle-jangle jungle of empathy and social cognition construct terms is difficult to navigate.