Modelling languages play a central role in developing complex, critical systems. A precise, comprehensible, and high-quality modelling language specification is essential to all stakeholders using, implementing, or extending the language. Many good practices exist that improve the understandability or consistency of a language’s semantics. However, designing a modelling language intended for a large audience remains challenging. In this paper, we investigate the challenges and typical issues in assessing the specifications of behavioural modelling language semantics. Our key insight is that the various stakeholders’ understandings of a language’s semantics are often misaligned, and the semantics defined in different artefacts (simulators, test suites) are often inconsistent. Therefore, the assessment of semantics should focus on identifying and resolving these inconsistencies. To illustrate these challenges and assessment techniques, we assessed parts of a state-of-the-art specification for a general-purpose modelling language, the Precise Semantics of UML State Machines (PSSM). We reviewed the text of the specification, analysed and executed PSSM’s conformance test suite, and categorised our experiences according to questions generally relevant to modelling languages. Finally, we make recommendations for improving the development of future modelling languages: representing the semantic domain and traces more explicitly, applying diverse test design techniques to obtain conformance test suites, and using various tools to support early-phase language design.