This article describes a new expert-labelled dataset featuring harmonic, phrase, and cadence analyses of all piano sonatas by W.A. Mozart. The dataset draws on the DCML standard for harmonic annotation and is published in accordance with the FAIR principles of Open Science. The annotations have been verified using a data triangulation procedure, which is presented as an alternative approach to handling annotator subjectivity. This procedure is suited to ensuring consistency, within the dataset and beyond, despite the high level of analytical detail afforded by the harmonic annotation syntax employed. The harmony labels also encode contextual information and are therefore suited to investigating music-theoretical questions related to tonal harmony and the harmonic makeup of cadences in the classical style. Beyond the basic statistical analyses that characterize the dataset, its music-theoretical potential is illustrated by two preliminary experiments, one on the terminal harmonies of cadences and the other on the relation between performance durations and harmonic density. Furthermore, particular features can be selected to produce more coarse-grained training data, for example for chord detection algorithms that require less analytical detail. To facilitate reuse, the dataset comes with a Python script that allows researchers to easily access various representations of the data tailored to their particular needs.
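The idea of selecting features to obtain coarser labels can be illustrated with a minimal sketch. It is not the script shipped with the dataset: the sample rows are invented, and the column names (`mn`, `numeral`, `figbass`, `relativeroot`) as well as the helper functions `coarse_label` and `coarsen` are assumptions made for illustration only. The sketch drops the inversion figures from each label and optionally keeps the target of an applied chord.

```python
import csv
import io

# Hypothetical excerpt of a tab-separated harmony table. The rows are invented
# and the column names (mn, numeral, figbass, relativeroot) are assumptions.
SAMPLE_TSV = (
    "mn\tnumeral\tfigbass\trelativeroot\n"
    "1\tI\t\t\n"
    "2\tV\t65\t\n"
    "3\tI\t\t\n"
    "4\tV\t7\tV\n"
    "4\tV\t\t\n"
)


def coarse_label(row, keep_applied=True):
    """Return the Roman numeral without inversion figures.

    If keep_applied is True, the target of an applied chord is retained
    (e.g. "V/V"); otherwise only the bare numeral is returned.
    """
    label = row["numeral"]
    if keep_applied and row["relativeroot"]:
        label += "/" + row["relativeroot"]
    return label


def coarsen(tsv_text, keep_applied=True):
    """Parse the table and return (measure number, coarse label) pairs."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [(int(row["mn"]), coarse_label(row, keep_applied)) for row in reader]


labels = coarsen(SAMPLE_TSV)
print(labels)
```

Calling `coarsen(SAMPLE_TSV, keep_applied=False)` reduces the vocabulary further by collapsing applied chords onto their bare numerals, which may suit models that need less analytical detail.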
Digital Musicology is a vibrant and quickly growing discipline that addresses traditional and novel music-related research questions with digital and computational means (Honing, 2006; Huron, 1999; Urberg, 2017). Research questions and methods often overlap with or draw on those from diverse disciplines such as music theory and analysis, composition history, mathematics, cognitive psychology, linguistics, anthropology, or computer science (Volk & Honingh, 2012; Wiggins, 2012). Corpus research, i.e., the computational study of representative collections of texts (in the case of linguistics) or notated music (in musicology), plays a prominent role in this trans-disciplinary quest to "make sense of music" through scientific models (London, 2013; Moss, 2019; Shanahan, 2022). ms3 makes scores (symbolic representations of music) operational for computational approaches by representing their contents as sets of tabular files.
This paper discusses the potential of musical corpus studies, taking research on common-practice tonal harmony as a case in point. Based on a brief description of a project carried out at the École polytechnique fédérale de Lausanne (EPFL) and two novel datasets of harmonic analyses of music in the classical style, we elaborate on research questions, applications, and the need to extend the annotation standard used.