Reference annotation datasets containing harmony annotations are at the core of a wide range of studies in music information retrieval (MIR) and related fields. The majority of these datasets contain single reference annotations describing the harmony of each piece. Nevertheless, studies showing differences among annotators in many other MIR tasks make the notion of a single 'ground-truth' reference annotation a tenuous one. In this paper, we introduce and analyse the Chordify Annotator Subjectivity Dataset (CASD) containing chord labels for 50 songs from 4 expert annotators in order to gain a better understanding of the differences between annotators in their chord label choice. Our analysis reveals that annotators use distinct chord-label vocabularies, with low chord-label overlap across all annotators. Between annotators, we find only 73 percent overlap on average for the traditional major-minor vocabulary and 54 percent overlap for the most complex chord labels. A factor analysis reveals the relative importance of triads, sevenths, inversions and other musical factors for each annotator on their choice of chord labels and reported difficulty of the songs. Our results further substantiate the existence of a harmonic 'subjectivity ceiling': an upper bound for evaluations in computational harmony research. Current state-of-the-art chord-estimation systems perform beyond this subjectivity ceiling by about 10 percent. This suggests that current ACE algorithms are powerful enough to tune themselves to particular annotators' idiosyncrasies. Overall, our results show that annotator subjectivity is an important factor in harmonic transcriptions, which should inform future studies into harmony perception and computational models of harmony.