Introduction: This study investigated the influence of the stimulus composition of three speech intelligibility word lists and of two scoring methods on judgments of speech accuracy for five tracheoesophageal (TE) speakers. This was achieved through phonemic comparisons across TE speakers’ productions of stimuli from the three intelligibility word lists: (1) the Consonant Rhyme Test, (2) the Northwestern Intelligibility Test, and (3) the Weiss and Basili (W&B) list. Methodology: Fifteen normal-hearing young adults served as listeners; all were trained in phonetic transcription using the International Phonetic Alphabet (IPA), but none had previous exposure to any mode of postlaryngectomy alaryngeal speech. Speaker stimuli were presented to all listeners through headphones, and all stimuli were transcribed phonetically using an open-set response paradigm. Data were analyzed for individual speakers by stimulus list. Phonemic scoring was compared to a whole-word scoring method, and the types of errors observed were quantified by word list. Results: Individual speaker variability was noted, and its effect on the assessment of speech accuracy was identified. The phonemic scoring method was found to be a more sensitive measure of TE speech accuracy than whole-word scoring. The W&B list yielded the lowest accuracy scores of the three lists, which may indicate that it is more sensitive and of potential clinical value. Conclusions: Overall, this study supports the use of open-set, phonemic scoring methods when evaluating TE speaker intelligibility. Future research should aim to assess the specificity of assessment tools in a larger sample of TE speakers who vary in their speech proficiency.
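
The contrast between whole-word and phonemic scoring can be illustrated with a minimal sketch. It is not the study's scoring procedure. It assumes that each target word and each listener transcription is a list of IPA symbols of equal length, and that phonemes are compared position by position; handling insertions and deletions would require a proper alignment step. The function names and the three example words are hypothetical and are not drawn from any of the word lists or from the study data.

```python
from typing import List, Sequence


def whole_word_score(targets: Sequence[List[str]], responses: Sequence[List[str]]) -> float:
    """Proportion of words transcribed with every phoneme matching the target."""
    correct = sum(1 for t, r in zip(targets, responses) if t == r)
    return correct / len(targets)


def phonemic_score(targets: Sequence[List[str]], responses: Sequence[List[str]]) -> float:
    """Proportion of target phonemes matched, compared position by position.

    Assumption: each response has the same number of phonemes as its target.
    """
    total = sum(len(t) for t in targets)
    correct = sum(
        1
        for t, r in zip(targets, responses)
        for target_phoneme, response_phoneme in zip(t, r)
        if target_phoneme == response_phoneme
    )
    return correct / total


# Hypothetical example: three CVC targets and one listener's transcriptions.
targets = [["p", "æ", "t"], ["b", "ɪ", "n"], ["s", "i", "t"]]
responses = [["b", "æ", "t"], ["b", "ɪ", "n"], ["s", "i", "d"]]

print(f"whole-word: {whole_word_score(targets, responses):.2f}")  # 0.33
print(f"phonemic:   {phonemic_score(targets, responses):.2f}")    # 0.78
```

In this made-up case, two single-phoneme errors reduce the whole-word score to one word in three, while the phonemic score still credits the seven of nine phonemes that were transcribed correctly. This finer gradation is one way to read the abstract's description of phonemic scoring as the more sensitive measure.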