This research uses facial expression recognition software (FaceReader) to explore how different sound interventions influence the emotions of older people with dementia. The field experiment was carried out in the public activity space of an older adult care facility. Three sound sources served as interventions: music, stream sounds, and birdsong. Data collected with the Self-Assessment Manikin (SAM) scale were compared with facial expression recognition (FER) data. FaceReader identified differences in how older people with dementia responded emotionally to the different sound interventions and revealed how their facial expressions changed over time. Participants' facial expressions showed significantly higher valence under all three sound interventions than under the condition without sound (p < 0.01). The sadness, fear, and disgust indices differed significantly among the sound interventions. For example, before the start of the birdsong intervention, the disgust index first rose by 0.06 between 0 s and about 20 s and then declined linearly, by an average of 0.03 per 20 s. Valence and arousal were also significantly lower when the sound intervention began before, rather than at the same time as, the start of the activity (p < 0.01). In the birdsong and stream interventions, facial expressions differed significantly between intervention days (p < 0.05 or p < 0.01). Facial expression valence also differed significantly by age and gender. Finally, comparing the SAM and FER results showed that in the music intervention, valence in the first 80 s helped predict dominance (r = 0.600) and acoustic comfort (r = 0.545); in the stream sound intervention, valence in the first 40 s helped predict pleasure (r = 0.770) and acoustic comfort (r = 0.766); and in the birdsong intervention, valence in the first 20 s helped predict dominance (r = 0.824) and arousal (r = 0.891).
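The early-window correlations above can be illustrated with a minimal sketch. It assumes that each participant's FaceReader valence is available as (time in seconds, valence) samples, that the predictor is the mean valence over the first `window_s` seconds, and that the reported r is a Pearson correlation with the participant's SAM rating. The data layout, the function names (`window_mean`, `pearson_r`, `early_valence_vs_sam`), and the sample values are hypothetical. They are not the study's data or its exact procedure.

```python
from math import sqrt


def window_mean(series, window_s):
    """Mean valence over samples with 0 <= time < window_s.

    `series` is a hypothetical per-participant list of (time_s, valence) pairs.
    """
    values = [v for t, v in series if 0 <= t < window_s]
    if not values:
        raise ValueError("no samples inside the window")
    return sum(values) / len(values)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def early_valence_vs_sam(participants, sam_scores, window_s):
    """Correlate each participant's early-window mean valence with a SAM score.

    Assumption: participants[i] and sam_scores[i] refer to the same person.
    """
    early = [window_mean(p, window_s) for p in participants]
    return pearson_r(early, sam_scores)


if __name__ == "__main__":
    # Hypothetical illustration only: three participants, SAM ratings 1-3.
    participants = [
        [(0, 0.1), (30, 0.9)],
        [(0, 0.2), (30, 0.5)],
        [(0, 0.3), (30, 0.0)],
    ]
    sam = [1, 2, 3]
    for window in (20, 40, 80):
        print(window, early_valence_vs_sam(participants, sam, window))
```

Sweeping `window_s` (here 20, 40, and 80 s) and comparing the resulting r values is one way to see which initial window tracks a given SAM dimension most closely; with these made-up values, only the 20 s window isolates the perfectly linear early samples, so its r is close to 1.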