Storytelling is a long-established tradition and listening to stories is still a popular leisure activity. Caused by technization, storytelling media expands, e.g., to social robots acting as multi-modal storytellers, using different multimodal behaviours such as facial expressions or body postures. With the overarching goal to automate robotic storytelling, we have been annotating stories with emotion labels which the robot can use to automatically adapt its behavior. With it, three different approaches are compared in two studies in this paper: 1) manual labels by human annotators (MA), 2) software-based word-sensitive annotation using the Linguistic Inquiry and Word Count program (LIWC), and 3) a machine learning based approach (ML). In an online study showing videos of a storytelling robot, the annotations were validated, with LIWC and MA achieving the best, and ML the worst results. In a laboratory user study, the three versions of the story were compared regarding transportation and cognitive absorption, revealing no significant differences but a positive trend towards MA. On this empirical basis, the Automated Robotic Storyteller was implemented using manual annotations. Future iterations should include other robots and modalities, fewer emotion labels and their probabilities.