This article explains multimodal coherence-making as a transcribing practice and shows how it can be used to teach multimodal, narrative, and media competencies across different genres. In multimodal arrangements, language makes images readable in specific ways, and images in turn make language understandable in different ways. The result is an abductive process of understanding that can be used in teaching and learning contexts. This idea of meaning-making is based on the social semiotic approach to style. In line with the concept of semiotic metafunctions, this approach treats style as the practice of selecting, forming, and composing semiotic resources. These stylistic practices realize a subjective appropriation of discursive and habitual patterns and are carried out within the semiotic and technological dispositions (affordances) of the media infrastructures used in a given situation. In this sense, digital storytelling is a multimodal stylistic practice carried out with digital tools. Multimodal storytelling in educational contexts means that teachers and learners are prompted to bring the communicative functions of text, image, video, and audio into narrative coherence. Using a journalistic Instagram story as an example, this article reconstructs the media-practical, multimodal, and narrative skills that such storytelling prototypically requires. On the basis of this analysis, these competencies are then operationalized so that they can be used in new teaching and learning arrangements built on digital storytelling.