Manga, with their distinct style and symbolism, represent a growing reading trend worldwide. Manga use an established set of symbols to convey various emotions. They have generally been more experimental in layout than Western comics: they are more fragmentary and contain more panels, which enhances the dynamism of the story. We aimed to outline methodological approaches to the analysis of manga; to summarise the specific features of manga as a separate medium; to analyse how multimodal cohesion is created in manga; and to reveal the various types of relations between the visual and verbal modes. Manga is a multimodal discourse that combines several modes, mainly visual and verbal. The aural mode is represented by linguistic and visual signs, e.g. the jagged borders of a speech bubble or the size and boldness of letters. We applied methods originally designed for film analysis to the analysis of manga, in particular Tseng’s (2013) theory of cross-modal cohesion, which is based on tracking cross-modally realized characters, objects, actions, and settings. This analysis included building cross-modal cohesive chains. We argue that it is possible to track cross-modal cohesion in manga on the basis of the interaction of the visual, verbal, and aural components of identity chains. In addition, the interaction between the visual and verbal modes was revealed by analysing text-image relations. In this paper we have outlined manga-specific features, the distinctive features of manga’s page layout, and the cinematic devices that manga borrowed from film, some of which may be used as focalisation-marking devices.