Multimodal (discourse) analysis involves analyzing and theorizing individual semiotic resources (e.g., language, image, audio, space, gesture, facial expression) and inter-semiotic relations in order to make meaning of multimodal phenomena in context. Conducting multimodal analyses can be challenging because of their complexity. Computer-based software can support multimodal analysis by automating or semi-automating the stages of annotation and analysis, thus reducing the extensive time and labor required. The designed affordances of such software aim to help researchers establish links between low-level features of multimodal texts and higher-order multimodal semantic meanings by following specific workflows. Alongside many popular multimodal transcription tools (e.g., ELAN, CLAN, ChronoViz), GRAPE-MARS, a new open-source multimodal annotation tool for video analysis, has recently been launched. This technical review first describes the organization and affordances of GRAPE-MARS by illustrating a multimodal analysis of a video with the tool. We then highlight the principal functionalities that make the software efficient in supporting multimodal analysis. Finally, we discuss the limitations and possible future applications of GRAPE-MARS.