This article discusses the prospects and challenges of combining multimodality theory with distant viewing, a recent framework proposed in the field of digital humanities. This framework advocates the use of computational methods to enable large-scale analysis of visual and multimodal materials, which must be nevertheless supported by theories that explain how these materials are structured. Multimodality theory is well-positioned to support this effort by providing descriptive schemas that impose structure on the materials under analysis. The field of multimodality research can also benefit from adopting computational methods, which help to achieve the long-term goal of building large multimodal corpora for empirical research. However, despite their immense potential for multimodality research, the use of computational methods warrants caution, because they involve a number of potentially cascading risks that arise from biases inherent to the underlying data and different approaches to the phenomenon of multimodality.