AI2D-RST: a multimodal corpus of 1000 primary school science diagrams

Hiippala, Tuomo; Alikhani, Malihe; Haverinen, Jonas; Kalliokoski, Timo; Logacheva, Evanfiya; Orekhova, Serafina; Tuomainen, Aino; Stone, Matthew; Bateman, John A.

doi:10.1007/s10579-020-09517-1

Cited by 20 publications

(19 citation statements)

References 50 publications


“…This imbalance naturally sets limitations to what kinds of research questions may be pursued using the corpus (cf. also Hiippala et al, 2020). Adopting techniques proposed in digital humanities, such as rapid probing (Kuhn, 2019), in combination with guidance from multimodality theory, may eventually help to rethink the role and nature of multimodal corpora.…”

Section: Discussion (mentioning)

confidence: 87%

“…While multimodality theory can provide appropriate metadata schemas for distant viewing, applying these schemas to data is time-consuming manual work, which is precisely the same logjam that has so far prevented building large multimodal corpora with multiple layers of annotation. In Hiippala et al (2020), we have recently shown that annotations created by crowd-sourced non-expert workers can be used to increase the size of multimodal corpora. However, the extent to which these annotations can support research on multimodality depends on how well the crowd-sourced annotations capture the characteristics of the modes and media under analysis.…”

Section: Discussion (mentioning)

confidence: 99%

“…To exemplify, drawing the form of an arrow (→) requires a materiality that provides a two-dimensional canvas. This arrow can be either an illustration of a physical object – an actual arrow – or a diagrammatic element: this depends on whether this particular form is associated with expressive resources that belong to the semiotic of drawing (Riley, 2004) or the diagrammatic mode (Hiippala and Bateman, 2021; Hiippala et al, 2020). This kind of disambiguation is supported by the stratum of discourse semantics, which generates candidate interpretations for instances of expressive resources in their context of occurrence.…”

Section: Applying Multimodality Theory To Distant Viewing Of Photographic Media (mentioning)

confidence: 99%


Distant viewing and multimodality theory: Prospects and challenges

Hiippala

2021

Multimodality & Society

Self Cite


This article discusses the prospects and challenges of combining multimodality theory with distant viewing, a recent framework proposed in the field of digital humanities. This framework advocates the use of computational methods to enable large-scale analysis of visual and multimodal materials, which must be nevertheless supported by theories that explain how these materials are structured. Multimodality theory is well-positioned to support this effort by providing descriptive schemas that impose structure on the materials under analysis. The field of multimodality research can also benefit from adopting computational methods, which help to achieve the long-term goal of building large multimodal corpora for empirical research. However, despite their immense potential for multimodality research, the use of computational methods warrants caution, because they involve a number of potentially cascading risks that arise from biases inherent to the underlying data and different approaches to the phenomenon of multimodality.

“…Otto et al (2019) present an annotated dataset of text and imagery that compares the information load in text and images. However, we build on works that study information-level inferences between discourse units in different modalities such as comic book panels (McCloud, 1993), movie plots (Cumming et al, 2017), and diagrammatic elements (Hiippala et al, 2021). In particular, we use Alikhani et al (2020)'s relations that characterize inferences between text and images.…”

Section: Related Work (mentioning)

confidence: 99%

COSMic: A Coherence-Aware Generation Metric for Image Descriptions

İnan, Sharma, Khalid et al.

2021

Findings of the Association for Computational Linguistics: EMNLP 2021

Self Cite


Developers of text generation models rely on automated evaluation metrics as a stand-in for slow and expensive manual evaluations. However, image captioning metrics have struggled to give accurate learned estimates of the semantic and pragmatic success of output text. We address this weakness by introducing the first discourse-aware learned generation metric for evaluating image descriptions. Our approach is inspired by computational theories of discourse for capturing information goals using coherence. We present a dataset of image-description pairs annotated with coherence relations. We then train a coherence-aware metric on a subset of the Conceptual Captions dataset and measure its effectiveness (its ability to predict human ratings of output captions) on a test set composed of out-of-domain images. We demonstrate a higher Kendall Correlation Coefficient for our proposed metric with the human judgments for the results of a number of state-of-the-art coherence-aware caption generation models when compared to several other metrics including recently proposed learned metrics such as BLEURT and BERTScore.


“…In this section, we introduce two interrelated diagram corpora, AI2D [17] and AI2D-RST [14], which build on one other, AI2D-RST covering a subset of AI2D.…”

Section: Multimodal Diagram Corpora (mentioning)

confidence: 99%

Introducing the diagrammatic semiotic mode

Hiippala, Bateman

2020

Preprint

Self Cite


In this article, we propose a multimodal perspective to diagrammatic representations by sketching a description of what may be tentatively termed the diagrammatic mode. We consider diagrammatic representations in the light of contemporary multimodality theory and explicate what enables diagrammatic representations to integrate natural language, various forms of graphics, diagrammatic elements such as arrows, lines and other expressive resources into coherent organisations. We illustrate the proposed approach using two recent diagram corpora and show how a multimodal approach supports the empirical analysis of diagrammatic representations, especially in identifying diagrammatic constituents and describing their interrelations.
