Modeling Event Plausibility with Consistent Conceptual Abstraction

Porada, Ian; Suleman, Kaheer; Trischler, Adam; Cheung, Jackie Chi Kit

doi:10.18653/v1/2021.naacl-main.138

Cited by 8 publications

(9 citation statements)

References 41 publications


“…A possible solution to overcoming the reporting bias would be to adjust the event distribution via injecting manually elicited knowledge about object and entity properties into models (Wang et al., 2018; although see Porada, Suleman, Trischler, & Cheung, 2021) or via data augmentation (e.g., Zmigrod et al., 2019). Alternatively, information about event typicality might enter LLMs through input from different modalities, such as visual depictions of the world in the form of large databases of images and/or image descriptions (Bisk et al., 2020).…”

Section: Discussion (mentioning)

confidence: 99%

Event Knowledge in Large Language Models: The Gap Between the Impossible and the Unlikely

Kauf, Ivanova, Rambelli et al. 2023

Cognitive Science


Word co‐occurrence patterns in language corpora contain a surprising amount of conceptual knowledge. Large language models (LLMs), trained to predict words in context, leverage these patterns to achieve impressive performance on diverse semantic tasks requiring world knowledge. An important but understudied question about LLMs’ semantic abilities is whether they acquire generalized knowledge of common events. Here, we test whether five pretrained LLMs (from 2018's BERT to 2023's MPT) assign a higher likelihood to plausible descriptions of agent−patient interactions than to minimally different implausible versions of the same event. Using three curated sets of minimal sentence pairs (total n = 1215), we found that pretrained LLMs possess substantial event knowledge, outperforming other distributional language models. In particular, they almost always assign a higher likelihood to possible versus impossible events (The teacher bought the laptop vs. The laptop bought the teacher). However, LLMs show less consistent preferences for likely versus unlikely events (The nanny tutored the boy vs. The boy tutored the nanny). In follow‐up analyses, we show that (i) LLM scores are driven by both plausibility and surface‐level sentence features, (ii) LLM scores generalize well across syntactic variants (active vs. passive constructions) but less well across semantic variants (synonymous sentences), (iii) some LLM errors mirror human judgment ambiguity, and (iv) sentence plausibility serves as an organizing dimension in internal LLM representations. Overall, our results show that important aspects of event knowledge naturally emerge from distributional linguistic patterns, but also highlight a gap between representations of possible/impossible and likely/unlikely events.
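The minimal-pair comparison described in this abstract can be sketched as follows. This is only an illustration, not the authors' code: `minimal_pair_accuracy` and `toy_score` are hypothetical names, and `toy_score` is a hand-written stand-in for a real model's sentence likelihood that knows nothing beyond whether the subject noun is animate.

```python
from typing import Callable, Iterable, Tuple

Pair = Tuple[str, str]  # (plausible sentence, implausible sentence)


def minimal_pair_accuracy(pairs: Iterable[Pair],
                          score: Callable[[str], float]) -> float:
    """Fraction of pairs where the plausible sentence scores strictly higher."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one sentence pair")
    wins = sum(1 for plausible, implausible in pairs
               if score(plausible) > score(implausible))
    return wins / len(pairs)


# ASSUMPTION: toy stand-in for a language model score. A real evaluation
# would use the model's (log-)likelihood of the sentence instead.
ANIMATE = {"teacher", "nanny", "boy"}


def toy_score(sentence: str) -> float:
    """Return 1.0 if the subject noun (second word) is animate, else 0.0."""
    words = sentence.lower().rstrip(".").split()
    return 1.0 if words[1] in ANIMATE else 0.0


# Example sentences taken from the abstract above.
POSSIBLE_VS_IMPOSSIBLE = [
    ("The teacher bought the laptop.", "The laptop bought the teacher."),
]
LIKELY_VS_UNLIKELY = [
    ("The nanny tutored the boy.", "The boy tutored the nanny."),
]

if __name__ == "__main__":
    print(minimal_pair_accuracy(POSSIBLE_VS_IMPOSSIBLE, toy_score))
    print(minimal_pair_accuracy(LIKELY_VS_UNLIKELY, toy_score))
```

An animacy-only scorer separates the possible/impossible pair but ties on the likely/unlikely pair, which is a crude picture of the gap the abstract reports. Ties count as failures here because the comparison is strict.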

“…Instantiation was attempted by Allaway et al (2023), who proposed a controllable generative framework to probe valid instantiations for abstract knowledge automatically. Though Porada et al (2021) and Peng et al (2022) both proved that existing pretrained language models lack conceptual knowledge, none of existing works explicitly combine both techniques to derive abstract knowledge that is context-sensitive and generalizable.…”

Section: Related Work (mentioning)

confidence: 99%

HKUST-Landslide Susceptibility Dataset (HKUST-LSD): A benchmark dataset for landslide susceptibility assessment in Hong Kong

Wang, Zhang, Wang 2023

Preprint


Rain-induced natural terrain landslides are the most frequent geo-hazard in many regions of the world. As an essential tool in addressing rising landslide challenges due to climate change, landslide susceptibility assessment has been widely investigated in Hong Kong for over twenty years. However, a public dataset for Hong Kong landslide susceptibility assessment is currently absent in the geoscience research community, which brings difficulties in establishing consistent evaluation criteria for testing any new method or theory. Thus, to facilitate the development of new statistical and/or artificial intelligence-based methods for landslides susceptibility assessment, here we compile the first version of The Hong Kong University of Science and Technology – Landslide Susceptibility Dataset (HKUST-LSD) based on multiple sources of open data. Aiming at comprehensively describing the rain-induced natural terrain landslide conditioning factors in Hong Kong, HKUST-LSD v1.0 comprises data of (a) a landslide inventory; (b) a high-resolution digital terrain model (DTM) and its topographical derivatives; (c) superficial geology, distance to faults and rivers/sea; (d) historical maximum rolling rainfall and (e) ground vegetation condition. HKUST-LSD v1.0 provides a ready-to-use dataset that includes processed landslide and non-landslide samples, together with reference codes that utilized representative machine learning techniques to assess the landslide susceptibility in Hong Kong and achieved satisfactory performance. The dataset will be updated on a regular basis to fulfil the latest research needs that might arise in the research community and support global sustainable development.

Download the dataset at: https://github.com/cehjwang/HKUST-LSD


“…Does this representation perpetuate the negative stereotype that men are bad at cooking? To investigate this, we should dive deeper into the semantic plausibility learned in language models (Porada et al, 2021;Pedinotti et al, 2021). Unless the focus is on the domain of natural science, there is less agreement on what would lean in spreading desirable and undesirable content, and the borderline can change across time and place.…”

Section: Content Validation For Fair Representation (mentioning)

confidence: 99%

Possible Stories: Evaluating Situated Commonsense Reasoning under Multiple Possible Scenarios

Ashida, Sugawara 2022

Preprint


The possible consequences for the same context may vary depending on the situation we refer to. However, current studies in natural language processing do not focus on situated commonsense reasoning under multiple possible scenarios. This study frames this task by asking multiple questions with the same set of possible endings as candidate answers, given a short story text. Our resulting dataset, Possible Stories, consists of more than 4.5K questions over 1.3K story texts in English. We discover that even current strong pretrained language models struggle to answer the questions consistently, highlighting that the highest accuracy in an unsupervised setting (60.2%) is far behind human accuracy (92.5%). Through a comparison with existing datasets, we observe that the questions in our dataset contain minimal annotation artifacts in the answer options. In addition, our dataset includes examples that require counterfactual reasoning, as well as those requiring readers' reactions and fictional information, suggesting that our dataset can serve as a challenging testbed for future studies on situated commonsense reasoning.
