2022
DOI: 10.1162/tacl_a_00510

Getting BART to Ride the Idiomatic Train: Learning to Represent Idiomatic Expressions

Abstract: Idiomatic expressions (IEs), characterized by their non-compositionality, are an important part of natural language. They have been a classical challenge to NLP, including pre-trained language models that drive today’s state-of-the-art. Prior work has identified deficiencies in their contextualized representation stemming from the underlying compositional paradigm of representation. In this work, we take a first-principles approach to build idiomaticity into BART using an adapter as a lightweight non-compositi…

Cited by 3 publications (17 citation statements) · References 44 publications

“…One line of evidence questioning this ability comes from patterns of similarity between non-compositional expressions. Zeng and Bhat (2022) extract mean-pooled idiom embeddings from BART and find that they cluster together based on surface or syntactic similarity rather than figurative meaning. Garcia et al. (2021b) compare contextualized embeddings of compounds and their synonyms.…”
Section: Off-the-shelf Representations (mentioning)
Confidence: 99%
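The mean-pooling step this statement refers to is simple to reproduce. Below is a minimal sketch (not the authors' released code) of extracting a mean-pooled idiom embedding from BART's encoder, assuming the idiom's character span within the sentence is already known.

```python
# Minimal sketch: mean-pool BART encoder states over an idiom's tokens.
# Assumes the idiom's character span in the sentence is known.
import torch
from transformers import BartTokenizerFast, BartModel

tokenizer = BartTokenizerFast.from_pretrained("facebook/bart-base")
model = BartModel.from_pretrained("facebook/bart-base").eval()

sentence = "He finally kicked the bucket after a long illness."
idiom = "kicked the bucket"
start = sentence.index(idiom)
end = start + len(idiom)

enc = tokenizer(sentence, return_tensors="pt", return_offsets_mapping=True)
offsets = enc.pop("offset_mapping")[0]          # (num_tokens, 2) char offsets

with torch.no_grad():
    hidden = model.get_encoder()(**enc).last_hidden_state[0]  # (num_tokens, dim)

# Keep tokens whose character span overlaps the idiom span
# (special tokens have (0, 0) offsets and are excluded).
span_mask = (offsets[:, 0] < end) & (offsets[:, 1] > start)
idiom_embedding = hidden[span_mask].mean(dim=0)  # mean-pooled idiom embedding
print(idiom_embedding.shape)                     # torch.Size([768]) for bart-base
```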
“…A computationally leaner approach consists of learning an adapter, as shown by Zeng and Bhat (2022) on BART idiom embeddings. They evaluate different adapters, with learning objectives that include reconstructing corrupted idiomatic sentences and increasing the similarity between the embeddings of idioms and their dictionary definitions.…”
Section: Optimized Representations (mentioning)
Confidence: 99%
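As a rough illustration of the similarity-based objective mentioned here, the sketch below (an assumed, simplified setup, not the paper's exact adapter or training recipe) freezes BART, adds a small bottleneck adapter, and pulls the adapted embedding of an idiomatic sentence toward the embedding of the idiom's dictionary definition; the corrupted-sentence reconstruction objective is omitted for brevity.

```python
# Simplified sketch (assumed, not the paper's exact setup): a bottleneck
# adapter on top of frozen BART embeddings, trained to pull an idiomatic
# sentence's embedding toward the embedding of the idiom's definition.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import BartTokenizerFast, BartModel

class BottleneckAdapter(nn.Module):
    """Down-project, non-linearity, up-project, residual connection."""
    def __init__(self, dim: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        return hidden + self.up(F.relu(self.down(hidden)))

tokenizer = BartTokenizerFast.from_pretrained("facebook/bart-base")
bart = BartModel.from_pretrained("facebook/bart-base").eval()
for p in bart.parameters():            # freeze BART; only the adapter is trained
    p.requires_grad = False

adapter = BottleneckAdapter()
optimizer = torch.optim.Adam(adapter.parameters(), lr=1e-4)

def mean_pooled(text: str) -> torch.Tensor:
    """Mean-pooled BART encoder embedding of a piece of text."""
    enc = tokenizer(text, return_tensors="pt")
    return bart.get_encoder()(**enc).last_hidden_state.mean(dim=1)  # (1, dim)

# One toy training step: an idiomatic sentence and its dictionary definition.
sentence_emb = adapter(mean_pooled("He finally kicked the bucket."))
definition_emb = mean_pooled("to die").detach()       # fixed target

loss = 1.0 - F.cosine_similarity(sentence_emb, definition_emb).mean()
loss.backward()
optimizer.step()
print(f"similarity loss: {loss.item():.4f}")
```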
“…Ideally, their representations ought to be distinct in these two contexts. However, examining the representations of 235 PIEs that are largely unrelated in their literal and idiomatic contexts (their literal PIE embeddings and idiomatic definitions have a mean cosine similarity of 0.0047), we notice that the representations generated by the state-of-the-art (Zeng and Bhat, 2022) exhibit a high cosine similarity between their idiomatic and literal PIE embeddings (mean cosine similarity of 0.82).…”
Section: Introduction (mentioning)
Confidence: 98%
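The diagnostic described in this statement reduces to a cosine-similarity check between two contextual embeddings of the same potentially idiomatic expression (PIE). A tiny sketch follows, with placeholder tensors standing in for real span embeddings (e.g., from the mean-pooling sketch earlier in this section); none of the names below come from the cited paper.

```python
# Tiny sketch of the diagnostic: if a PIE's literal-context and
# idiomatic-context embeddings are nearly identical (cosine ~0.8+),
# the model is not separating the two senses.
import torch
import torch.nn.functional as F

def cosine(a: torch.Tensor, b: torch.Tensor) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    return F.cosine_similarity(a.unsqueeze(0), b.unsqueeze(0)).item()

# Placeholders standing in for the PIE's embedding in a literal sentence
# and in an idiomatic sentence.
literal_emb = torch.randn(768)
idiomatic_emb = torch.randn(768)
print(f"literal vs. idiomatic cosine: {cosine(literal_emb, idiomatic_emb):.4f}")
```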
“…Modern NLP systems, however, are primarily driven by the notion of compositionality, which is at the core of several system components, including tokenization (Sennrich et al., 2016; Wu et al., 2016) and the self-attention mechanism (Vaswani et al., 2017). More fundamentally, recent studies (Zeng and Bhat, 2022) reveal that pre-trained language models (PTLMs), such as GPT-3 (Brown et al., 2020) and BART (Lewis et al., 2020), are ill-equipped to represent (and comprehend) the meanings of idiomatic expressions (IEs). This is demonstrated by the lack of correspondence between IE meanings and their embeddings; IEs with similar meanings are not close in the embedding space.…”
Section: Introduction (mentioning)
Confidence: 99%