Idiom Type Identification with Smoothed Lexical Features and a Maximum Margin Classifier

Salton, Giancarlo D.; Ross, Robert; Kelleher, John D.

doi:10.26615/978-954-452-049-6_083

Cited by 10 publications

(10 citation statements)

References 29 publications

(27 reference statements)

Supporting

Mentioning

Contrasting

Order By: Relevance

“…The literature on the task of automatic type identification of idioms, more specifically verb and noun idiomatic combinations (VNIC), illustrates the use of these standard model evaluation metrics. Most of the work in this field either uses accuracy (used by Fazly et al (2009)) or F-score (used by Muzny and Zettlemoyer (2013), Senaldi et al (2016), Salton et al (2017)) to compare model performance. These measures provide an appreciable sense of the reliability of a given model, which is why they are commonly used as evaluation metrics.…”

Section: Drawbacks Withmentioning

confidence: 99%

See 1 more Smart Citation

Is it worth it? Budget-related evaluation metrics for model selection

Klubička,

Salton,

Kelleher

2018

Preprint

View full text Add to dashboard Cite

Projects that set out to create a linguistic resource often do so by using a machine learning model that pre-annotates or filters the content that goes through to a human annotator, before going into the final version of the resource. However, available budgets are often limited, and the amount of data that is available exceeds the amount of annotation that can be done. Thus, in order to optimize the benefit from the invested human work, we argue that the decision on which predictive model one should employ depends not only on generalized evaluation metrics, such as accuracy and F-score, but also on the gain metric. The rationale is that, the model with the highest F-score may not necessarily have the best separation and sequencing of predicted classes, thus leading to the investment of more time and/or money on annotating false positives, yielding zero improvement of the linguistic resource. We exemplify our point with a case study, using real data from a task of building a verb-noun idiom dictionary. We show that in our scenario, given the choice of three systems with varying F-scores, the system with the highest F-score does not yield the highest profits. In other words, we show that the cost-benefit trade off can be more favorable if a system with a lower F-score is employed.

show abstract

Section: Drawbacks Withmentioning

confidence: 99%

“…For more details on the background of the models, refer toSalton et al (2017). The resulting dataset is published on GitHub as a freely available resource.…”

mentioning

confidence: 99%

Is it worth it? Budget-related evaluation metrics for model selection

Klubička,

Salton,

Kelleher

2018

Preprint

View full text Add to dashboard Cite

show abstract

“…Fazly et al (2009) distinguish between identifying whether an expression has an idiomatic sense (idiom type classification) and identifying whether a particular usage of an expression is idiomatic (idiom token classification), and focus their work on analysing the canonical form (lexical and syntactic) of idiomatic expressions. The related work on idiom token classification at a sentence level includes (Sporleder and Li, 2009;Li and Sporleder, 2010a,b;Peng and Feldman, 2017;Fazly et al, 2009;Salton et al, 2016Salton et al, , 2017. Of particular relevance is Salton et al (2016) which demonstrated that it is possible to train a generic (as distinct to expression specific) idiom token classifier using distributed sentence embeddings.…”

Section: Introductionmentioning

confidence: 99%

Finding BERT’s Idiomatic Key

Nedumpozhimana¹,

Kelleher²

2021

Proceedings of the 17th Workshop on Multiword Expressions (MWE 2021)

View full text Add to dashboard Cite

Sentence embeddings encode information relating to the usage of idioms in a sentence. This paper reports a set of experiments that combine a probing methodology with input masking to analyse where in a sentence this idiomatic information is taken from, and what form it takes. Our results indicate that BERT's idiomatic key is primarily found within an idiomatic expression, but also draws on information from the surrounding context. Also, BERT can distinguish between the disruption in a sentence caused by words missing and the incongruity caused by idiomatic usage.

show abstract

“…distinguish between identifying whether an expression has an idiomatic sense (idiom type classification) and identifying whether a particular usage of an expression is idiomatic (idiom token classification), and focus their work on analysing the canonical form (lexical and syntactic) of idiomatic expressions. The related work on idiom token classification at a sentence level includes (Sporleder and Li, 2009;Li and Sporleder, 2010a,b;Peng and Feldman, 2017;Salton et al, , 2017. Of particular relevance is which demonstrated that it is possible to train a generic (as distinct to expression specific) idiom token classifier using distributed sentence embeddings.…”

Section: Introductionmentioning

confidence: 99%

Proceedings of the 17th Workshop on Multiword Expressions (MWE 2021)

2021

View full text Add to dashboard Cite

In recent years, language models (LMs) have become almost synonymous with NLP. Pre-trained to "read" a large text corpus, such models are useful as both a representation layer as well as a source of world knowledge. But how well do they represent MWEs? This talk will discuss various problems in representing MWEs, and the extent to which LMs address them:• Do LMs capture the implicit relationship between constituents in compositional MWEs (from baby oil through parsley cake to cheeseburger stabbing)?

show abstract

Idiom Type Identification with Smoothed Lexical Features and a Maximum Margin Classifier

Cited by 10 publications

References 29 publications

Is it worth it? Budget-related evaluation metrics for model selection

Is it worth it? Budget-related evaluation metrics for model selection

Finding BERT’s Idiomatic Key

Proceedings of the 17th Workshop on Multiword Expressions (MWE 2021)

Contact Info

Product

Resources

About