SemBleu: A Robust Metric for AMR Parsing Evaluation

Song, Linfeng; Gildea, Daniel

doi:10.18653/v1/p19-1446

Cited by 17 publications

(20 citation statements)

References 17 publications

“…The metric SEMBLEU (Song and Gildea, 2019) is most closely related to ours. It evaluates AMR graphs by calculating precision based on n-gram overlap.…”

Section: Related Work (mentioning)

confidence: 63%

Dscorer: A Fast Evaluation Metric for Discourse Representation Structure Parsing

Liu; Cohen; Lapata

2020

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Discourse representation structures (DRSs) are scoped semantic representations for texts of arbitrary length. Evaluation of the accuracy of predicted DRSs plays a key role in developing semantic parsers and improving their performance. DRSs are typically visualized as nested boxes, in a way that is not straightforward to process automatically. COUNTER, an evaluation algorithm for DRSs, transforms them to clauses and measures clause overlap by searching for variable mappings between two DRSs. Unfortunately, COUNTER is computationally costly (with respect to memory and CPU time) and does not scale with longer texts. We introduce DSCORER, an efficient new metric which converts box-style DRSs to graphs and then measures the overlap of n-grams in the graphs. Experiments show that DSCORER computes accuracy scores that correlate with scores from COUNTER at a fraction of the time.
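The abstract and the quoted related-work statement describe the idea DSCORER shares with SEMBLEU: turn each structure into a graph, read off labelled n-grams, and score their overlap with a BLEU-style precision. The following is a minimal Python sketch of that idea under assumptions that are not taken from either paper: an n-gram is a directed path of 1 to `max_n` nodes written as its alternating concept and relation labels, variables are replaced by their concept labels (so no variable alignment is searched), n-gram precisions are combined with uniform weights and no smoothing, and the brevity penalty uses node counts as lengths. The names `graph_ngrams`, `graph_bleu` and the example graphs are hypothetical.

```python
import math
from collections import Counter

def graph_ngrams(instances, edges, max_n=3):
    """Count labelled paths of 1..max_n nodes; variables are replaced by concepts."""
    children = {}
    for src, rel, tgt in edges:
        children.setdefault(src, []).append((rel, tgt))
    grams = {n: Counter() for n in range(1, max_n + 1)}

    def walk(var, labels, n):
        # labels alternates concept, relation, concept, ... (variable-free view)
        grams[n][tuple(labels)] += 1
        if n < max_n:
            for rel, tgt in children.get(var, []):
                walk(tgt, labels + [rel, instances[tgt]], n + 1)

    for var, concept in instances.items():
        walk(var, [concept], 1)
    return grams

def graph_bleu(hyp, ref, max_n=3):
    """BLEU-style score of hyp against ref: clipped n-gram precision, uniform weights."""
    h, r = graph_ngrams(*hyp, max_n), graph_ngrams(*ref, max_n)
    log_p = 0.0
    for n in range(1, max_n + 1):
        total = sum(h[n].values())
        matched = sum((h[n] & r[n]).values())  # Counter '&' keeps min counts (clipping)
        if total == 0 or matched == 0:
            return 0.0  # no smoothing in this sketch
        log_p += math.log(matched / total) / max_n
    hyp_len, ref_len = sum(h[1].values()), sum(r[1].values())  # node counts as lengths
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_p)

# Hypothetical encoding of "The boy wants to go" (boy is reentrant)
REF = ({"w": "want-01", "b": "boy", "g": "go-01"},
       [("w", ":ARG0", "b"), ("w", ":ARG1", "g"), ("g", ":ARG0", "b")])
SAME = ({"x": "want-01", "y": "boy", "z": "go-01"},
        [("x", ":ARG0", "y"), ("x", ":ARG1", "z"), ("z", ":ARG0", "y")])
WRONG_SENSE = ({"x": "want-01", "y": "boy", "z": "go-02"},
               [("x", ":ARG0", "y"), ("x", ":ARG1", "z"), ("z", ":ARG0", "y")])

print(graph_bleu(SAME, REF))                             # 1.0
print(graph_bleu(WRONG_SENSE, REF))                      # 0.0
print(round(graph_bleu(WRONG_SENSE, REF, max_n=2), 4))   # 0.4714
```

Renaming variables leaves the score unchanged because only concept and relation labels enter the n-grams. With `max_n=3`, the wrong sense `go-02` leaves no matching 3-gram, so the unsmoothed score drops to zero; BLEU implementations commonly offer smoothing for exactly this case.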

“…a graph-based encoding of the Discourse Representation Structures of Basile et al (2012). Further, we plan on refining and extending the available training data (in particular for UCCA) and will put greater focus on the systematic exploration of variant evaluation perspectives, for example scoring at the level of larger sub-graphs in the spirit of the 'complete predications' metric of , or 'semantic n-grams' along the lines of the SemBleu proposal by Song and Gildea (2019). Aiming for increased linguistic diversity, it will of course also be tempting to seek to include meaning representations for additional languages.…”

Section: Reflections and Outlook (mentioning)

confidence: 99%

MRP 2019: Cross-Framework Meaning Representation Parsing

Oepen; Abend; Hajič; et al.

2019

Proceedings of the Shared Task on Cross-Framework Meaning Representation Parsing at the 2019 Conference on Natural Language Lea

The 2019 Shared Task at the Conference for Computational Language Learning (CoNLL) was devoted to Meaning Representation Parsing (MRP) across frameworks. Five distinct approaches to the representation of sentence meaning in the form of directed graphs were represented in the training and evaluation data for the task, packaged in a uniform graph abstraction and serialization. The task received submissions from eighteen teams, of which five do not participate in the official ranking because they arrived after the closing deadline, made use of extra training data, or involved one of the task co-organizers. All technical information regarding the task, including system submissions, official results,

“…Simplify and match: SEMBLEU. The SEMBLEU metric in Song and Gildea (2019) can also be described as a two-step procedure. But unlike SMATCH it operates on a variable-free reduction of an AMR graph G, which we denote by G_vf (vf: variable-free, Figure 1, right-hand side).…”

Section: AMR Metrics: Smatch and SemBleu (mentioning)

confidence: 99%

“…Its backbone is an alignment-search between the graphs' variables. Recently, the SEMBLEU metric (Song and Gildea, 2019) has been proposed that operates on the basis of a variable-free AMR (Figure 1, right), converting it to a bag of k-grams. Circumventing a variable alignment search reduces computational cost and ensures full determinacy.…”

Section: Introduction (mentioning)

confidence: 99%

AMR Similarity Metrics from Principles

Opitz; Pârcălăbescu; Frank

2020

Transactions of the Association for Computational Linguistics

Different metrics have been proposed to compare Abstract Meaning Representation (AMR) graphs. The canonical Smatch metric (Cai and Knight, 2013) aligns the variables of two graphs and assesses triple matches. The recent SemBleu metric (Song and Gildea, 2019) is based on the machine-translation metric Bleu (Papineni et al., 2002) and increases computational efficiency by ablating the variable-alignment. In this paper, i) we establish criteria that enable researchers to perform a principled assessment of metrics comparing meaning representations like AMR; ii) we undertake a thorough analysis of Smatch and SemBleu where we show that the latter exhibits some undesirable properties. For example, it does not conform to the identity of indiscernibles rule and introduces biases that are hard to control; and iii) we propose a novel metric S2match that is more benevolent to only very slight meaning deviations and targets the fulfilment of all established criteria. We assess its suitability and show its advantages over Smatch and SemBleu.
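The abstract attributes Smatch's behaviour to aligning the two graphs' variables before counting matching triples, and the quoted statements note that skipping this search is what makes SemBleu cheaper and deterministic. The Python sketch below makes the alignment step explicit by trying every injective variable mapping and keeping the best triple-match F1; this is only feasible for tiny graphs, since the number of mappings grows factorially. The triple format, the graph encoding and the names `to_triples`, `rename` and `smatch_exhaustive` are assumptions for illustration. The released Smatch tool uses a hill-climbing search with random restarts instead of enumeration and also scores the top node, which this sketch omits.

```python
from collections import Counter
from itertools import permutations

def to_triples(instances, edges):
    """Flatten a graph into Smatch-style triples (hypothetical format)."""
    inst = [("instance", var, concept) for var, concept in instances.items()]
    rels = [(rel, src, tgt) for src, rel, tgt in edges]
    return inst + rels

def rename(triple, mapping):
    """Rewrite the variables of one hypothesis triple through the mapping."""
    label, a, b = triple
    if label == "instance":
        return (label, mapping[a], b)  # b is a concept label, not a variable
    return (label, mapping[a], mapping[b])

def smatch_exhaustive(hyp, ref):
    """Best triple-match F1 over all injective variable mappings (tiny graphs only)."""
    hyp_triples, ref_triples = to_triples(*hyp), to_triples(*ref)
    hyp_vars, ref_vars = list(hyp[0]), list(ref[0])
    ref_counts = Counter(ref_triples)
    # None lets a hypothesis variable stay unaligned when the graphs differ in size
    pool = ref_vars + [None] * len(hyp_vars)
    best = 0
    for image in permutations(pool, len(hyp_vars)):
        mapping = dict(zip(hyp_vars, image))
        mapped = Counter(rename(t, mapping) for t in hyp_triples)
        best = max(best, sum((mapped & ref_counts).values()))
    # F1 = 2PR / (P + R) with P = best/|hyp|, R = best/|ref|
    return 2 * best / (len(hyp_triples) + len(ref_triples))

# Hypothetical encoding of "The boy wants to go" (boy is reentrant)
REF = ({"w": "want-01", "b": "boy", "g": "go-01"},
       [("w", ":ARG0", "b"), ("w", ":ARG1", "g"), ("g", ":ARG0", "b")])
SAME = ({"x": "want-01", "y": "boy", "z": "go-01"},
        [("x", ":ARG0", "y"), ("x", ":ARG1", "z"), ("z", ":ARG0", "y")])
WRONG_SENSE = ({"x": "want-01", "y": "boy", "z": "go-02"},
               [("x", ":ARG0", "y"), ("x", ":ARG1", "z"), ("z", ":ARG0", "y")])

print(smatch_exhaustive(SAME, REF))                    # 1.0
print(round(smatch_exhaustive(WRONG_SENSE, REF), 4))   # 0.8333
```

Because the score is a maximum over mappings, renaming variables cannot change it; the price of that guarantee is the mapping search itself, which is the step a variable-free metric avoids.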
