Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.480
Implicit Discourse Relation Classification: We Need to Talk about Evaluation

Abstract: Implicit relation classification on Penn Discourse TreeBank (PDTB) 2.0 is a common benchmark task for evaluating the understanding of discourse relations. However, the lack of consistency in preprocessing and evaluation poses challenges to fair comparison of results in the literature. In this work, we highlight these inconsistencies and propose an improved evaluation protocol. Paired with this protocol, we report strong baseline results from pretrained sentence encoders, which set the new state-of-the-art for …

Cited by 31 publications (36 citation statements). References 24 publications.
“…Table 4 shows the performance of our model on PDTB-Ji at the top-level and second-level classes. Consistent with Kim et al. (2020)'s conclusion, BERT is indeed the best baseline, achieving 51.88 Micro-F1 and 36.10 Macro-F1 at the second level, and 63.91 Micro-F1 and 55.13 Macro-F1 at the top level.…”
Section: Experimentation on PDTB (supporting)
confidence: 82%
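As a point of reference for the two metrics quoted above, here is a minimal sketch of how Micro-F1 and Macro-F1 could be computed with scikit-learn. The label names and predictions are made-up illustrations, not numbers from the cited papers.

```python
from sklearn.metrics import f1_score

# Hypothetical gold and predicted PDTB top-level senses (illustrative only).
gold = ["Expansion", "Contingency", "Comparison", "Expansion", "Temporal"]
pred = ["Expansion", "Expansion", "Comparison", "Expansion", "Temporal"]

# Micro-F1 pools all decisions together (it equals accuracy for single-label
# classification); Macro-F1 averages per-class F1, so rare classes weigh as
# much as frequent ones. zero_division=0 handles classes never predicted.
micro = f1_score(gold, pred, average="micro", zero_division=0)
macro = f1_score(gold, pred, average="macro", zero_division=0)
print(f"Micro-F1: {micro:.4f}  Macro-F1: {macro:.4f}")
```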
“…PDTB: following previous work (Ji and Eisenstein, 2015; Kim et al., 2020), we adopt the most widely used split, PDTB-Ji, which takes sections 2-20 as the training set, sections 0-1 as the development set, and sections 21-22 as the test set.…”
Section: Datasets and Experimental Settings (mentioning)
confidence: 99%
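For readers unfamiliar with the PDTB-Ji convention, the following is a minimal sketch of the section-based split described in the statement above. It assumes the corpus has already been loaded into a list of examples, each carrying a hypothetical `section` field with its WSJ section number.

```python
TRAIN_SECTIONS = set(range(2, 21))  # WSJ sections 02-20
DEV_SECTIONS = {0, 1}               # WSJ sections 00-01
TEST_SECTIONS = {21, 22}            # WSJ sections 21-22

def split_pdtb_ji(examples):
    """Partition PDTB relation instances by WSJ section (PDTB-Ji split)."""
    train = [ex for ex in examples if ex["section"] in TRAIN_SECTIONS]
    dev = [ex for ex in examples if ex["section"] in DEV_SECTIONS]
    test = [ex for ex in examples if ex["section"] in TEST_SECTIONS]
    return train, dev, test
```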
“…We focus on implicit discourse relation analysis, where no explicit discourse marker exists. Following Kim et al. (2020), we use the Level-2 labels with more than 100 examples and use 12-fold cross-validation.…”
Section: Datasets (mentioning)
confidence: 99%
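A small sketch of the label-filtering step mentioned in this statement, keeping only Level-2 senses with more than 100 examples. The `sense` field name is an assumption for illustration, not taken from the cited paper's code.

```python
from collections import Counter

def filter_frequent_senses(examples, min_count=100):
    """Drop instances whose Level-2 sense has at most min_count examples."""
    counts = Counter(ex["sense"] for ex in examples)
    kept = {sense for sense, n in counts.items() if n > min_count}
    return [ex for ex in examples if ex["sense"] in kept]
```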
“…The evaluation protocol is 5-fold cross-validation. Following Kim et al. (2020), each fold is split at the document level rather than at the individual example level.…”
Section: Kyoto University Web Document Leads Corpus (KWDLC) (mentioning)
confidence: 99%
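A hedged sketch of a document-level k-fold split of this kind, using scikit-learn's GroupKFold so that all examples from the same document land in the same fold. The `doc_id` field is an assumed stand-in for however the corpus identifies documents.

```python
from sklearn.model_selection import GroupKFold

def document_level_folds(examples, n_splits=5):
    """Yield (train, test) example lists, grouping instances by document."""
    doc_ids = [ex["doc_id"] for ex in examples]
    gkf = GroupKFold(n_splits=n_splits)
    indices = list(range(len(examples)))
    for train_idx, test_idx in gkf.split(indices, groups=doc_ids):
        train = [examples[i] for i in train_idx]
        test = [examples[i] for i in test_idx]
        yield train, test
```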