Proceedings of the 2nd Workshop on Multilingual Surface Realisation (MSR 2019), 2019
DOI: 10.18653/v1/d19-6301

The Second Multilingual Surface Realisation Shared Task (SR’19): Overview and Evaluation Results

Abstract: We report results from the SR'19 Shared Task, the second edition of a multilingual surface realisation task organised as part of the EMNLP'19 Workshop on Multilingual Surface Realisation. As in SR'18, the shared task comprised two different tracks: (a) a Shallow Track, where the inputs were full UD structures with word order information removed and tokens lemmatised; and (b) a Deep Track, where, additionally, functional words and morphological information were removed. The Shallow Track was offered in 11, and the…
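To make the two input types concrete, here is a minimal Python sketch of what each track removes from a UD analysis. The toy sentence and tuple layout are illustrative assumptions; the actual shared task data is distributed in a CoNLL-U-based format.

    # Shallow Track (T1) input: lemmatised UD tokens with dependency relations
    # kept but surface word order removed (the list order below is scrambled).
    shallow_input = [
        # (id, lemma, upos, head, deprel)
        (3, "sleep", "VERB", 0, "root"),
        (1, "dog",   "NOUN", 3, "nsubj"),
        (2, "the",   "DET",  1, "det"),
    ]

    # Deep Track (T2) input: functional words (here the determiner) and
    # morphological features are additionally removed, so the system must
    # reintroduce both.
    deep_input = [
        (2, "sleep", "VERB", 0, "root"),
        (1, "dog",   "NOUN", 2, "nsubj"),
    ]

    # Target realisation for either input: "The dog sleeps."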

Cited by 60 publications (44 citation statements)
References 35 publications
“…Our results are decent (with the exception of the en_partut-ud-test dataset), suggesting that the approach may represent a viable starting point for future work. In particular, in the human evaluation results for English in the shared task overview paper (Mille et al., 2019), our system was ranked in the middle group of systems for meaning preservation and in the large group of systems tied for third to twelfth place in readability. Consistent with the human evaluation, the automatic scores for our system (Table 5) were also in the middle of the pack.…”
Section: Results
Mentioning confidence: 99%
“…Subsequently, DA has been adapted to the evaluation of other tasks, such as automatic video captioning, with TRECvid adopting the method in 2017 [1,2]. DA has also more recently been adapted to surface realisation [18,19].…”
Section: Human Assessment Design
Mentioning confidence: 99%
“…It should be noted that the fluency and adequacy version of DA we run can easily be adapted to fit other evaluation criteria. For example, DA has been adapted to the evaluation of surface realisation, where the evaluation focuses on readability and meaning similarity as opposed to fluency and adequacy [18,19]. As can be seen from Figure 1, each assessor is shown a question, a human-generated answer (or reference answer), and a system output answer on a single screen.…”
Section: Human Assessment Design
Mentioning confidence: 99%
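For context on how DA scores are typically aggregated, the sketch below standardises each assessor's raw 0-100 slider ratings into per-assessor z-scores before averaging per system. The toy ratings and names are assumptions for illustration, not data or code from the cited studies.

    import statistics
    from collections import defaultdict

    # Toy (assessor, system, raw 0-100 DA score) triples; values are invented.
    ratings = [
        ("a1", "sysA", 80), ("a1", "sysB", 60),
        ("a2", "sysA", 55), ("a2", "sysB", 35),
    ]

    # Per-assessor mean and standard deviation of raw scores.
    by_assessor = defaultdict(list)
    for assessor, _, score in ratings:
        by_assessor[assessor].append(score)
    norms = {a: (statistics.mean(v), statistics.stdev(v))
             for a, v in by_assessor.items()}

    # Standardise each rating within its assessor, then average per system.
    by_system = defaultdict(list)
    for assessor, system, score in ratings:
        mu, sd = norms[assessor]
        by_system[system].append((score - mu) / sd)

    for system, zs in sorted(by_system.items()):
        print(system, round(statistics.mean(zs), 3))  # sysA 0.707, sysB -0.707

Standardising within each assessor removes individual scoring biases (harsh vs. lenient raters) before system-level averages are compared.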
“…The 'shallow' (T1) track of the Surface Realization task (Mille et al., 2019) involves mapping Universal Dependencies (UD) graphs (de Marneffe et al., 2014) to surface forms, i.e. restoring word order and inflection based on the typed grammatical dependencies among a set of lemmas.…”
Section: Introduction
Mentioning confidence: 99%
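As a concrete illustration of the T1 setup described above, the sketch below linearises a toy shallow-track input with a naive dependency-relation precedence heuristic. It is an assumed baseline for exposition, not any participant's system, and it leaves inflection (sleep -> sleeps) to a separate morphological realisation step.

    from collections import defaultdict

    # Toy shallow-track input: (id, lemma, head, deprel); list order is scrambled.
    tokens = [
        (3, "sleep", 0, "root"),
        (1, "dog", 3, "nsubj"),
        (2, "the", 1, "det"),
    ]

    # Hypothetical precedence rule: these relations go before their head.
    PRE_HEAD = {"nsubj", "det", "amod", "advmod"}

    def linearise(tokens):
        lemma = {tid: lem for tid, lem, _, _ in tokens}
        children = defaultdict(list)
        root = None
        for tid, _, head, deprel in tokens:
            if head == 0:
                root = tid
            else:
                children[head].append((tid, deprel))

        # Depth-first readout: pre-head dependents, the head, then the rest.
        def realise(tid):
            pre = [c for c, rel in children[tid] if rel in PRE_HEAD]
            post = [c for c, rel in children[tid] if rel not in PRE_HEAD]
            words = []
            for child in pre:
                words.extend(realise(child))
            words.append(lemma[tid])
            for child in post:
                words.extend(realise(child))
            return words

        return " ".join(realise(root))

    print(linearise(tokens))  # -> "the dog sleep"; inflection is a separate step

Real systems replace the fixed precedence set with learned ordering models, but the decomposition into word ordering plus inflection is the core of the shallow task.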