2018
DOI: 10.1371/journal.pone.0202789

Evaluation of automatic video captioning using direct assessment

Abstract: We present Direct Assessment, a method for manually assessing the quality of automatically-generated captions for video. Evaluating the accuracy of video captions is particularly difficult because for any given video clip there is no definitive ground truth or correct answer against which to measure. Metrics for comparing automatic video captions against a manual caption such as BLEU and METEOR, drawn from techniques used in evaluating machine translation, were used in the TRECVid video captioning task in 2016…
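As a rough, hypothetical illustration (not code from the paper): metrics such as BLEU score a generated caption by n-gram overlap with a manual reference caption, which is exactly where the lack of a single correct answer hurts. The sketch below assumes NLTK is available; the example captions are invented.

# Scoring an automatic caption against one human reference with sentence-level BLEU.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "a man is playing a guitar on stage".split()    # manual caption (hypothetical)
candidate = "a person plays the guitar on a stage".split()  # automatic caption (hypothetical)

# Smoothing avoids zero scores when higher-order n-grams have no overlap,
# which is common for short captions.
score = sentence_bleu([reference], candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"Sentence BLEU: {score:.3f}")

A caption can be perfectly accurate yet score poorly here simply because it uses different words from the single reference, which is the motivation the abstract gives for a human-judgement method such as Direct Assessment.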

Cited by 20 publications (16 citation statements). References 19 publications.
“…The remaining 69%, who did not meet this criterion, were omitted from computation of the official DA results above. Of those 31% included in the evaluation, a very high proportion, 97%, showed no significant difference in scores collected in repeated assessment of the same sentences; these high levels of agreement are consistent with what we have seen in DA used for Machine Translation (Graham et al, 2016) and Video Captioning evaluation (Graham et al, 2017).…”
Section: Mechanical Turk Results (supporting; confidence: 83%)

“…The remaining 69%, who did not meet this criterion, were omitted from computation of the official DA results above. Of those 31% included in the evaluation, a very high proportion, 97%, showed no significant difference in scores collected in repeated assessment of the same sentences; these high levels of agreement are consistent with what we have seen in DA used for Machine Translation and Video Captioning evaluation (Graham et al, 2017).…”
Section: Mechanical Turk Results (supporting; confidence: 84%)

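The quality-control step these passages describe can be illustrated with a small, hypothetical check (not the authors' code): compare an assessor's two passes over the same captions and test whether the paired scores differ significantly. The sketch assumes SciPy; the scores are invented.

# Paired comparison of repeat ratings of the same captions by one assessor.
from scipy.stats import ttest_rel

first_pass  = [72, 40, 88, 55, 63, 91, 34, 70]   # initial ratings (hypothetical)
second_pass = [70, 45, 85, 50, 66, 90, 30, 74]   # repeat ratings of the same captions

t_stat, p_value = ttest_rel(first_pass, second_pass)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

A high p-value (no significant difference between passes) corresponds to the consistency that the quoted passages report for 97% of the assessors retained in the official results.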
“…While we attempt to find these better metrics, there is a large amount of research being done to improve captioning technology, and that research uses the existing metrics to evaluate its performance. Recognizing the problem, at TRECVID we decided to apply a manual evaluation, known as Direct Assessment (DA) (Graham et al, 2018), to selected submissions. The basic methodology is to present human assessors with a video and a single caption.…”
Section: Automatic and Manual Evaluation (mentioning; confidence: 99%)
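For context on how such DA slider scores are usually turned into system-level results, the following hypothetical sketch (not the authors' implementation) standardizes each assessor's 0-100 ratings into z-scores before averaging per captioning system, so that assessors with different scoring habits become comparable. All names and numbers are invented.

# Per-assessor standardization of raw DA scores, then a per-system average.
from collections import defaultdict
from statistics import mean, stdev

# (assessor_id, system_id, raw_score) -- hypothetical ratings
ratings = [
    ("a1", "sysA", 70), ("a1", "sysB", 40), ("a1", "sysA", 80),
    ("a2", "sysA", 55), ("a2", "sysB", 35), ("a2", "sysB", 45),
]

# Collect each assessor's scores to get their personal mean and spread.
by_assessor = defaultdict(list)
for assessor, _, score in ratings:
    by_assessor[assessor].append(score)

# Convert every rating to a z-score relative to its own assessor, then pool by system.
z_by_system = defaultdict(list)
for assessor, system, score in ratings:
    mu, sd = mean(by_assessor[assessor]), stdev(by_assessor[assessor])
    z_by_system[system].append((score - mu) / sd)

for system, zs in sorted(z_by_system.items()):
    print(system, round(mean(zs), 3))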