DeepStory: Video Story QA by Deep Embedded Memory Networks
2017 · Preprint
DOI: 10.48550/arxiv.1707.00836

Cited by 22 publications (19 citation statements)
References 9 publications
“…We compare FVTA with recent results on MovieQA dataset, including End-to-End Memory Network (MemN2N) [23], Deep Embedded Memory Network (DEMN) [10], and Read-Write Memory Network (RWMN) [15]. Table 4 shows the detailed comparison of MovieQA results using both videos and subtitles.…”
Section: Results
confidence: 99%
“…A recent direction is question answering based on videos, which is more relevant to this work. A number of research studies have been carried out on MovieQA [22,10,15], with movie clips, scripts, and descriptions. Because it is expensive to 1 Code and models are released at https://memexqa.cs.cmu.…”
Section: Related Work
confidence: 99%
“…[3]. Other video VQA datasets include social questions in addition to other more factual questions, such as the TVQA dataset, which uses videos from six well-known TV shows [16]; PororoQA, which uses videos from a children's animated television show [17]; and MovieQA, which contains questions about movies [18].…”
Section: B. Social VQA for AI Agents
confidence: 99%
“…Hence, to study the task of multimodal moment retrieval with both video and subtitle text contexts, we propose a new dataset, TV show Retrieval (TVR). Inspired by recent works [30,15,18] that relied on movie/cartoon/TV shows for building multimodal datasets, we select TV shows as our data collection resource, as they typically involve rich social interactions between actors, covering both activities and dialogues. During data collection, we present annotators with videos and associated subtitles to encourage them to write multimodal queries.…”
Section: Introduction
confidence: 99%