Image Question Answering Using Convolutional Neural Network with Dynamic Parameter Prediction

Noh, Hyeonwoo; Seo, Paul Hongsuck; Han, Bohyung

doi:10.1109/cvpr.2016.11

Cited by 281 publications

(180 citation statements)

References 17 publications

Supporting

Mentioning

179

Contrasting

Unclassified

“…The task of Visual question answering [7], [8], [9], [10], [11] is well studied in the vision and language community, but it has been relatively less explored for providing explanation [3] for answer prediction. Recently, lot of works that focus on explanation models, one of that is image captioning for basic explanation of an image [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22].…”

Section: Related Work (mentioning)

confidence: 99%

U-CAM: Visual Explanation Using Uncertainty Based Class Activation Maps

Patro

Lunayach

Patel

et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

Understanding and explaining deep learning models is an imperative task. Towards this, we propose a method that obtains gradient-based certainty estimates that also provide visual attention maps. Particularly, we solve for visual question answering task. We incorporate modern probabilistic deep learning methods that we further improve by using the gradients for these estimates. These have two-fold benefits: a) improvement in obtaining the certainty estimates that correlate better with misclassified samples and b) improved attention maps that provide state-of-the-art results in terms of correlation with human attention regions. The improved attention maps result in consistent improvement for various methods for visual question answering. Therefore, the proposed technique can be thought of as a recipe for obtaining improved certainty estimates and explanations for deep learning models. We provide detailed empirical analysis for the visual question answering task on all standard benchmarks and comparison with state of the art methods.

“…There are other methods besides CNN to implement VQA task. Noh et al [42] used an independent parametric predictive network with a GRU with the question as input and a fully connected layer generating as output. By combining hashing techniques, they reduced the complexity of constructing a parameter prediction network with a large number of parameters.…”

Section: Visual Question Answering (mentioning)

confidence: 99%

AI-Powered Text Generation for Harmonious Human-Machine Interaction: Current State and Future Directions

Zhang

Guo

Wang

et al. 2019

2019 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Commu

In the last two decades, the landscape of text generation has undergone tremendous changes and is being reshaped by the success of deep learning. New technologies for text generation ranging from template-based methods to neural network-based methods emerged. Meanwhile, the research objectives have also changed from generating smooth and coherent sentences to infusing personalized traits to enrich the diversification of newly generated content. With the rapid development of text generation solutions, one comprehensive survey is urgent to summarize the achievements and track the state of the arts. In this survey paper, we present the general systematical framework, illustrate the widely utilized models and summarize the classic applications of text generation.

“…Another proposed solution we identified is a dynamic parameter neural network whose parameters are determined adaptively based on input questions [34]. In this way the system reasons differently for each question.…”

Section: Identifying Clues In Image And/or Question To Generate Answer (mentioning)

confidence: 99%
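
The citation statements above describe the cited paper's mechanism only in words: a question encoding (from a GRU) goes through a fully connected layer that outputs the parameters of another layer, and a hashing technique keeps the number of predicted parameters small. The following is a minimal NumPy sketch of that idea, not the authors' implementation. The function names, the dimensions, the random vectors standing in for CNN and GRU outputs, and the seeded index table used in place of a real hash function are all assumptions made for illustration.

```python
import numpy as np

def hashed_weight_matrix(candidates, out_dim, in_dim, seed=0):
    """Expand a short vector of candidate weights into an (out_dim, in_dim) matrix.

    A fixed pseudo-random index table stands in for a hash function here
    (an assumption): each matrix position is assigned one candidate weight.
    """
    rng = np.random.default_rng(seed)  # fixed seed gives the same mapping on every call
    index = rng.integers(0, candidates.shape[0], size=(out_dim, in_dim))
    return candidates[index]  # positions with the same index share one value

def dynamic_layer(image_feat, question_feat, predictor_w, out_dim):
    """Apply a fully connected layer whose weights are predicted from the question."""
    candidates = predictor_w @ question_feat  # fully connected parameter predictor
    weights = hashed_weight_matrix(candidates, out_dim, image_feat.shape[0])
    return weights @ image_feat

rng = np.random.default_rng(1)
image_feat = rng.standard_normal(8)        # hypothetical stand-in for a CNN image feature
question_a = rng.standard_normal(4)        # hypothetical stand-ins for GRU question encodings
question_b = rng.standard_normal(4)
predictor_w = rng.standard_normal((6, 4))  # 6 candidates fill a 5 x 8 = 40-weight layer

out_a = dynamic_layer(image_feat, question_a, predictor_w, out_dim=5)
out_b = dynamic_layer(image_feat, question_b, predictor_w, out_dim=5)
```

In this sketch the same image feature yields different outputs for the two questions because the layer weights depend on the question, and the predictor only has to emit 6 values rather than all 40 weights.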