Visual question answering (VQA), which is an important presentation form of AI-complete task and visual Turing tests, coupled with its potential application value, attracted widespread attention from both researchers in computer vision and natural language processing. However, there are no relevant research regarding the expression and participation methods of knowledge in VQA. Considering the importance of knowledge for answering questions correctly, this paper analyzes and researches the stratification, expression and participation process of knowledge in VQA and proposes a knowledge description framework (KDF) to guide the research of knowledge-based VQA (Kb-VQA). The KDF consists of a basic theory, implementation methods and specific applications. This paper focuses on describing mathematical models at basic theoretical levels, as well as the knowledge hierarchy theories and key implementation behaviors established on this basis. In our experiment, using the statistics of VQA’s accuracy in the relevant literature, we propose a good corroboration of the research results from knowledge stratification, participation methods and expression forms in this paper.