Graphics Processing Units (GPUs) are used to accelerate computing-intensive tasks, such as neural networks,data analysis, high-performance computing, etc. In the past decade or so, researchers have done a lot of work on GPU architecture and proposed a variety of theories and methods to study the microarchitectural characteristics of various GPUs. In this study, the GPU serves as a co-processor and works together with the CPU in an embedded real-time system to handle computationally intensive tasks. It models the architecture of the GPU and further considers it based on some excellent work. The SIMT mechanism and Cache-miss situation provide a more detailed analysis of the GPU architecture. In order to verify the GPU architecture model proposed in this article, 10 GPU kernel_task and an Nvidia GPU device were used to perform experiments. The experimental results showed that the minimum error between the kernel task execution time predicted by the GPU architecture model proposed in this article and the actual measured kernel task execution time was 3.80%, and the maximum error was 8.30%.