GPGPU (General-Purpose Computing on Graphics Processing Units) has been widely applied to high-performance computing. However, the GPU architecture and programming model differ from those of traditional CPUs, which makes it challenging to develop efficient GPU applications. This paper focuses on key programming-model and compiler-optimization techniques for many-core GPUs and addresses a number of key theoretical and technical issues. It proposes ab-Stream, a many-threaded programming model that hides architectural differences and is easy to parallelize for, easy to program, easy to extend, and easy to tune. In addition, it proposes memory optimization and data transfer transformation based on data classification: first, data layout pruning driven by memory classification, and second, TaT (Transfer after Transformed) for transferring strided data between the CPU and GPU. Experimental results demonstrate that the proposed techniques significantly improve the performance of GPGPU applications.
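
As a minimal sketch of the TaT idea, the snippet below packs strided host data into a contiguous pinned staging buffer and then issues a single bulk copy to the GPU, rather than many small strided transfers. The helper names (`pack_strided`, `transfer_strided_to_gpu`) and their parameters are illustrative assumptions, not the paper's actual implementation.

```c
#include <cuda_runtime.h>
#include <stdlib.h>
#include <string.h>

/* Gather 'count' elements spaced 'stride' bytes apart into a packed buffer. */
static void pack_strided(void *dst, const void *src,
                         size_t elem_size, size_t stride, size_t count)
{
    const char *s = (const char *)src;
    char *d = (char *)dst;
    for (size_t i = 0; i < count; ++i)
        memcpy(d + i * elem_size, s + i * stride, elem_size);
}

/* Transfer after Transformed: transform (pack) on the host, then transfer once. */
void transfer_strided_to_gpu(void *d_dst, const void *h_src,
                             size_t elem_size, size_t stride, size_t count)
{
    void *h_packed = NULL;
    cudaMallocHost(&h_packed, elem_size * count);   /* pinned staging buffer */
    pack_strided(h_packed, h_src, elem_size, stride, count);
    cudaMemcpy(d_dst, h_packed, elem_size * count, cudaMemcpyHostToDevice);
    cudaFreeHost(h_packed);
}
```

Packing into pinned memory lets the copy engine move one large contiguous block at full PCIe bandwidth, which is the intuition behind transforming strided data before transferring it.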