Natural language understanding (NLU) is a core technique for implementing natural user interfaces. In this study, we propose a neural network architecture that learns syntax vector representations by exploiting the correspondence between texts and their syntactic structures. To represent the syntactic structure of a sentence, we use three methods: dependency trees, phrase-structure trees, and part-of-speech tagging. Building on a pretrained transformer, we propose text-to-vector and syntax-to-vector projection approaches. Texts and syntactic structures are projected onto a common vector space, and the distance between the two vectors is minimized according to the correspondence property so that the syntax representation is learned. We conducted extensive experiments to verify the effectiveness of the proposed methodology on Korean corpora (Weather, Navi, and Rest) and English corpora (ATIS, SNIPS, Simulated Dialogue-Movie, Simulated Dialogue-Restaurant, and NLU-Evaluation). The experiments show that our model effectively captures syntactic information and that the learned syntax vector representations are useful.
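As a rough sketch of the training objective described above (the exact loss is not specified here, so the form below is an assumption for illustration), the correspondence property can be realized by minimizing the distance between the projected text vector and the projected syntax vector of the same sentence:

\[
\mathcal{L}_{\text{syn}} = \frac{1}{N} \sum_{i=1}^{N} \bigl\lVert f_{\text{text}}(x_i) - g_{\text{syn}}(s_i) \bigr\rVert_2^2,
\]

where \(f_{\text{text}}\) and \(g_{\text{syn}}\) denote hypothetical text-to-vector and syntax-to-vector projection networks, \(x_i\) is the \(i\)-th sentence, and \(s_i\) is its syntactic structure (dependency tree, phrase-structure tree, or part-of-speech sequence).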