Conversational agents have recently improved their language understanding capabilities through neural networks. Such deep neural models, however, cannot be applied to most human languages because annotated training data for various NLP tasks is lacking. In this paper, we propose a multi-level cross-lingual transfer model with language-shared and language-specific knowledge to improve spoken language understanding for low-resource languages. Our method explicitly separates the model into a language-shared part and a language-specific part to transfer cross-lingual knowledge and improve monolingual slot tagging, especially for low-resource languages. To refine the shared knowledge, we add a language discriminator and employ adversarial training to reinforce information separation. In addition, we adopt a novel multi-level knowledge transfer scheme that proceeds incrementally and progressively, acquiring multi-granularity shared knowledge rather than knowledge from a single layer. To mitigate the discrepancies between the feature distributions of language-specific and shared knowledge, we propose neural adapters that fuse the knowledge automatically. Experiments show that our proposed model consistently outperforms the monolingual baseline by a statistically significant margin of up to 2.09%, with an even larger improvement of 12.21% in the zero-shot setting.

INDEX TERMS Spoken language understanding, cross-lingual learning, linguistic knowledge transfer, adversarial learning, multi-level knowledge representation.