The study introduces a comprehensive technique for enhancing the Natural Language Processing (NLP) capabilities of virtual assistant systems. The method addresses the challenges of efficient knowledge transfer and model-size optimization while ensuring improved performance, with a primary focus on model pretraining and distillation.

To address the effect of vocabulary size on model performance, the study employs the SentencePiece tokenizer with unigram settings. This approach yields a well-balanced vocabulary, which is essential for striking the right trade-off between task performance and resource efficiency. A novel pre-layernorm design is introduced, drawing inspiration from models like BERT and RoBERTa; it adjusts the placement of layer normalization within transformer layers during the pretraining phase. Teacher models are trained with masked language modeling objectives using the DeepSpeed scaling framework. Modifications to model operations are made, and mixed-precision training strategies are explored to ensure stability.

The two-stage distillation method efficiently transfers knowledge from teacher models to student models. It begins with an intermediate model, and knowledge is distilled using logit and hidden-layer matching techniques. This transfer substantially improves the final student model while maintaining a model size suited to low-latency applications. Novel metrics, such as mask-filling accuracy, are employed to assess the effectiveness and quality of the methods. The findings demonstrate substantial improvements over publicly available models, showcasing the effectiveness of the strategy within complete virtual assistant systems. The proposed approach confirms the potential of the technique to enhance language comprehension and efficiency within virtual assistants, specifically addressing the challenges posed by real-world user inputs.
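The logit and hidden-layer matching described above can be sketched as a combined loss. This is a minimal illustration, not the study's implementation: the function names, the temperature-scaled KL term, and the equal weighting between the two terms are assumptions chosen for clarity.

```python
import numpy as np

def softmax(x, temperature=1.0):
    """Numerically stable softmax along the last axis."""
    z = x / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits, student_logits,
                      teacher_hidden, student_hidden,
                      temperature=2.0, alpha=0.5):
    """Combine logit matching (KL divergence between teacher and
    student distributions at a softened temperature) with
    hidden-state matching (mean squared error)."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)),
                axis=-1).mean()
    mse = np.mean((teacher_hidden - student_hidden) ** 2)
    # Scaling KL by temperature**2 keeps gradient magnitudes comparable.
    return alpha * kl * temperature**2 + (1 - alpha) * mse
```

When student outputs exactly match the teacher's, both terms vanish and the loss is zero; any divergence in either the output distribution or the intermediate representations increases it.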
Through extensive testing and rigorous analysis, the capability of the method to meet these objectives is validated.
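The pre-layernorm arrangement mentioned above can be illustrated with a simplified transformer block. This is a hedged sketch, not the study's architecture: single-head attention, ReLU in the feed-forward sublayer, and the weight names are all illustrative assumptions. The key point is that layer normalization is applied *before* each sublayer rather than after it, so the residual path stays unnormalized.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each token vector to zero mean and unit variance."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention (illustrative)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def pre_ln_block(x, Wq, Wk, Wv, W1, W2):
    """Pre-layernorm block: normalize before each sublayer,
    then add the residual (vs. post-LN, which normalizes after)."""
    x = x + self_attention(layer_norm(x), Wq, Wk, Wv)
    x = x + np.maximum(layer_norm(x) @ W1, 0.0) @ W2  # FFN with ReLU
    return x
```

Because normalization sits inside each residual branch, gradients flow through an identity path across the whole stack, which is commonly credited with stabilizing pretraining of deep transformers.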