Purpose of research. The purpose of this work is to increase the performance of Russian-language question answering systems. The scientific novelty lies in speeding up a RuBERT model that was trained to find the answer to a question in a given text. Because a more efficient language model can process more requests in the same amount of time, the results of this work can be used in question answering systems for which response speed is important.

Methods. The work uses methods of natural language processing, machine learning, and neural network size reduction. The language model was configured and trained using the Torch and Onnxruntime machine learning libraries. The original model and the training dataset were taken from the Huggingface library.

Results. The performance of the RuBERT language model was increased by applying neural network size reduction methods, namely knowledge distillation and quantization, and by exporting the model to the ONNX format and running it in ONNX Runtime.

Conclusion. The model to which knowledge distillation, quantization, and ONNX optimization were applied together achieved a performance increase of ~4.6 times (from 66.57 to 404.46 requests per minute), while the model size decreased ~13 times (from 676.29 MB to 51.66 MB). The cost of this speedup was a drop in EM (from 61.3 to 56.87) and in F-measure (from 81.66 to 76.97).
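
For readers unfamiliar with the two quality metrics named above, EM (exact match) and F-measure for extractive question answering are commonly computed per question by comparing the predicted answer span with the reference answer. The following is a minimal sketch in Python, assuming SQuAD-style token-overlap scoring. The normalization rules (lowercasing and stripping ASCII punctuation) and the function names `normalize`, `exact_match`, and `f1_score` are illustrative assumptions, not the evaluation script used in this work.

```python
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop ASCII punctuation, and split on whitespace.

    Assumption: a simplified normalization; the actual evaluation
    script may apply different or additional rules.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return text.split()


def exact_match(prediction, reference):
    """Return 1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction, reference):
    """Token-level F-measure between a predicted and a reference answer."""
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    if not pred_tokens or not ref_tokens:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it occurs in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under this scheme, dataset-level scores would typically be obtained by averaging the per-question values and multiplying by 100, which matches the scale of the EM and F-measure figures reported above.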