The paper is devoted to the synthesis and optimization of data pre-processing pipelines used in building machine learning models. It is noted that the whole triad of such a pipeline must be optimized jointly: the optimal sequence of optimal operations with optimal parameters. Changing even one element immediately affects the choice of all the other elements and their parameters. In the general case, each machine learning model and each kind of input data (random variables or time series) admits a very large number of feasible pipelines, and, as a rule, there are no labeled training datasets for synthesizing them. A survey of known approaches to such problems is carried out, and the conclusion that they are best formalized as reinforcement learning problems is substantiated. Typical approaches to the formalization of similar problems and intelligent methods for solving them are presented. It is noted that solving these problems with reinforcement learning is, as a rule, complicated by the large dimensionality of the possible sets of operation types and subtypes with different parameters, and that convergence to the truly optimal value within a limited time is problematic. Therefore, several improvements are proposed that make it possible to solve this problem under certain conditions. First, it is proposed to divide the pre-processing pipeline into constant and variable sections. For different types of machine learning models, it is specified which operations should belong to the fixed first and last links and which to the variable link, and reinforcement learning is applied only to the variable link. Second, an algorithm is proposed for the initial setting of the RL-policy parameters depending on certain statistical and other characteristics of the input data.
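A toy sketch may help make the two proposed improvements concrete. It is not the authors' algorithm: it assumes a hypothetical catalogue of operations (`impute_missing`, `log_transform`, `clip_outliers`, `difference`, `scale`), fixed first and last links chosen purely for illustration, a REINFORCE policy-gradient learner (a swapped-in technique, since the abstract does not name one) acting only on the variable link, and assumed initialization heuristics in which skewness and the share of 3-sigma outliers in the input data bias the initial policy preferences. The reward is a toy stand-in for the validation score of the downstream model.

```python
import math
import random
from statistics import mean, pstdev

# Hypothetical operation catalogue; the paper does not name concrete operations.
FIXED_HEAD = ["impute_missing"]  # constant first link (not optimized)
FIXED_TAIL = ["scale"]           # constant last link (not optimized)
CANDIDATES = ["identity", "log_transform", "clip_outliers", "difference"]


def data_features(xs):
    """Skewness and share of |z| > 3 points; assumed stand-ins for input statistics."""
    m, s = mean(xs), pstdev(xs)
    if s == 0:
        return 0.0, 0.0
    skew = mean(((x - m) / s) ** 3 for x in xs)
    outlier_share = sum(abs(x - m) / s > 3 for x in xs) / len(xs)
    return skew, outlier_share


def init_preferences(xs, n_slots, bias=1.0):
    """Seed the softmax preferences of every variable slot from data statistics."""
    skew, outlier_share = data_features(xs)
    prefs = [[0.0] * len(CANDIDATES) for _ in range(n_slots)]
    for slot in prefs:
        if abs(skew) > 1.0:  # strongly skewed data: favour a log transform
            slot[CANDIDATES.index("log_transform")] += bias
        if outlier_share > 0.01:  # noticeable outliers: favour clipping
            slot[CANDIDATES.index("clip_outliers")] += bias
    return prefs


def softmax(h):
    top = max(h)  # subtract the max for numerical stability
    e = [math.exp(v - top) for v in h]
    z = sum(e)
    return [v / z for v in e]


def train(prefs, reward_fn, episodes=3000, lr=0.2, seed=0):
    """REINFORCE with a moving-average baseline, applied to the variable link only."""
    rng = random.Random(seed)
    baseline = 0.0
    for _ in range(episodes):
        probs = [softmax(h) for h in prefs]
        actions = [rng.choices(range(len(p)), weights=p)[0] for p in probs]
        r = reward_fn([CANDIDATES[a] for a in actions])
        advantage = r - baseline
        baseline += 0.05 * (r - baseline)
        # Policy-gradient step on the softmax preferences of each slot.
        for h, p, a in zip(prefs, probs, actions):
            for i in range(len(h)):
                h[i] += lr * advantage * ((1.0 if i == a else 0.0) - p[i])
    return prefs


def best_pipeline(prefs):
    """Greedy pipeline: fixed head + most preferred op per slot + fixed tail."""
    middle = [CANDIDATES[max(range(len(h)), key=h.__getitem__)] for h in prefs]
    return FIXED_HEAD + middle + FIXED_TAIL


# Toy reward standing in for the validation score of the downstream model.
TARGET = ["clip_outliers", "log_transform"]


def toy_reward(middle):
    return sum(a == b for a, b in zip(middle, TARGET))


# Right-skewed sample with two large outliers.
DATA = [1.0, 1.2, 0.9, 1.1, 50.0, 60.0] + [1.0] * 94

if __name__ == "__main__":
    prefs = init_preferences(DATA, n_slots=2)
    print(best_pipeline(train(prefs, toy_reward)))
```

Only the two middle slots are searched, so the action space shrinks from all sequences over five operations to two choices among four; the data-driven bias merely gives the learner a better starting point and does not fix the result, since training can still move away from the seeded preferences.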
The proposed improvement of the reinforcement learning method for synthesizing an optimal pipeline of operations can be applied not only to pre-processing operations but also to other problems with a similar data formalization and problem statement.