Considerable research effort has gone into narrowing the gap between the capabilities of current quantum processing units (QPUs) and their potential for quantum supremacy. One approach is to offload supplementary computations to the CPU and use the QPU only for the core of the problem. In this work, we address the complexity of initializing an arbitrary quantum state, an important building block of quantum data analysis and machine learning. With existing exact initialization algorithms, QPUs do not outperform classical machines. Hence, many studies propose approximate but robust state initialization schemes. Using the CPU to cut a quantum state into a product of (almost) independent partitions reduces the number of two-qubit gates and, correspondingly, the loss of state fidelity in the quantum part of the algorithm. To find the least entangled qubits, current methods compute a singular value decomposition (SVD) for each qubit separately on the CPU. In this paper, we reduce the CPU time and memory that this step requires. We consider the Tucker tensor decomposition as an alternative to the CPU-based SVD for detecting a single low-entangled qubit, without loss of solution quality. Both proposed methods outperform the SVD in time and memory for systems of at least ten qubits. For a 15-qubit system, our implementation is an order of magnitude faster and uses two orders of magnitude less memory.
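As a rough illustration of the per-qubit SVD baseline described above, the following NumPy sketch scores each qubit by how close the split "qubit k | remaining qubits" is to a product state. It assumes a dense state vector with big-endian qubit ordering (qubit 0 is the most significant bit) and uses the squared largest singular value, divided by the squared norm of the state, as the score; this equals the fidelity of the best product approximation for that split. The function names `qubit_schmidt_values` and `least_entangled_qubit` are hypothetical, and this is not the paper's implementation. The 2 × 2^(n−1) matrix being decomposed is the mode-k unfolding of the state viewed as an order-n tensor with all dimensions equal to 2, which is the object a Tucker (HOSVD) decomposition operates on; the proposed Tucker-based methods themselves are not reproduced here.

```python
import numpy as np


def qubit_schmidt_values(state, n_qubits, k):
    """Singular values of the bipartition: qubit k versus all other qubits.

    Assumes big-endian ordering: qubit 0 is the most significant bit.
    """
    psi = np.asarray(state, dtype=complex).reshape((2,) * n_qubits)
    # Move qubit k to the first axis and flatten the rest (mode-k unfolding).
    unfolding = np.moveaxis(psi, k, 0).reshape(2, -1)
    return np.linalg.svd(unfolding, compute_uv=False)


def least_entangled_qubit(state, n_qubits):
    """Return the qubit closest to being separable, plus per-qubit scores.

    Score = sigma_max^2 / sum(sigma^2): the fidelity of the best product
    approximation (qubit k) x (rest). A score of 1 means qubit k factors out.
    """
    scores = []
    for k in range(n_qubits):
        s = qubit_schmidt_values(state, n_qubits, k)
        scores.append(float(s[0] ** 2 / np.sum(s ** 2)))
    return int(np.argmax(scores)), scores


# Usage: qubits 0 and 1 form a Bell pair; qubit 2 is in |+> and separable.
bell = np.array([1, 0, 0, 1]) / np.sqrt(2)
plus = np.array([1, 1]) / np.sqrt(2)
best, scores = least_entangled_qubit(np.kron(bell, plus), 3)
print(best)  # 2
```

Each per-qubit SVD here works on a full copy of the 2^n amplitudes, so repeating it for every qubit is what makes this baseline costly in time and memory as the number of qubits grows.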