In this work we propose a novel numerical approach to decompose general quantum programs into single- and two-qubit quantum gates with a CNOT gate count very close to the current theoretical lower bounds. In particular, 15 and 63 CNOT gates suffice to decompose a general 3- and 4-qubit unitary, respectively, with high numerical accuracy. Our approach is based on a sequential optimization of the parameters of the single-qubit rotation gates in a pre-designed quantum circuit used for the decomposition. In addition, the algorithm can be adapted to the sparse inter-qubit connectivity of current mid-scale quantum computers, requiring only a few additional CNOT gates in the resulting quantum circuits.
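
For context, the lower bound usually quoted for this problem is the parameter-counting bound ⌈(4^n − 3n − 1)/4⌉ on the number of CNOT gates needed for a generic n-qubit unitary. The text above does not say which bound it refers to, so using this one is an assumption, and the function name `cnot_lower_bound` is hypothetical. A minimal Python sketch comparing that bound with the reported gate counts:

```python
def cnot_lower_bound(n_qubits: int) -> int:
    """Parameter-counting bound ceil((4^n - 3n - 1) / 4) (assumed to be the bound meant)."""
    numerator = 4**n_qubits - 3 * n_qubits - 1
    return -(-numerator // 4)  # integer ceiling division


# CNOT counts reported in the text for general 3- and 4-qubit unitaries.
reported = {3: 15, 4: 63}

for n, count in reported.items():
    bound = cnot_lower_bound(n)
    print(f"n={n}: reported {count}, lower bound {bound}, gap {count - bound}")
```

Under this assumption the bound evaluates to 14 and 61 CNOT gates for 3 and 4 qubits, so the reported counts exceed it by one and two gates, respectively.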