Because state-of-the-art physical implementations of quantum computers suffer from decoherence, it is essential to parallelize quantum circuits to reduce their depth. Two decades ago, Moore and Nilsson [1] demonstrated that additional qubits (ancillae) can be used to design "shallow" parallel circuits for quantum operators. They proved that any $n$-qubit CNOT circuit can be parallelized to $O(\log n)$ depth with $O(n^2)$ ancillae. However, near-term quantum technologies can support only a limited number of qubits, making the space-depth trade-off a fundamental research subject for quantum-circuit synthesis.

In this work, we establish an asymptotically optimal space-depth trade-off for the design of CNOT circuits. We prove that for any $m \geq 0$, any $n$-qubit CNOT circuit can be parallelized to $O\left(\max\left\{\log n, \frac{n^2}{(n+m)\log(n+m)}\right\}\right)$ depth with $m$ ancillae. We show by a counting argument that this bound is tight, and further show that the depth lower bound still matches our construction even when arbitrary two-qubit quantum gates are allowed to approximate CNOT circuits, illustrating the robustness of our result. Our work improves upon two previous results, one by Moore and Nilsson [1] for $O(\log n)$-depth quantum synthesis, and one by Patel, Markov, and Hayes [2] for $m = 0$. For the former, we reduce the need for ancillae by a factor of $\log^2 n$ by showing that $m = O(n^2/\log^2 n)$ additional qubits, which is asymptotically optimal, suffice to build $O(\log n)$-depth, $O(n^2/\log n)$-size CNOT circuits. For the latter, we reduce the depth by a factor of $n$ to the asymptotically optimal bound $O(n/\log n)$. Our results can be directly extended to stabilizer circuits using an earlier result by Aaronson and Gottesman [3]. In addition, we provide relevant hardness evidence for synthesis optimization of CNOT circuits in terms of both size and depth.
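To make the two regimes of the trade-off concrete, the following minimal sketch evaluates the expression inside the depth bound, $\max\{\log n, n^2/((n+m)\log(n+m))\}$, for a given number of qubits and ancillae. It is an illustration only: the function name `depth_bound` is hypothetical, all hidden constants are dropped, and base-2 logarithms are assumed, so the returned values indicate growth rates rather than actual circuit depths.

```python
import math


def depth_bound(n: int, m: int) -> float:
    """Evaluate max(log2 n, n^2 / ((n + m) * log2(n + m))).

    Assumption: constants hidden by the O-notation are dropped and
    logarithms are taken base 2, so this is a growth-rate illustration,
    not a depth guarantee for any concrete circuit.
    """
    if n < 2 or m < 0:
        raise ValueError("need n >= 2 qubits and m >= 0 ancillae")
    total = n + m  # total number of qubits available, ancillae included
    return max(math.log2(n), n * n / (total * math.log2(total)))


if __name__ == "__main__":
    n = 1024
    # No ancillae: the second term dominates and behaves like n / log n.
    print(depth_bound(n, 0))
    # About n^2 / log^2 n ancillae: the log n term takes over.
    m = int(n * n / math.log2(n) ** 2)
    print(depth_bound(n, m))
```

With $n = 1024$ and no ancillae the expression equals $n/\log_2 n = 102.4$, while with roughly $n^2/\log_2^2 n$ ancillae it drops to $\log_2 n = 10$, which mirrors the two endpoints discussed in the abstract.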