An important part of reaping computational advantage from a quantum computer is reducing the quantum resources needed to implement a desired quantum algorithm. Quantum algorithms that are too large to be practical on noisy intermediate-scale quantum devices will require fault-tolerant error correction. This work focuses on reducing the physical cost of implementing quantum algorithms when using state-of-the-art fault-tolerant quantum error correcting codes, in particular those for which implementing the T gate consumes vastly more resources than the other gates in the gate set. More specifically, we consider the group of unitaries that can be exactly implemented by a quantum circuit over the Clifford + T gate set. This gate set is universal and, with state-of-the-art surface codes, the T gate is by far the most expensive component to implement fault-tolerantly, so it is important to minimize the number of T gates in a fault-tolerant implementation. Our primary interest is to compute a circuit for a given $n$-qubit unitary $U$ using the minimum possible number of T gates (called the T-count of $U$). We consider the problem COUNT-T, whose optimization version asks for the T-count of $U$ and whose decision version asks whether the T-count is at most some positive integer $m$. Given an oracle for COUNT-T, we can compute a T-count-optimal circuit in time polynomial in the T-count and the dimension of $U$. We give a provable classical algorithm that solves COUNT-T (decision) in time $O\left(N^{2(c-1)\lceil m/c \rceil}\,\mathrm{poly}(m,N)\right)$ and space $O\left(N^{2\lceil m/c \rceil}\,\mathrm{poly}(m,N)\right)$, where $N = 2^n$ and $c \geq 2$. This gives a space-time trade-off for solving this problem with variants of meet-in-the-middle techniques. We also introduce an asymptotically faster multiplication method that shaves a factor of $N^{0.7457}$ off the overall complexity.
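To make the space-time trade-off concrete, the following minimal Python sketch tabulates the powers of $N$ in the dominant factors of the two bounds. It assumes the exponents are $2(c-1)\lceil m/c \rceil$ for time and $2\lceil m/c \rceil$ for space, and it ignores the $\mathrm{poly}(m, N)$ factor. The function name `count_t_exponents` is hypothetical and introduced here only for illustration.

```python
def count_t_exponents(m, c):
    """Return (time_exp, space_exp), the powers of N in the dominant factors.

    Assumed reading of the bounds (poly(m, N) factor ignored):
        time  ~ N ** (2 * (c - 1) * ceil(m / c))
        space ~ N ** (2 * ceil(m / c))
    """
    if m < 1 or c < 2:
        raise ValueError("expected m >= 1 and c >= 2")
    block = -(-m // c)  # exact integer ceiling of m / c
    return 2 * (c - 1) * block, 2 * block


if __name__ == "__main__":
    m = 8  # illustrative T-count bound
    for c in (2, 4, 8):
        time_exp, space_exp = count_t_exponents(m, c)
        print(f"c={c}: time ~ N^{time_exp}, space ~ N^{space_exp}")
```

Under this reading, increasing $c$ lowers the space exponent while raising the time exponent, which is the trade-off the meet-in-the-middle variants exploit.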
Lastly, beyond our improvements to the rigorous algorithm, we give a heuristic algorithm that outputs a T-count-optimal circuit and has space and time complexity poly(m, N), under some assumptions. The heuristic relies on a novel way of pruning the search space. While it still scales exponentially with the number of qubits (though with a lower exponent), going from exponential to polynomial scaling with m is a large improvement. We implemented our heuristic algorithm for unitaries on up to 4 qubits and obtained a significant improvement in running time. For all benchmark and random unitaries we studied, the T-count returned by our algorithm is at most the T-count of their circuits shown in previous papers.
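As a concrete illustration of the operators $R(P)$ referred to in this paper, the following minimal Python sketch assumes the form $R(P) = \frac{1}{2}(1 + e^{i\pi/4})\,I + \frac{1}{2}(1 - e^{i\pi/4})\,P$, which is standard in the T-count literature. This definition and the helper names `r_of`, `matmul`, `kron` and `close` are assumptions introduced here for illustration. The sketch checks numerically that $R(Z)$ is the T gate and that two other $R(P)$ are Clifford conjugates of a single T gate.

```python
import cmath

OMEGA = cmath.exp(1j * cmath.pi / 4)  # e^{i*pi/4}
S = 2 ** -0.5

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
H = [[S, S], [S, -S]]          # Hadamard (Clifford)
T = [[1, 0], [0, OMEGA]]       # T gate
CNOT = [[1, 0, 0, 0],          # control = first qubit, target = second
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def kron(a, b):
    """Kronecker (tensor) product of two square matrices."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def r_of(p):
    """Assumed definition: R(P) = (1 + w)/2 * I + (1 - w)/2 * P, w = e^{i*pi/4}."""
    n = len(p)
    return [[0.5 * (1 + OMEGA) * (i == j) + 0.5 * (1 - OMEGA) * p[i][j]
             for j in range(n)] for i in range(n)]


def close(a, b, tol=1e-12):
    """Entrywise comparison of two matrices up to a small tolerance."""
    return all(abs(x - y) < tol
               for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


if __name__ == "__main__":
    # R(Z) is the T gate itself.
    print(close(r_of(Z), T))
    # R(X) = H T H, since H Z H = X.
    print(close(r_of(X), matmul(matmul(H, T), H)))
    # R(Z (x) Z) = CNOT (I (x) T) CNOT, since CNOT maps I (x) Z to Z (x) Z.
    print(close(r_of(kron(Z, Z)), matmul(matmul(CNOT, kron(I2, T)), CNOT)))
```

Each check conjugates one T gate by a Clifford circuit, which is the sense in which a single $R(P)$ costs one T gate in these examples.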
Roughly speaking, each $R(P)$ can be implemented with a circuit containing only one T gate; more detail is given in section 2.2. The operator $P$ is an $n$-qubit non-identity Pauli operator (defined in section A.1). In section 3 we develop a fast algorithm that computes W in time $O(N^4)$. Currently the fastest algorithm