Quantum computing offers the potential of exponential speedups for certain classical computations. Over the last decade, many quantum machine learning (QML) algorithms have been proposed as candidates for such exponential improvements. However, two issues undermine the hope of exponential speedup for some of these QML algorithms: the data-loading problem and, more recently, the stunning "dequantization" results of Tang et al. A third issue, namely the fault-tolerance requirements of most QML algorithms, has further hindered their practical realization. The quantum topological data analysis (QTDA) algorithm of Lloyd, Garnerone and Zanardi was one of the first QML algorithms that convincingly offered an expected exponential speedup. From the outset, it did not suffer from the data-loading problem. A recent result has also shown that the generalized problem solved by this algorithm is likely classically intractable, and would therefore be immune to any dequantization efforts. However, the QTDA algorithm of Lloyd et al. has a time complexity of $O(n^4/(\epsilon^2 \delta \sqrt{\zeta}))$ (where $n$ is the number of data points, $\epsilon$ is the error tolerance, $\delta$ is the smallest nonzero eigenvalue of the restricted Laplacian, and $\zeta$ is the fraction of all simplices in the complex) and requires fault-tolerant quantum computing, which has not yet been achieved. In this paper, we completely overhaul the QTDA algorithm to achieve an improved exponential speedup and a depth complexity of $O(n \log(1/(\delta\epsilon)))$. This depth complexity opens the door to an implementation on near-term quantum hardware, potentially making it the first useful algorithm to achieve quantum advantage on general classical data.
Our approach includes three key innovations: (a) an efficient realization of the combinatorial Laplacian as a sum of Pauli operators; (b) a quantum rejection sampling and projection approach to restrict the superposition to the simplices of the desired order in the complex (replacing the Grover search of Lloyd et al.); and (c) a stochastic rank estimation method to estimate the Betti numbers (replacing the quantum phase estimation of Lloyd et al.). We present a theoretical error analysis for the proposed algorithm, and give the circuit, computational time, and depth complexities for Betti number estimation up to the error tolerance $\epsilon$. The techniques presented herein have potential applications beyond QTDA and even beyond rank estimation.
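To make innovation (c) concrete, the following is a purely classical sketch of stochastic rank estimation for a Betti number, written with NumPy. It is an illustration under stated assumptions, not the quantum circuit or the estimator analyzed in this paper: the function names (`boundary_matrix`, `combinatorial_laplacian`, `estimate_betti`) are hypothetical, the kernel projector is approximated by a simple power filter $(I - \Delta/s)^m$ chosen here for brevity, and the trace is estimated with random sign vectors. The Betti number is the dimension of the kernel of the combinatorial Laplacian $\Delta_k$, that is, the number of $k$-simplices minus the rank of $\Delta_k$.

```python
import numpy as np

def boundary_matrix(faces, simplices):
    """Boundary map from k-simplices to their (k-1)-faces, with alternating signs."""
    index = {face: i for i, face in enumerate(faces)}
    B = np.zeros((len(faces), len(simplices)))
    for j, s in enumerate(simplices):
        for i in range(len(s)):
            face = s[:i] + s[i + 1:]          # drop the i-th vertex
            B[index[face], j] = (-1) ** i
    return B

def combinatorial_laplacian(lower, current, upper):
    """Delta_k = d_k^T d_k + d_{k+1} d_{k+1}^T (a term is skipped if its list is empty)."""
    n_k = len(current)
    L = np.zeros((n_k, n_k))
    if lower:
        d_k = boundary_matrix(lower, current)
        L += d_k.T @ d_k
    if upper:
        d_k1 = boundary_matrix(current, upper)
        L += d_k1 @ d_k1.T
    return L

def estimate_betti(L, power=40, num_samples=2000, seed=0):
    """Estimate dim ker(L) as tr((I - L/s)^power) using random sign vectors.

    Assumption of this sketch: s is the largest absolute row sum of L, which
    upper-bounds the largest eigenvalue of the symmetric matrix L, so every
    eigenvalue of I - L/s lies in [0, 1] and the power filter damps all
    nonzero eigenvalues of L while leaving the kernel untouched.
    """
    rng = np.random.default_rng(seed)
    n_k = L.shape[0]
    s = np.abs(L).sum(axis=1).max()
    F = np.eye(n_k) - L / s
    total = 0.0
    for _ in range(num_samples):
        v = rng.choice([-1.0, 1.0], size=n_k)  # random sign vector
        w = v.copy()
        for _ in range(power):
            w = F @ w
        total += v @ w                         # v^T F^power v
    return total / num_samples

# Toy complexes on three points: a hollow triangle and a filled triangle.
vertices = [(0,), (1,), (2,)]
edges = [(0, 1), (0, 2), (1, 2)]
triangle = [(0, 1, 2)]

hollow = estimate_betti(combinatorial_laplacian(vertices, edges, []))
filled = estimate_betti(combinatorial_laplacian(vertices, edges, triangle))
print("beta_1 estimate, hollow triangle:", hollow)
print("beta_1 estimate, filled triangle:", filled)
```

The hollow triangle has one one-dimensional hole, so its estimate should land close to 1, while the filled triangle has none and its estimate should be 0. The estimator only needs matrix-vector products with the Laplacian, which is what makes a rank-estimation approach of this kind a plausible replacement for phase estimation.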