In 5G Ultra-Dense Networks, a distributed wireless backhaul is an attractive solution for forwarding traffic to the core. The macro-cell coverage area is divided into many small cells. A few of these cells are designated as gateways and are connected to the core by high-capacity fiber-optic links. Each small cell is associated with one gateway, and all small cells forward their traffic to their respective gateways through multi-hop mesh networks. We investigate the gateway location problem and show that finding near-optimal gateway locations improves the backhaul network capacity. We formulate an exact p-median integer linear program as a benchmark for our novel K-GA heuristic, which combines a Genetic Algorithm (GA) with K-means clustering to find near-optimal gateway locations. Through extensive Monte Carlo simulations, we compare the performance of K-GA with six other approaches in terms of average number of hops and backhaul network capacity at different node densities. All approaches are tested under various user distribution scenarios, including uniform, bivariate Gaussian, and cluster distributions. In all cases, K-GA provides near-optimal results, achieving an average number of hops and a backhaul network capacity within 2% of optimal while reducing execution time by 95% on average.

INDEX TERMS 5G, backhaul network capacity, gateway location problem, heuristic, machine learning, small cells, ultra-dense networks.
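
To make the average-number-of-hops metric concrete, the following is a minimal sketch of how it could be evaluated for a candidate gateway placement. It is an illustration under stated assumptions, not the evaluation code used in this work: the function name `average_hops`, the adjacency-dictionary representation of the mesh, the nearest-gateway association rule, and the choice to average over non-gateway cells only are all assumptions.

```python
from collections import deque


def average_hops(adjacency, gateways):
    """Average hop count from each non-gateway cell to its nearest gateway.

    adjacency: dict mapping each small cell to an iterable of neighboring
               cells (hypothetical representation of the mesh backhaul).
    gateways:  set of cells designated as gateways.
    Assumption: each cell associates with the gateway that is fewest hops away.
    """
    # Multi-source breadth-first search: every gateway starts at distance 0,
    # so each cell is first reached from its nearest gateway.
    hops = {g: 0 for g in gateways}
    queue = deque(gateways)
    while queue:
        cell = queue.popleft()
        for neighbor in adjacency[cell]:
            if neighbor not in hops:
                hops[neighbor] = hops[cell] + 1
                queue.append(neighbor)

    others = [c for c in adjacency if c not in gateways]
    if not others:
        return 0.0
    unreachable = [c for c in others if c not in hops]
    if unreachable:
        raise ValueError(f"cells with no path to a gateway: {unreachable}")
    return sum(hops[c] for c in others) / len(others)


# Toy example: five small cells in a line, 0 - 1 - 2 - 3 - 4.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(average_hops(line, {2}))  # central gateway: 1.5
print(average_hops(line, {0}))  # edge gateway: 2.5
```

In this toy topology, moving the single gateway from the edge to the center lowers the average from 2.5 to 1.5 hops, which is the kind of improvement the gateway location problem targets.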