This paper studies joint spectrum allocation and user association in large heterogeneous cellular networks. The objective is to maximize a network utility function based on given traffic statistics collected over a slow timescale, on the order of seconds to minutes. A key challenge is scalability: interference across cells creates dependencies across the entire network, making the optimization computationally demanding as the network grows. A suboptimal solution is presented, which performs well in networks consisting of one hundred access points (APs) serving several hundred user devices. This is achieved by optimizing over local overlapping neighborhoods, defined by interference conditions, and by exploiting the sparsity of a globally optimal solution. Specifically, with a total of $k$ user devices in the entire network, it suffices to divide the spectrum into $k$ segments, where each segment is mapped to a particular set, or pattern, of active APs within each local neighborhood. The problem is then to find a mapping of segments to patterns and to optimize the widths of the segments. A convex relaxation is proposed for this problem; it relies on a re-weighted $\ell_1$ approximation of an $\ell_0$ constraint, which is used to enforce the mapping of a unique pattern to each spectrum segment. A distributed implementation based on the alternating direction method of multipliers (ADMM) is also proposed. Numerical comparisons with benchmark schemes show that the proposed method achieves a substantial increase in achievable throughput and/or reduction in the average packet delay.
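
To illustrate the general idea behind a re-weighted l1 approximation of an l0 (sparsity) constraint, the following minimal sketch applies the standard re-weighting heuristic to a toy separable problem. It is not the formulation used in this paper: the objective (a least-squares fit to a vector `y`), the function names, and the parameters `lam` and `eps` are all assumptions chosen for illustration. Each pass solves a weighted l1 problem in closed form, then sets each weight inversely proportional to the current magnitude, so that small entries are driven to exactly zero while large entries are penalized only lightly.

```python
def soft_threshold(v, t):
    """Closed-form minimizer of 0.5 * (x - v)**2 + t * abs(x) over x."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def reweighted_l1(y, lam=0.5, eps=0.1, iters=10):
    """Toy re-weighted l1 heuristic (hypothetical helper, for illustration).

    Repeatedly solves
        minimize_x  0.5 * sum((x_i - y_i)**2) + lam * sum(w_i * |x_i|)
    with weights w_i = 1 / (|x_i| + eps) taken from the previous iterate.
    """
    x = list(y)  # start from the unpenalized solution
    for _ in range(iters):
        # Small entries get large weights, large entries get small weights.
        weights = [1.0 / (abs(xi) + eps) for xi in x]
        # The weighted l1 problem is separable, so each coordinate
        # is solved independently by soft thresholding.
        x = [soft_threshold(yi, lam * wi) for yi, wi in zip(y, weights)]
    return x


if __name__ == "__main__":
    print(reweighted_l1([3.0, 0.2, -0.1, 2.0]))
```

In this toy instance the two small entries are set to zero after the first pass and stay there, while the two large entries are shrunk only slightly. The paper's relaxation applies the same principle to the segment-to-pattern mapping, where the optimization is coupled across APs and cannot be solved coordinate by coordinate.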