Clock distribution in VLSI designs is of crucial importance and it is also a major source of power dissipation of a system. For today's high performance microprocessors, clock signals are usually distributed by a global clock grid covering the whole chip, followed by post-grid routing that connects clock loads to the clock grid. Early study [7] shows that about 18.1% of the total clock capacitance dissipation was due to this post-grid clock routing (i.e., lower mesh wires plus clock twig wires). This post-grid clock routing problem is thus an important one but not many previous works have addressed it. In this paper, we try to solve this problem of connecting clock ports to the clock grid through reserved tracks on multiple metal layers, with delay and slew constraints. Note that a set of routing tracks are reserved for this grid-to-ports clock wires in practice because of the conventional modular design style of high-performance microprocessors. We propose a new expansion algorithm based on the heap data structure to solve the problem effectively. Experimental results on industrial test cases show that our algorithm can improve over the latest work on this problem [10] significantly by reducing the capacitance by 24.6% and the wire length by 23.6%. We also validate our results using hspice simulation. Finally, our approach is very efficient and for larger test cases with about 2000 ports, the runtime is in seconds.