We study delivering delay-sensitive data to a group of receivers with minimum latency. This latency consists of the time that the data spends in overlay links as well as the delay incurred at each overlay node, which has to send out a piece of data several times over a finite-capacity network connection. The latter part is a significant portion of the total delay as we show in the paper, yet it is often ignored or only partially addressed by previous multicast algorithms. We analyze the actual delay in multicast trees and consider building trees with minimum-average and minimum-maximum delay. We show the NP-hardness of these problems and prove that they cannot be approximated in polynomial time to within any reasonable approximation ratio. We then present a set of algorithms to build minimum-delay multicast trees that cover a wide range of application requirements-min-average and min-max delay, for different scales, real-time requirements, and session characteristics. We conduct comprehensive experiments on different real-world datasets, using various overlay network models. The results confirm that our algorithms can achieve much lower delays (up to 60% less) and up to orders-of-magnitude faster running times (i.e., supporting larger scales) than previous related approaches.