Nowadays, most applications hosted in public cloud data centers (DCs) disseminate data from a single source to a group of receivers for service deployment, data replication, software upgrades, etc. For such a one-to-many communication paradigm, multicast routing is the natural choice, as it reduces network traffic and improves application throughput. Unfortunately, approaches based on IP multicast routing suffer from scalability and load balancing issues, and do not scale with the number of supported multicast groups in cloud DC networks. Furthermore, IP multicast does not exploit the topological properties of DCs, such as the presence of multiple parallel paths between end hosts. Despite recent efforts to address these challenges, there is still a need for multicast routing protocols that are both scalable and load-balancing aware. This paper proposes Ernie, a scalable, load-balanced multicast source routing scheme for large-scale DCs. At its heart, Ernie exploits DC network structural properties and switch programmability to encode and organize multicast group information inside packets in a way that significantly reduces downstream header sizes, and thereby overall network traffic. Additionally, Ernie introduces an efficient load balancing strategy in which multicast traffic is effectively distributed across downstream layers. To study the effectiveness of Ernie, we extensively evaluate its scalability (i.e., switch memory, packet size overhead, and CPU overhead) and load balancing ability through a mix of simulation and analysis. For example, experiments on large-scale DCs with 27k+ servers show that Ernie requires downstream header sizes that are 10× smaller than those needed under state-of-the-art schemes while keeping end-host overheads low. Our simulation results also indicate that, on highly congested links, Ernie achieves better multicast load balancing than existing schemes.
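
To make the idea of shrinking downstream headers concrete, the following minimal Python sketch illustrates a generic tree-encoded source-routing header; it is our own illustration under simplified assumptions, not Ernie's actual encoding. The source embeds per-switch output-port lists for the whole multicast tree, and each switch forwards to every child only the sub-header describing that child's subtree, so header size decreases at each downstream hop.

```python
# Illustrative sketch only (not Ernie's encoding): a tree-structured
# source-routing header that shrinks as the packet moves downstream.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SubtreeHeader:
    out_ports: List[int]                      # ports this switch replicates on
    children: Dict[int, "SubtreeHeader"] = field(default_factory=dict)

    def size(self) -> int:
        """Header size in port entries, to compare upstream vs. downstream."""
        return len(self.out_ports) + sum(c.size() for c in self.children.values())


def forward(header: SubtreeHeader) -> Dict[int, SubtreeHeader]:
    """Simulate one switch: each output port gets only its subtree's sub-header."""
    return {port: header.children.get(port, SubtreeHeader(out_ports=[]))
            for port in header.out_ports}


if __name__ == "__main__":
    # Hypothetical 2-level tree: a core switch replicates to ports 1 and 2,
    # and each downstream switch replicates to two leaf ports.
    root = SubtreeHeader(
        out_ports=[1, 2],
        children={
            1: SubtreeHeader(out_ports=[3, 4]),
            2: SubtreeHeader(out_ports=[5, 6]),
        },
    )
    print("header size at source:", root.size())                  # 6 entries
    for port, sub in forward(root).items():
        print(f"header size sent down port {port}:", sub.size())  # 2 entries each
```

In this toy example the header carried below the first switch is a third of the header injected by the source, which conveys the general mechanism behind the downstream header-size reductions the abstract refers to.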