Owing to their good mechanical properties, steel–concrete composite (SCC) and prestressed concrete (PC) box girders are the most widely used types of elevated structure in urban rail transit. However, the differences in their vibro-acoustic behavior have yet to be explored in depth, even as structure-radiated noise becomes a major concern in noise-sensitive environments. In this work, numerical simulation is used to investigate the train-induced vibration and noise of both types of box girder, and the numerical procedure is validated against data measured on a PC box girder. The mechanism of vibration transmission is examined, and the vibro-acoustic responses of SCC and PC box girders are compared in detail, revealing that SCC box girders generate more vibration and noise. The vibration level differences between the two are around 7.7 dB(A) at the bottom plate, 19.3 dB(A) at the web, and 6.7 dB(A) at the flange, while the difference in structure-radiated noise is around 5.9 dB(A). Potential vibro-acoustic control strategies for SCC box girders are then discussed. Because the vibro-acoustic responses of both types of girder are dominated by the force transmitted to the bridge deck, track isolation is more effective than structural enhancement. Using a floating track slab is shown to reduce the vibration and noise of an SCC box girder below those of a PC box girder. By contrast, structural enhancement of the SCC box girder has only a very limited effect: the six proposed structural enhancement measures reduce vibration by only 1.1–3.6 dB(A) and noise by at most 1.5 dB(A).
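To give a sense of scale for the level differences quoted above, the short sketch below converts each reported SCC-minus-PC difference into an equivalent ratio. It assumes (this is not stated in the abstract) that each level is defined as 20 log10 of an RMS amplitude, such as acceleration or sound pressure, relative to a common reference, and that the same A-weighting is applied to both girders. Under that assumption, a difference ΔL corresponds to an RMS amplitude ratio of 10^(ΔL/20) and a mean-square (energy-like) ratio of 10^(ΔL/10). The dictionary name and helper functions are illustrative only.

```python
# Hypothetical helper: interpret reported level differences (dB) as ratios.
# Assumes levels are 20*log10(RMS amplitude / common reference), so the
# difference between two levels depends only on the ratio of the amplitudes.

# SCC-minus-PC differences as reported in the abstract, in dB(A).
LEVEL_DIFFERENCES_DB = {
    "bottom plate vibration": 7.7,
    "web vibration": 19.3,
    "flange vibration": 6.7,
    "structure-radiated noise": 5.9,
}


def energy_ratio(delta_db: float) -> float:
    """Ratio of mean-square (energy-like) quantities for a level difference."""
    return 10.0 ** (delta_db / 10.0)


def amplitude_ratio(delta_db: float) -> float:
    """Ratio of RMS amplitudes (e.g. acceleration or sound pressure)."""
    return 10.0 ** (delta_db / 20.0)


if __name__ == "__main__":
    for location, delta in LEVEL_DIFFERENCES_DB.items():
        print(
            f"{location}: {delta} dB(A) -> "
            f"energy x{energy_ratio(delta):.1f}, "
            f"amplitude x{amplitude_ratio(delta):.2f}"
        )
```

Read this way, the 5.9 dB(A) noise difference corresponds to roughly 3.9 times the radiated acoustic energy, or about twice the RMS sound pressure. These are ratios of overall weighted levels, not frequency-by-frequency comparisons.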