As IP telephony becomes more popular, interworking with conventional PSTN telephony has become increasingly important. In particular, a growing number of new telephony services involve both packet-switched (IP telephony) and circuit-switched (PSTN telephony) voice legs within a single call session. A common problem in enabling such services is the need to synchronize voice streams that traverse heterogeneous telephony systems. In this paper, we first identify the key role of voice synchronization across heterogeneous telephony systems in services such as seamless handover between WLAN and cellular networks and multi-party audio conferencing with video overlay. We then explain the challenges in synchronizing circuit-switched and packet-switched voice streams, including codec distortion, packet loss, line noise, and overlapping utterances. To achieve voice synchronization, we investigate three approaches based on digital speech processing techniques in the waveform, cepstrum, and spectrum domains. Finally, we compare the performance benefits and tradeoffs of these approaches, motivating further research in this direction.