In this paper, we present a thorough and realistic analysis of voice (i.e., audio conferencing) over application-level multicast (ALM). Owing to its flexibility and ease of deployment, ALM is a compelling group-communication alternative to IP Multicast, which has yet to see wide-scale deployment in the Internet. However, proposed ALM techniques suffer from inherent latency inefficiencies. Through realistic simulation and an exploration of perceived quality in multi-party conversation, we show that these inefficiencies are a serious obstacle to realising truly scalable audio-conferencing systems over ALM. By incorporating talkspurt data from a large and detailed corpus of multi-party conversation, and by using network-simulation techniques based on actual Internet latency measurements, we extend our previous work on the Application-Level Network Audio-Conferencing (ALNAC) routing protocol into a thorough analysis of the problem. This analysis leads to a novel model for assessing the perceptual quality of multi-party conversation and to novel techniques for speaker prediction. We show that, by adapting to conversational patterns, the ALNAC protocol can achieve perceptual quality for large-scale audio conferencing comparable to that of IP Multicast, at little cost to each end-system node.