The rise of mobile devices with abundant sensory data and local computing capabilities has driven the trend of federated learning (FL) on these devices. And personalized FL (PFL) emerges to train specific deep models for each mobile device to address data heterogeneity and varying performance preferences. However, mobile training times vary significantly, resulting in either delay (when waiting for slower devices for aggregation) or accuracy decline (when aggregation proceeds without waiting). In response, we propose a shift towards asynchronous PFL, where the server aggregates updates as soon as they are available. Nevertheless, existing asynchronous protocols are unfit for PFL because they are devised for federated training of a single global model. They suffer from slow convergence and decreased accuracy when confronted with severe data heterogeneity prevalent in PFL. Furthermore, they often exclude slower devices for staleness control, which notably compromises accuracy when these devices possess critical personalized data. Therefore, we propose EchoPFL, a coordination mechanism for asynchronous PFL. Central to EchoPFL is to include updates from all mobile devices regardless of their latency. To cope with the inevitable staleness from slow devices, EchoPFL revisits model broadcasting. It intelligently converts the unscalable broadcast to on-demand broadcast, leveraging the asymmetrical bandwidth in wireless networks and the dynamic clustering-based PFL. Experiments show that compared to status quo approaches, EchoPFL achieves a reduction of up to 88.2% in convergence time, an improvement of up to 46% in accuracy, and a decrease of 37% in communication costs.