Various 3-D talking heads have recently been developed for language learning, visualizing both external and internal articulatory movements to guide learners. Animating Mandarin pronunciation is challenging because the language has confusable stops and affricates with similar places of articulation. Until now, little attention has been paid to the bio-signal information of aspiratory airflow, which is essential for distinguishing Mandarin consonants. This study addresses that gap by presenting quantitative analyses of airflow and then designing an airflow model for a 3-D pronunciation system. Airflow data were collected with the Phonatory Aerodynamic System (PAS), so that confusable Mandarin consonants could be distinguished by airflow rate, peak airflow rate, airflow duration, and peak time. Based on these airflow parameters, an airflow model using the physical equation of fluid flow was proposed and solved, and then combined and synchronized with the existing 3-D articulatory model. The resulting multimodal system synchronously displays the airflow motions and articulatory movements involved in uttering Mandarin syllables. An audio-visual perception test and a pronunciation training study were conducted to assess the effectiveness of the system. The perceptual results indicated that airflow motions improved identification accuracy for both native and non-native groups, with native perceivers achieving higher accuracy owing to their long-term language experience. Moreover, the system helped Japanese learners of Mandarin improve their production of Mandarin aspirated consonants, as reflected in higher voice onset time (VOT) gain values after training.