Yishan Jiao, Xiang Xie, Ming Tu, Xingyu Na

Abstract. Co-articulation is a common phenomenon in human speech that makes speech sound coherent and natural. Synthesized speech, however, often sounds artificial, partly because it fails to imitate co-articulation well. This paper defines a novel set of synthesis units that preserves both intra- and inter-syllable co-articulation. The two boundaries of the new unit lie at the essential vowels of two adjacent syllables. The unit consists of three parts: the final-tail of the preceding syllable, the initial consonant, and the final-head of the following syllable, so we call it the fiNal-Initial-FInal (NIF) unit. To locate the boundaries, we adopt the maximum spectral stability criterion, which finds the most stable point within the essential vowel. In the experiment, we test NIF units on the HMM-based speech synthesis system (HTS) and compare the results with those of a syllable-unit system. The preference test and the Comparison Category Rating (CCR) test show that speech synthesized with NIF units is more natural than speech synthesized with syllable units, while the speech quality of the two systems is comparable.

Keywords: synthesis unit · co-articulation · maximum spectral stability criterion · Mandarin speech · HTS

1 Introduction

Selecting a proper set of synthesis units is crucial to a speech synthesis system. The length and structure of the synthesis units affect the quality and naturalness of the synthesized speech, as well as the size of the database and the complexity of the synthesis system. Candidate units include words, syllables, initial-finals, and phonemes. In Mandarin text-to-speech (TTS) systems, syllables are often used as synthesis units. The reason is that they are the most natural and basic units for pronunciation, and an

Yishan Jiao (✉) Beijing Institute of Technology, Haidian district,