Previous word production research employing the implicit-priming paradigm has shown that speakers can benefit from advance knowledge of the initial word form of the word to be produced. In Dutch and English, a single onset segment is sufficient to produce the benefit, but a complete syllable (without the tone) is required in Mandarin Chinese. These findings have been interpreted as suggesting language-dependent proximate units for word-form encoding, which are intrinsic to a language-specific system. Nonetheless, the absence of a segment effect in Mandarin Chinese might have to do with the orthographic characteristics of the prompts, which are syllable-based and could have motivated the production system to place more emphasis on the syllable than on the segment. Two experiments were conducted to test this hypothesis. In Experiment 1, we employed the implicit-priming paradigm with both spoken and written prompts, and in Experiment 2 we adopted a picture version of this paradigm. Spoken prompts are less likely to encourage an orthographically induced syllable bias, and picture naming involves no prompts, leaving no room for any syllable bias that prompts might induce. The results from both experiments showed syllable preparation effects but no segment preparation effects, regardless of whether prompts were written, spoken, or absent. These findings suggest that the syllable as the proximate unit in Mandarin Chinese word production is an intrinsic, and not an accidental or task-dependent, property of the production system.

Keywords: Mandarin Chinese · Word production · Proximate unit · Implicit priming

Producing a word involves accessing its abstract phonological contents from the mental lexicon and assembling them into units of the proper size, so that these units can be mapped onto articulatory movements during speaking. This process is known as word-form encoding.
According to the word production model proposed by Levelt, Roelofs, and Meyer (1999), the phonological contents selected at the beginning of word-form encoding are segments rather than syllables. The syllables are constructed online via a segment-to-frame association procedure. This proposal is consistent with the phonological characteristics of the languages that the model is built upon (that is, Dutch and English), in which resyllabification is often required in connected speech, such that the syllables of a word are best constructed online rather than stored and retrieved. The main evidence supporting the segment as the first unit selected for word-form encoding in Dutch and English has been obtained with the implicit-priming task. In this task, participants repeatedly name a set of words that either do or do not overlap in their initial segment(s). Consistent segment preparation effects (i.e., faster response times when the initial segments overlapped than when they did not) have been observed, and these effects increased with