Word-initial stops in Mandarin and English show distinct phonological categorization but similar phonetic realization along the VOT (voice onset time) continuum. Previous research has reported that native Mandarin-speaking adults produce measurably longer long-lag VOTs than native English-speaking adults. The present study examined whether and how this difference between Mandarin and English VOTs is manifested in monolingual children and Mandarin–English bilingual children. The participants were 15 five- to six-year-old sequential bilingual children, 24 corresponding monolingual children (15 Mandarin, 9 English), and 22 monolingual adults (12 Mandarin, 10 English). The bilingual children were divided into two groups (Bi-low and Bi-high) according to their amount of experience in English. Each participant was recorded producing 18 Mandarin words and/or 18 English words containing six stops in each language. VOT was measured from the onset of the stop burst to the onset of voicing. The results showed that the language difference in VOT among the monolingual children followed a pattern similar to that of the monolingual adults. However, the Mandarin and English VOT distributions were less separable in both groups of bilingual children. Further analysis suggested that both groups of bilingual children tended to separate Mandarin and English short-lag VOTs, but only the Bi-low children produced different long-lag VOTs in the two languages. These results suggest that, owing to bilingual effects and L1–L2 (first language–second language) interactions, the bilingual children attempted to keep the two VOT systems apart but implemented the separation differently from the monolingual speakers.
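The VOT measurement described above (time from the stop burst to the onset of voicing) can be illustrated with a minimal Python sketch. This is not the study's analysis pipeline: the function names `vot_ms` and `mean_vot_by_language`, the token layout, and every timestamp below are hypothetical values invented for illustration, not data from the study.

```python
from statistics import mean


def vot_ms(burst_onset_s: float, voicing_onset_s: float) -> float:
    """Return VOT in milliseconds: voicing onset minus burst onset."""
    return (voicing_onset_s - burst_onset_s) * 1000.0


def mean_vot_by_language(tokens: list[dict]) -> dict[str, float]:
    """Group tokens by language and return the mean VOT (ms) per language."""
    groups: dict[str, list[float]] = {}
    for tok in tokens:
        vot = vot_ms(tok["burst"], tok["voicing"])
        groups.setdefault(tok["language"], []).append(vot)
    return {lang: mean(vots) for lang, vots in groups.items()}


# Hypothetical hand-annotated landmarks (seconds from the start of each recording).
# These numbers are made up for the example and are NOT results from the study.
tokens = [
    {"language": "Mandarin", "burst": 0.100, "voicing": 0.195},
    {"language": "Mandarin", "burst": 0.250, "voicing": 0.355},
    {"language": "English", "burst": 0.120, "voicing": 0.195},
    {"language": "English", "burst": 0.300, "voicing": 0.385},
]

if __name__ == "__main__":
    for lang, avg in mean_vot_by_language(tokens).items():
        print(f"{lang}: mean VOT = {avg:.1f} ms")
```

In practice the burst and voicing landmarks would come from manual or semi-automatic annotation of the waveform and spectrogram. The sketch only shows the arithmetic applied once those landmarks exist.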