Speech brain-computer interfaces (BCIs) directly translate brain activity into speech sound and text, yet decoding tonal languages like Mandarin Chinese poses a significant, unexplored challenge. Despite successful cases in non-tonal languages, the complexities of Mandarin, with its distinct syllabic structures and pivotal lexical information conveyed through tonal nuances, present challenges in BCI decoding. Here we designed a brain-to-text framework to decode Mandarin tonal sentences from invasive neural recordings. Our modular approach dissects speech onset, base syllables, and lexical tones, integrating them with contextual information through Bayesian likelihood and the Viterbi decoder. The results demonstrate accurate tone and syllable decoding under variances in continuous naturalistic speech production, surpassing previous intracranial Mandarin tonal syllable decoders in decoding accuracy. We also verified the robustness of our decoding framework and showed that the model hyperparameters can be generalized across participants of varied gender, age, education backgrounds, pronunciation behaviors, and coverage of electrodes. Our pilot study shed lights on the feasibility of more generalizable brain-to-text decoding of natural tonal sentences from patients with high heterogeneities.