Background/Objectives: Previous studies have examined the role of working memory in syntactic, semantic, and phonological processing, contributing to our understanding of how linguistic information is managed and retrieved. However, the real-time processing of phonological information, particularly suprasegmental features such as tone, whose contour is a time-varying signal, remains underexplored within the framework of Information Processing Theory (IPT). This study addressed this gap by investigating how native Cantonese speakers process similar tones in real time, with the aim of clarifying how IPT applies to auditory processing.
Methods: The study combined assessments of cognitive functions, an AX discrimination task, and electroencephalography (EEG) recorded during a passive oddball paradigm. Together, these measures were used to examine how native Macau Cantonese speakers discriminate three pairs of similar tones and how they process these tones in real time.
Results: The behavioral results confirmed that the T2–T5 merger is complete in Macau Cantonese, whereas the T3–T6 and T4–T6 mergers are ongoing, with perceptual merging rates of 45.46% and 27.28%, respectively. Mismatch negativity (MMN) results from the passive oddball experiment revealed distinct temporal processing patterns for the three tone pairs. Cognitive functions, particularly attention and working memory, significantly influenced tone discrimination, with more pronounced effects on MMN mean amplitude during T4–T6 discrimination. Differences in MMN peak latency between T3–T6 and T4–T6 further suggest that listeners use different perceptual strategies for these contour-related tone pairs: T3–T6 can be discriminated on the basis of early signal input, whereas T4–T6 discrimination relies on sustained signal input.
Conclusions: This difference in cognitive resource allocation may explain why the two tone pairs merge at different rates. By focusing on the perceptual difficulty of tone pairs and employing EEG, this study revealed the time course of native speakers' processing of similar tones, offering new insights into tonal phoneme processing and speech variation.
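The MMN measures reported in the Results (mean amplitude and peak latency) are conventionally derived from a difference wave, that is, the deviant ERP minus the standard ERP from an oddball paradigm. The following is a minimal sketch of that computation in plain Python, assuming single-channel epochs that are already baseline-corrected and time-locked to tone onset. The function names, the 50 ms sampling grid, the 100 to 250 ms analysis window, and all numeric values are hypothetical toy choices for illustration; they are not the preprocessing pipeline, analysis window, or data used in this study.

```python
"""Hypothetical sketch: MMN mean amplitude and peak latency from ERP averages.

All values used here are synthetic toy numbers, not data from this study.
"""
from statistics import fmean


def average_erp(trials: list[list[float]]) -> list[float]:
    """Average single-trial epochs sample by sample to obtain an ERP."""
    return [fmean(samples) for samples in zip(*trials)]


def difference_wave(deviant: list[float], standard: list[float]) -> list[float]:
    """MMN difference wave: deviant ERP minus standard ERP, sample by sample."""
    return [d - s for d, s in zip(deviant, standard)]


def window_indices(times_ms: list[float], start_ms: float, end_ms: float) -> list[int]:
    """Indices of samples whose latency lies in [start_ms, end_ms]."""
    idx = [i for i, t in enumerate(times_ms) if start_ms <= t <= end_ms]
    if not idx:
        raise ValueError("analysis window contains no samples")
    return idx


def mmn_mean_amplitude(diff, times_ms, start_ms, end_ms) -> float:
    """Mean of the difference wave inside the analysis window."""
    return fmean([diff[i] for i in window_indices(times_ms, start_ms, end_ms)])


def mmn_peak_latency(diff, times_ms, start_ms, end_ms) -> float:
    """Latency of the most negative difference-wave sample in the window."""
    idx = window_indices(times_ms, start_ms, end_ms)
    return times_ms[min(idx, key=lambda i: diff[i])]


if __name__ == "__main__":
    # Hypothetical 50 ms sampling grid and two toy trials per condition.
    times = [0, 50, 100, 150, 200, 250, 300]
    standard = average_erp([[0, 1, 0, 0, 0, 0, 0], [0, -1, 0, 0, 0, 0, 0]])
    deviant = average_erp([[0, 0, -1, -3, -1, 0, 0], [0, 0, -1, -1, -1, 0, 0]])
    diff = difference_wave(deviant, standard)
    # Assumed 100-250 ms window; not the window used in the study.
    print(mmn_mean_amplitude(diff, times, 100, 250))  # -1.0
    print(mmn_peak_latency(diff, times, 100, 250))    # 150
```

A more negative mean amplitude indicates a larger MMN-like deflection in the window, and comparing peak latencies across conditions is the kind of contrast the Results describe for T3–T6 versus T4–T6. Real analyses would typically rely on an established EEG toolbox rather than hand-rolled averaging.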