This study aims to reveal dynamic brain networks during speech perception. Subjects, all male, were presented with five English vowel stimuli: [a], [e], [i], [o], and [u]. Brain dynamics were decoded using a multivariate Gaussian hidden Markov model (MGHMM) trained on spatiotemporal patterns of broadband multivariate event-related potential amplitudes to identify distinct broadband EEG microstates (MS), followed by microstate source imaging and microstate functional connectivity (μFC) analysis. The results showed that cortical generators and μFC fluctuated across eight microstates throughout perception. Microstate source imaging revealed involvement of the bilateral (left-dominant) posterior superior temporal cortex (TC), inferior frontal gyrus (IFG), and supramarginal regions in perception. The precentral cortex, where the primary motor cortex is located, was also significantly activated. These regions were activated early, at 96-151 ms (left-dominant) and at 186-246 ms (left hemisphere only) after stimulus onset. The μFC results revealed significant increases in the delta (2.5-4.5 Hz), theta (4.5-8.5 Hz), alpha (12.5-14.5 Hz), beta (22.5-24.5 Hz), and low gamma (30.5-32.5, 38.5-40.5 Hz) bands, but decreases in the high gamma (42.5-46.5 Hz) band, during perception. Increased FC was observed mainly in (1) microstate segments 34-95 ms (MS2) and 96-151 ms (MS3) in the early stages and (2) microstate intervals 186-246 ms (MS5) and 297-449 ms (MS6) in the subsequent stages of perception. We found stronger statistical FC differences in perception at the TCs with respect to the left IFG (Broca's area), left TC, and precentral areas. Furthermore, using a comparative protocol that measures the degree of FC distinction, MGHMM showed performance improvements of 8.01% (p-value=0.0162) over the well-established Lehmann-based modified K-means, 14.41% (p-value=0.006) over Atomize and Agglomerative Hierarchical Clustering, and 8.791% (p-value=0.0097) over the combination of K-means and sliding-window methods.
This study indicates the usefulness of EEG microstates for investigating broadband brain dynamics in speech perception. The current findings, which are based on male subjects only, should be generalized by future studies with an appropriately larger sample that includes female subjects.
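
As an illustration of how millisecond microstate intervals such as 96-151 ms (MS3) can be read off a decoded state sequence, the following is a minimal Python sketch, not the authors' implementation. It assumes that an HMM decoder has already assigned one state label to each time sample; the function name `segment_microstates`, the sampling rate, and the example path are hypothetical.

```python
from itertools import groupby


def segment_microstates(states, sfreq_hz, t0_ms=0.0):
    """Collapse a per-sample state path into (state, onset_ms, offset_ms) runs.

    states   : sequence of state labels, one per time sample (assumed input,
               e.g. the most likely state path from a fitted HMM)
    sfreq_hz : sampling frequency in Hz (assumed known)
    t0_ms    : latency of the first sample relative to stimulus onset
    """
    step_ms = 1000.0 / sfreq_hz
    segments = []
    index = 0  # position of the first sample of the current run
    for state, run in groupby(states):
        length = sum(1 for _ in run)
        onset = t0_ms + index * step_ms
        offset = t0_ms + (index + length - 1) * step_ms
        segments.append((state, onset, offset))
        index += length
    return segments


# Hypothetical decoded path sampled at 1000 Hz (1 ms per sample).
path = [0, 0, 0, 1, 1, 0]
print(segment_microstates(path, sfreq_hz=1000))
# → [(0, 0.0, 2.0), (1, 3.0, 4.0), (0, 5.0, 5.0)]
```

Each tuple gives a state label with the latency of its first and last sample, so a state that recurs later in the path appears as a separate segment rather than being merged with its earlier occurrence.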