This paper describes prosodic focalization as a multimodal phenomenon in Brazilian Portuguese, evaluating the role of two modalities, audio (A) and visual (V), and of their combined audiovisual presentation (AV), in focus production and perception. Five focus types are considered, according to their semantic-pragmatic values: (a) in declarative sentences: (i) IF - informational focus (answer to a previous question, conveying new information); (ii) CF - contrastive (strong) focus (correction of information considered wrong); (iii) ATF - attenuated (weak) focus (proposal of an alternative to previous information); (b) in interrogative sentences: (i) INTF - interrogative focus (new information is requested in the question); (ii) SF - surprise focus (the speaker casts doubt on previous information). Structural factors, such as focus extension and position in the sentence, were also evaluated. A multimodal perception experiment and an acoustic and visual analysis of focus production show that multimodality plays a relevant role in both focus production and perception. Different acoustic and visual parameters, or configurations of parameters, contribute to conveying distinct meanings, according to each focus type.