This study investigated sex differences in ratings of facial attractiveness (FA) and vocal attractiveness (VA). Participants (60 undergraduates in Study 1 and 111 undergraduates in Study 2) rated the attractiveness of computerized face images and voice recordings of men and women. In Study 1, face images and voice recordings were presented separately. Results indicated that men generally rated voice recordings of women as more attractive than those of men, whereas women's attractiveness ratings did not differ between voices of men and women. In Study 2, face images and voice recordings were paired as multimodal stimuli and presented simultaneously. Results indicated that men rated multimodal stimuli of women as more attractive than those of men, whereas women did not differentiate between multimodal stimuli of men and women. We also found that, compared with VA, FA had a stronger influence on participants' overall evaluations. Finally, we tested the difference between “original multimodal stimuli” (OMS) and “non-original multimodal stimuli” (non-OMS) and found an “OMS-facilitating effect.” Taken together, the findings indicate some sex differences in ratings of FA and VA, which may help to interpret sexual selection behaviors and human mate preferences and to inform the design and popularization of sex robots.