There is growing interest in the application of crossmodal perception to interface design. However, most research has focused on task performance measures and often ignored user experience and engagement. We present an examination of crossmodal congruence in terms of performance and engagement in the context of a memory task of audio, visual, and audiovisual stimuli. Participants in a first study showed improved performance when using a visual congruent mapping that was cancelled by the addition of audio to the baseline conditions, and a subjective preference for the audiovisual stimulus that was not reflected in the objective data. Based on these findings, we designed an audiovisual memory game to examine the effects of crossmodal congruence on user experience and engagement. Results showed higher engagement levels with congruent displays with some reported preference for potential challenge and enjoyment that an incongruent display may support, particularly for increased task complexity.