We use a variety of multimedia applications on smart devices to consume multimedia content, to communicate with our peers, and to broadcast our events live. This paper investigates how different types of multimedia applications use the media input/output (I/O) devices of a mobile device, e.g., the camera, microphone, and speaker, and introduces the notion of a multimedia context. Our measurements lead to a sensing algorithm called MediaSense, which senses the states of multiple I/O devices and identifies eleven multimedia contexts of a mobile device in real time. The algorithm distinguishes stored content playback from streaming, live broadcasting from local recording, and conversational multimedia sessions from GSM/VoLTE calls.
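As a concrete illustration of the kind of I/O device states such sensing builds on, the sketch below polls audio-related states on Android through the standard AudioManager API (camera availability can be tracked analogously via CameraManager availability callbacks). It is a minimal sketch under stated assumptions, not the paper's MediaSense implementation: the class name DeviceStateSnapshot and its fields are hypothetical placeholders, and no context classification is attempted.

```java
import android.content.Context;
import android.media.AudioManager;
import android.media.AudioPlaybackConfiguration;
import android.media.AudioRecordingConfiguration;
import java.util.List;

/** Hypothetical snapshot of media I/O device states (not the paper's MediaSense algorithm). */
public class DeviceStateSnapshot {
    public final boolean speakerActive;    // audio is currently being rendered
    public final boolean microphoneActive; // audio is currently being captured
    public final boolean inCommunication;  // conversational (VoIP) audio mode
    public final boolean inCellularCall;   // GSM/VoLTE call audio mode

    private DeviceStateSnapshot(boolean speakerActive, boolean microphoneActive,
                                boolean inCommunication, boolean inCellularCall) {
        this.speakerActive = speakerActive;
        this.microphoneActive = microphoneActive;
        this.inCommunication = inCommunication;
        this.inCellularCall = inCellularCall;
    }

    /** Reads current audio I/O states via AudioManager (playback configurations need API 26+). */
    public static DeviceStateSnapshot capture(Context context) {
        AudioManager am = (AudioManager) context.getSystemService(Context.AUDIO_SERVICE);

        // Active playback sessions (speaker in use) and active recording sessions (microphone in use).
        List<AudioPlaybackConfiguration> playback = am.getActivePlaybackConfigurations();
        List<AudioRecordingConfiguration> recording = am.getActiveRecordingConfigurations();

        // The audio mode separates conversational (VoIP) sessions from cellular (GSM/VoLTE) calls.
        int mode = am.getMode();
        boolean inCommunication = (mode == AudioManager.MODE_IN_COMMUNICATION);
        boolean inCellularCall = (mode == AudioManager.MODE_IN_CALL);

        return new DeviceStateSnapshot(!playback.isEmpty(), !recording.isEmpty(),
                                       inCommunication, inCellularCall);
    }
}
```

Mapping such raw device states to the eleven multimedia contexts, e.g., separating streaming from stored content playback, is the role of the MediaSense algorithm described in the remainder of the paper.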