Social media provides high-volume, real-time data that has been widely used in applications such as sales, marketing, disaster management, and health surveillance. However, distinguishing reliable information from noise is challenging, because social media is a user-generated content system in which a very large number of users post massive amounts of information every second. This rich information is contained not only in the short textual content but also in the accompanying images and videos. In this paper, we introduce an effective and efficient framework for event detection with social media data. The framework integrates both textual and image content to make fuller use of the available information. Compared with a text-only approach, it removes 58 (66.7%) false-positive events and improves the precision of event detection by 6.5%. Based on our analysis, we also examine the content of these images to further explore the space of social media studies. Finally, the closely related text and images from social media provide a valuable text-image mapping, which can enable knowledge transfer between the two media types.