Food and fluid intake monitoring are essential for reducing the risk of dehydration, malnutrition, and obesity. The existing research has been preponderantly focused on dietary monitoring, while fluid intake monitoring, on the other hand, is often neglected. Food and fluid intake monitoring can be based on wearable sensors, environmental sensors, smart containers, and the collaborative use of multiple sensors. Vision-based intake monitoring methods have been widely exploited with the development of visual devices and computer vision algorithms. Vision-based methods provide non-intrusive solutions for monitoring. They have shown promising performance in food/beverage recognition and segmentation, human intake action detection and classification, and food volume/fluid amount estimation. However, occlusion, privacy, computational efficiency, and practicality pose significant challenges. This paper reviews the existing work (253 articles) on vision-based intake (food and fluid) monitoring methods to assess the size and scope of the available literature and identify the current challenges and research gaps. This paper uses tables and graphs to depict the patterns of device selection, viewing angle, tasks, algorithms, experimental settings, and performance of the existing monitoring systems.