Current methods to detect eating behavior events (i.e., bites, chews, and swallows) lack objective measurements, standard procedures, and automation. The video recordings of eating episodes provide a non-invasive and scalable source for automation. Here, we reviewed the current methods to automatically detect eating behavior events from video recordings. According to PRISMA guidelines, publications from 2010–2021 in PubMed, Scopus, ScienceDirect, and Google Scholar were screened through title and abstract, leading to the identification of 277 publications. We screened the full text of 52 publications and included 13 for analysis. We classified the methods in five distinct categories based on their similarities and analyzed their accuracy. Facial landmarks can count bites, chews, and food liking automatically (accuracy: 90%, 60%, 25%). Deep neural networks can detect bites and gesture intake (accuracy: 91%, 86%). The active appearance model can detect chewing (accuracy: 93%), and optical flow can count chews (accuracy: 88%). Video fluoroscopy can track swallows but is currently not suitable beyond clinical settings. The optimal method for automated counts of bites and chews is facial landmarks, although further improvements are required. Future methods should accurately predict bites, chews, and swallows using inexpensive hardware and limited computational capacity. Automatic eating behavior analysis will allow the study of eating behavior and real-time interventions to promote healthy eating behaviors.