A significant number of batch process monitoring methods have been proposed since the first groundbreaking approaches were published two decades ago. Properly assessing all the alternatives currently available requires a rigorous and robust assessment framework to assist practitioners in the difficult tasks of selecting the most adequate approach for the particular situations they face and of defining all the optional aspects involved, such as the type of preprocessing, infilling, alignment, etc. However, the comparison methods currently adopted present several limitations, and even some flaws, that make the variety of available studies not easily generalizable or, in extreme situations, fundamentally wrong. Without a proper comparison tool, decisions are made on a subjective basis and are therefore prone to be, at best, suboptimal. In this article we present a structured review of the comparison methods and figures of merit adopted to assess batch process monitoring approaches, analyzing their strengths and limitations as well as some common misuses. Furthermore, we propose a comparison and assessment methodology (CAM) and figures of merit that circumvent some of the limitations of current procedures. We begin by addressing the analysis of the methods' "detection strength": the ability to correctly detect abnormal situations without raising excessive false alarms. We describe in detail how the comparison framework is implemented and illustrate its application in two case studies encompassing a rich variety of testing scenarios (99 000 batch runs) and monitoring methods (2-way, 3-way, and dynamic methods, amounting to a total of 60 combinations of methods and their variants).
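To make the notion of detection strength concrete, the following minimal sketch (not the article's actual CAM implementation; the synthetic data, variable names, and the chi-squared monitoring statistic are illustrative assumptions) computes the two quantities it balances, the fault detection rate and the false alarm rate, for a monitoring statistic compared against an empirical control limit:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumption: a monitoring statistic (e.g. a T^2-like score)
# evaluated for normal (in-control) and faulty (out-of-control) batches.
normal_stats = rng.chisquare(df=2, size=1000)        # normal operating batches
faulty_stats = rng.chisquare(df=2, size=1000) + 8.0  # batches with a mean shift

# Empirical 99% control limit estimated from normal operating data.
limit = np.quantile(normal_stats, 0.99)

false_alarm_rate = np.mean(normal_stats > limit)  # alarms under normal operation
detection_rate = np.mean(faulty_stats > limit)    # correctly flagged faulty batches

print(f"False alarm rate: {false_alarm_rate:.3f}")
print(f"Detection rate:   {detection_rate:.3f}")
```

A method with high detection strength keeps the false alarm rate near its nominal level (here 1%) while achieving a detection rate as close to one as possible; comparing methods on only one of the two quantities is one of the misuses a rigorous framework must avoid.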
From this analysis, it stands out that the CAM is suitable for comparing different monitoring methods, even when the number of methods and variants is very large. It provides simple, informative, and statistically based metrics to assess a given method's performance and is able to quantify the added value of alternative approaches. When applied to the two case studies considered, 2-way methods (with batch-wise unfolding) combined with missing data (MD) infilling and dynamic time warping (DTW) were found to provide adequate solutions. If a more parsimonious model is required, dynamic methods such as autoregressive principal component analysis (ARPCA) provide good alternatives, and Tucker3 with current deviations (CD) infilling also presents an interesting performance in some fault scenarios. In general, no method can claim absolute supremacy across all faulty scenarios.