Scientific work rests fundamentally upon data: their measurement, processing, analysis, illustration, and interpretation. Raw data have to be processed before they are ready for statistical analysis and interpretation. While these processing pipelines can be well defined and standardized, they are often characterized by substantial heterogeneity. Here, we present results from a systematic literature search on the different skin conductance response (SCR) quantification approaches used in the literature, using fear conditioning research as a case example. Next, we applied seven of the identified approaches (trough-to-peak scoring, script-based baseline correction, Ledalab, as well as four different models implemented in the software PsPM) to two fear conditioning datasets differing in key procedural specifications (i.e., CS duration, reinforcement rate, number of trials). This can be viewed as a set of robustness analyses (i.e., the same data subjected to different methods) aimed at investigating whether and to what extent these methods yield comparable results. To our knowledge, no formal framework for the evaluation of robustness analyses exists to date, but we may borrow some criteria from a framework suggested for the evaluation of ‘replicability’ in general. Our results from the seven SCR quantification approaches applied to two datasets suggest that there may be no single approach that consistently yields larger effect sizes across both datasets. Yet, at least some of the approaches employed show consistent effect sizes within each dataset, indicating comparability. Finally, we highlight substantial heterogeneity also within most quantification approaches and discuss implications and potential remedies.