Comprehensive hardware assurance approaches that guarantee trust in Integrated Circuits (ICs) typically require verifying the IC design layout and functionality through destructive Reverse Engineering (RE). RE is a resource-intensive process that would benefit greatly from the extensive integration of data-driven paradigms, especially in the imaging and image analysis phases. Despite this evident benefit, the uptake of data-driven approaches in RE-assisted hardware assurance is lagging because large volumes of high-quality labelled data are lacking. In this paper, a large-scale synthetic Scanning Electron Microscopy (SEM) dataset, REFICS, is introduced to address this issue. The dataset, the first open-source dataset in the RE community, consists of 800,000 SEM images covering two node technologies, 32 nm and 90 nm, and four cardinal layers of the IC, namely the doping, polysilicon, contact and metal layers. Furthermore, a framework based on uncertainty and risk is introduced to compare the efficacy and benefits of existing RE workflows that rely on ad-hoc steps in their execution. These contributions are critical to making RE-assisted hardware assurance a scalable, automated and fault-tolerant approach. Finally, the work concludes with a performance analysis of existing machine learning and deep learning approaches for image analysis in RE and hardware assurance.