Therefore, to benchmark the computing capabilities of the cluster on real-life, end-to-end DNN models, we exploit our previous experience with explicit memory management and data tiling techniques [25], and with the deployment of real-sized DNN models on application chips such as Vega [23], to build a model of the system, with a larger L2 memory, on which we run the experiments. The hardware-oriented description of the SoC is integrated into our open-source event-based emulator, GVSOC [33]. To run the experiments, the following measurements and considerations are taken: 1) we simulate the execution of the inference task on GVSOC; 2) as expected, during the execution of the inference task we are never memory-bound; therefore, the contribution of the L2-to-L1 (and vice versa) data movements is relevant only to the total energy consumption; 3) we conduct silicon measurements, in terms of latency and energy, of all the L2-to-L1 data transfers (and vice versa) necessary to compute each tile, as determined by the GVSOC simulations, and include these measurements in the model; 4) we conduct silicon measurements, in terms of latency and energy, of all the kernels necessary to compute each tile generated by the Dory framework, and likewise include them in the model. The layer-wise compute time and energy of the inference task are shown in Figure 14.
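The aggregation described above can be sketched as follows. This is a minimal, hypothetical model (the `TileMeasurement` and `layer_cost` names are ours, not from the Dory framework or GVSOC): per-tile silicon measurements of kernel and DMA cost are summed into layer-wise latency and energy, and, since execution is reported as compute-bound, the L2-to-L1 transfers are assumed to overlap with computation and thus contribute only to energy, not latency.

```python
from dataclasses import dataclass

@dataclass
class TileMeasurement:
    """Silicon measurements for one tile (hypothetical structure)."""
    kernel_cycles: int       # measured kernel latency, in cycles
    kernel_energy_uj: float  # measured kernel energy, in microjoules
    dma_cycles: int          # measured L2<->L1 transfer latency, in cycles
    dma_energy_uj: float     # measured L2<->L1 transfer energy, in microjoules

def layer_cost(tiles, memory_bound=False):
    """Aggregate per-tile measurements into layer-wise latency and energy.

    When the layer is compute-bound (the case reported in the text),
    DMA transfers are hidden behind computation, so they add energy
    but no latency; energy always sums both contributions.
    """
    if memory_bound:
        latency = sum(t.kernel_cycles + t.dma_cycles for t in tiles)
    else:
        latency = sum(t.kernel_cycles for t in tiles)
    energy = sum(t.kernel_energy_uj + t.dma_energy_uj for t in tiles)
    return latency, energy

# Example with illustrative (not measured) numbers for a two-tile layer.
tiles = [TileMeasurement(120_000, 4.2, 30_000, 0.6),
         TileMeasurement(118_000, 4.1, 30_000, 0.6)]
lat, en = layer_cost(tiles)
```

In this sketch, the latency for the compute-bound case counts only kernel cycles (238,000 here), while the energy accumulates both kernel and DMA contributions, mirroring consideration 2) above.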