“…Model parallelism [70, 71, 72, 73, 74, 75, 76, 78, 80, 81, 83, 84, 86, 87, 89] attempts to address both issues by employing finer-grained partitioning strategies that produce less expensive DNN subtasks, i.e., partitions with fewer parameters and lower computation requirements than a layer, and by fostering more adaptable co-inference schemes. The computations required for a single input are distributed across multiple computing entities, reducing the time needed to process the shared input [76]; however, unlike pipeline parallelism, the resulting performance depends heavily on how these computations are distributed across devices.…”
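
To make the idea concrete, the following is a minimal illustrative sketch (not drawn from any of the cited works) of intra-layer model parallelism for a fully-connected layer: the weight matrix is split column-wise so that each "device" (here simulated by a worker thread) holds fewer parameters and performs a cheaper sub-computation on the same shared input, and the partial outputs are then concatenated. The variable names and the uneven partition sizes are purely hypothetical.

```python
# Sketch of intra-layer model parallelism, assuming devices can be simulated
# by worker threads and that the layer is a simple fully-connected layer.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

rng = np.random.default_rng(0)
x = rng.standard_normal(512)            # shared input for a single inference
W = rng.standard_normal((512, 1024))    # full layer weights
b = rng.standard_normal(1024)

# Uneven column split: partition sizes can be matched to each device's capability.
split_points = [256, 640]               # -> partitions of 256, 384, and 384 columns
W_parts = np.split(W, split_points, axis=1)
b_parts = np.split(b, split_points)

def run_on_device(part):
    """Sub-task executed by one device: a column slice of the layer's matmul."""
    W_i, b_i = part
    return x @ W_i + b_i

with ThreadPoolExecutor(max_workers=len(W_parts)) as pool:
    partial_outputs = list(pool.map(run_on_device, zip(W_parts, b_parts)))

y = np.concatenate(partial_outputs)     # identical to the unpartitioned layer output
assert np.allclose(y, x @ W + b)
```

In such a scheme, the wall-clock latency for the shared input is governed by the slowest partition, which is why the performance is so sensitive to how the computation is apportioned across heterogeneous devices.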