“…Model Execution with Memory Budget. To accommodate tight memory budgets, the literature explores two popular solutions: model compression [10], [11], [12], [13], [14], [15], [16], [52] and offloading [17], [18], [19], [20], [21], [22]. Model compression techniques reduce model size by, for example, removing redundant parameters such as layers, filters, and channels [10], lowering parameter precision [15], or searching for efficient model architectures [16].…”
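
To make the "lowering parameter precision" idea concrete, the following is a minimal sketch of symmetric per-tensor int8 quantization, showing the roughly 4x memory saving relative to 32-bit floats. It is a generic illustration and not necessarily the scheme used in [15]; the names `quantize_int8` and `dequantize` and the sample weights are hypothetical.

```python
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor quantization of float weights to int8 (illustrative)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range [-127, 127].
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantize(q, scale):
    """Map int8 codes back to approximate float values."""
    return [v * scale for v in q]


# Hypothetical weights, chosen only for illustration.
weights = [0.5, -1.0, 0.25, 0.0, 0.75]

fp32 = array("f", weights)  # 32-bit float storage
q, scale = quantize_int8(weights)

fp32_bytes = fp32.itemsize * len(fp32)
int8_bytes = q.itemsize * len(q)

restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(f"fp32: {fp32_bytes} bytes, int8: {int8_bytes} bytes")
print(f"max reconstruction error: {max_error:.6f}")
```

The trade-off is visible in `max_error`: each weight is recovered only to within half a quantization step (`scale / 2`), which is the accuracy cost paid for the smaller footprint.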