Two different software layers have been proposed to efficiently run inference on individual DNNs, but no existing work proposes to combine them: inference servers (Triton [8], Ray Serve [9], TensorFlow Serving [10] and TorchServe [11]) serve the predictions of inference systems (such as TensorRT [12], OpenVINO [13], ONNX [14] and TFLite [15]). Our work attempts to fill this gap between current inference system technologies and ensembles of deep neural networks.

The question we attempt to answer is simple to state but challenging to solve: "How to systematically allocate an ensemble of DNNs to a given set of devices?" The systematic procedure must be endowed with two main qualities. First, flexibility: the procedure must fit the ensemble in memory so that it is ready to answer requests, even when the number of devices is lower than the ensemble size. An ideal flexible solution must be able to allocate heterogeneous DNNs (such as ResNet, Inception, EfficientNet, ...) on modern clusters containing heterogeneous devices (such as CPUs, GPUs, TPUs, ...). Second, efficiency: once the ensemble fits in memory, the procedure should optimize the usage of the underlying multi-core devices with minimal overhead due to data transfer.

We design our answer in three points.
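Before detailing these points, the sketch below makes the allocation question concrete. It is a minimal PyTorch illustration, not the procedure proposed in this work: the torchvision models stand in for heterogeneous ensemble members, and the round-robin placement rule is an assumption chosen here for illustration. Such a naive placement fits the ensemble even when there are fewer devices than members, but it ignores per-device memory, device heterogeneity and data-transfer overhead, which is precisely the gap a systematic allocation procedure must close.

```python
import torch
from torchvision import models

# Illustrative heterogeneous ensemble (stand-ins for the ResNet, Inception
# and EfficientNet members mentioned in the text; weights are not loaded).
ensemble = [
    models.resnet50(weights=None),
    models.inception_v3(weights=None),
    models.efficientnet_b0(weights=None),
]

# Available devices: every visible GPU, plus the CPU as a fallback.
devices = [torch.device(f"cuda:{i}") for i in range(torch.cuda.device_count())]
devices.append(torch.device("cpu"))

# Naive round-robin allocation (NOT the procedure proposed in this work):
# member i goes to device i mod len(devices). This fits the ensemble even
# when there are fewer devices than members, but it is blind to device
# memory, compute heterogeneity and data-transfer costs.
placement = {}
for i, model in enumerate(ensemble):
    device = devices[i % len(devices)]
    placement[f"member_{i}"] = device
    model.to(device).eval()

print(placement)
```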