Two different software layers have been proposed to efficiently run inference on individual DNNs, but no existing work proposes to combine them: inference servers (Triton [8], Ray Serve [9], TensorFlow Serving [10] and TorchServe [11]) serve the predictions of inference systems (such as TensorRT [12], OpenVINO [13], ONNX [14] and TFLite [15]). Our work attempts to fill this gap between current inference system technologies and ensembles of deep neural networks.

The question we attempt to answer is simple to state but challenging to solve: "How to systematically allocate an ensemble of DNNs to a given set of devices?" The systematic procedure must be endowed with two main qualities. First, flexibility: the procedure must fit the ensemble in memory so that it is ready to answer requests, even when the number of devices is lower than the ensemble size. An ideal flexible solution must be able to allocate heterogeneous DNNs (such as ResNet, Inception, EfficientNet, ...) on modern clusters containing heterogeneous devices (such as CPUs, GPUs, TPUs, ...). Second, efficiency: once the ensemble fits in memory, the procedure should optimize the usage of the underlying multi-core devices with minimal overhead due to data transfer.

We design our answer in three points.
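Before detailing these points, the sketch below makes the allocation question concrete. It is a minimal PyTorch illustration, not the procedure proposed in this work: the torchvision models stand in for heterogeneous ensemble members, and the round-robin placement rule is an assumption chosen here for illustration. Such a naive placement fits the ensemble even when there are fewer devices than members, but it ignores per-device memory, device heterogeneity and data-transfer overhead, which is precisely the gap a systematic allocation procedure must close.

```python
import torch
from torchvision import models

# Illustrative heterogeneous ensemble (stand-ins for the ResNet, Inception
# and EfficientNet members mentioned in the text; weights are not loaded).
ensemble = [
    models.resnet50(weights=None),
    models.inception_v3(weights=None),
    models.efficientnet_b0(weights=None),
]

# Available devices: every visible GPU, plus the CPU as a fallback.
devices = [torch.device(f"cuda:{i}") for i in range(torch.cuda.device_count())]
devices.append(torch.device("cpu"))

# Naive round-robin allocation (NOT the procedure proposed in this work):
# member i goes to device i mod len(devices). This fits the ensemble even
# when there are fewer devices than members, but it is blind to device
# memory, compute heterogeneity and data-transfer costs.
placement = {}
for i, model in enumerate(ensemble):
    device = devices[i % len(devices)]
    placement[f"member_{i}"] = device
    model.to(device).eval()

print(placement)
```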