2020
DOI: 10.48550/arxiv.2009.02353
Preprint

Running Neural Networks on the NIC

Abstract: In this paper we show that the data plane of commodity programmable Network Interface Cards (NICs) can run neural network inference tasks required by packet monitoring applications, with low overhead. This is particularly important as the data transfer costs to the host system and dedicated machine learning accelerators, e.g., GPUs, can be more expensive than the processing task itself. We design and implement our system, N3IC, on two different NICs and we show that it can greatly benefit three different netwo…
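
The abstract does not spell out the inference primitive, but the citation statements below describe the models N3IC runs as binary neural networks ("50 binary neurons"). The following is a minimal sketch of the standard XNOR-popcount trick such systems rely on, in which a {-1, +1} dot product becomes a bitwise operation cheap enough for a NIC pipeline; the function name and toy vectors are illustrative assumptions, not N3IC's actual code.

```python
# Hedged sketch: the XNOR-popcount formulation of a binary neuron, the
# primitive that binarized NN inference on NICs typically reduces to.
# All names and values are illustrative, not taken from N3IC.

def binary_neuron(x_bits: int, w_bits: int, n: int, threshold: int) -> int:
    """Fire (1) iff the {-1,+1} dot product of two n-bit vectors >= threshold."""
    mask = (1 << n) - 1
    agree = ~(x_bits ^ w_bits) & mask  # XNOR: 1-bits where input and weight agree
    pop = bin(agree).count("1")        # popcount of agreements
    dot = 2 * pop - n                  # agreements minus disagreements
    return 1 if dot >= threshold else 0

# Toy usage: 8 binary inputs against 8 binary weights.
print(binary_neuron(0b10110010, 0b10100110, n=8, threshold=0))  # -> 1
```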

Cited by 8 publications (15 citation statements)
References 31 publications
“…As far as complexity is concerned, we observe that system researchers [97]-[100] consider extremely simple models (with just 21 neurons [99] or 50 neurons [100] overall), whereas AI researchers train excessively big models (the state-of-the-art models compared in [92] employ in excess of hundreds to thousands of neurons per class). Awareness of commercial-grade challenges and constraints helps move commercial-grade models out of the lab, through explicitly parsimonious AI-model design (fewer than one hundred thousand neurons for all 200 classes [93]) and optimized implementation (e.g., using domain-specific accelerators and languages [101], [102]).…”
Section: A. Efficiently Handling the Known (L1 to L2)
confidence: 99%
“…In comparison, other traffic analytics have less stringent requirements. We operate traffic classification via a 1D-Convolutional Neural Network (CNN) model, whose size (about 100 k weights) is smaller than typical 2D CNN models used for image processing, but significantly larger than the toy-case models used in the related system work [39, 40]. The model is equivalent to the one used in [12], trained with over 200 application labels, which is about ten (four) times the typical (maximum) number of classes considered in the literature [10].…”
Section: Case Study
confidence: 99%
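
The quote above fixes only two numbers: roughly 100 k weights in total and 200 application labels. Below is a back-of-the-envelope sketch of one way a small 1D-CNN can land near that budget; every layer shape is a hypothetical choice made for illustration, not the architecture of [12].

```python
# Rough parameter-count sketch for a small 1D-CNN traffic classifier.
# Only the ~100k total and the 200 classes come from the quoted paper;
# all layer shapes below are assumptions.

def conv1d_params(in_ch: int, out_ch: int, kernel: int) -> int:
    return in_ch * out_ch * kernel + out_ch  # weights + biases

def dense_params(n_in: int, n_out: int) -> int:
    return n_in * n_out + n_out

layers = [
    conv1d_params(1, 32, 7),    # conv over the raw packet/byte series
    conv1d_params(32, 64, 5),
    dense_params(64 * 8, 128),  # assume pooling leaves 8 positions
    dense_params(128, 200),     # 200 application classes, per the quote
]
print(layers, sum(layers))      # -> [256, 10304, 65664, 25800] 102024,
                                #    i.e. about 100 k weights
```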
“…An ASIC is used in [40] for DL inference at the packet level, but only on toy models with 3 layers and 21 neurons, i.e., 5000× smaller than the model we use. A Smart NIC is used in [39], which however limits the model size to 50 binary neurons, i.e., 2000× fewer weights, each with a resolution 32× smaller than in our case study. To attain sub-microsecond latency, [39, 40] restrict themselves to such tiny models that it becomes questionable whether their execution can have any practical use, given the significant distance of such shallow models from the depth needed to embrace the expected benefits of DL.…”
Section: Related Work
confidence: 99%
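
The size ratios in this comparison can be sanity-checked arithmetically; the quick pass below uses only figures stated in the quote itself, so nothing here is measured or new data.

```python
# Sanity-check the ratios quoted above; all inputs are the citing paper's figures.
cnn_weights = 100_000       # ~100 k weights in their 1D-CNN case study
print(cnn_weights // 2000)  # "2000x fewer weights" -> 50, consistent with
                            # the "50 binary neurons" bound of [39]

bits_full, bits_binary = 32, 1        # float32 vs. 1-bit binary weights
print(bits_full // bits_binary)       # 32 -> the "resolution 32x smaller" claim
```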
“…A few attempts have been made to run ML models within the network (top of Figure 1), as detailed in Table 4 and §9. The first class of works [40, 44-46] implemented binary neural networks on network interface cards (NICs), FPGAs, or in a software environment. Their attempts to implement on a switch ASIC have failed in both scale and performance, as it is significantly more constrained in resources and functionality.…”
Section: Introduction
confidence: 99%