“…DLRMs have received much attention from Internet giants [2,24,36,57,58,82,117], as they can generate more revenue and a better user experience. In an end-to-end recommender system, the most expensive part of serving an inference request is the embedding reduction step, which consumes huge memory capacity [117,177,178] and accounts for 1/2 ∼ 3/4 of the inference time [38,58,60,82,92].…”
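To make the cost concrete, a minimal sketch of what an embedding reduction does (all table sizes and IDs here are hypothetical, chosen only for illustration): each sparse feature carries a variable-length list of row IDs, and the model gathers those rows from a large embedding table and pools them into one fixed-size vector.

```python
import numpy as np

# Hypothetical sizes; production tables can hold billions of rows,
# which is why this step dominates memory capacity.
num_rows, dim = 1000, 16
table = np.random.rand(num_rows, dim)  # one embedding table

# A sparse feature = a variable-length list of row IDs.
ids = np.array([3, 42, 7])

# Embedding reduction: gather the rows, then sum-pool them into a
# single dense vector that downstream MLP layers consume.
pooled = table[ids].sum(axis=0)

print(pooled.shape)  # (16,)
```

The gather is memory-bound (random access over a huge table), which is why this step, rather than the dense MLP compute, tends to dominate inference latency.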