2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA) 2020
DOI: 10.1109/isca45697.2020.00070

RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing

Cited by 157 publications (87 citation statements)
References: 54 publications
“…Such properties render GnR a prime candidate for acceleration using near-data processing (NDP) at the processor-memory interface. Indeed, TensorDIMM [10] and RecNMP [9] are two recent studies that explored the efficacy of NDP in accelerating GnR. However, we observe that the rank-level parallelism exploited by TensorDIMM and RecNMP does not fully reap the maximum potential of NDP acceleration, leaving significant performance capabilities on the table.…”
Section: Deep-learning-based Recommendation System (mentioning)
confidence: 78%
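The excerpt refers to gather-and-reduce (GnR), the embedding-lookup step in which the rows of an embedding table named by a batch of indices are gathered and summed, and to the rank-level parallelism that RecNMP and TensorDIMM exploit. The Python sketch below illustrates the idea under stated assumptions: rows are interleaved across DRAM ranks by `row % num_ranks`, each rank's near-memory unit sums only its local rows, and the host adds the per-rank partial sums. The function names and the row-to-rank mapping are hypothetical; this is not the actual design of RecNMP, TensorDIMM, or the citing paper.

```python
from typing import List, Sequence, Tuple

Vector = List[float]

def gather_and_reduce(table: Sequence[Vector], indices: Sequence[int]) -> Vector:
    """Baseline GnR: gather the rows named by `indices` and sum them element-wise."""
    out = [0.0] * len(table[0])
    for idx in indices:
        for d, value in enumerate(table[idx]):
            out[d] += value
    return out

def rank_parallel_gnr(table: Sequence[Vector], indices: Sequence[int],
                      num_ranks: int) -> Tuple[Vector, List[Vector]]:
    """Rank-level sketch (hypothetical mapping: row r lives on rank r % num_ranks).

    Each rank reduces only its local rows (the near-memory step); the host
    then adds the per-rank partial sums to form the final pooled vector.
    """
    dim = len(table[0])
    partials = [[0.0] * dim for _ in range(num_ranks)]
    for idx in indices:
        rank = idx % num_ranks               # assumed row-to-rank interleaving
        for d, value in enumerate(table[idx]):
            partials[rank][d] += value       # partial reduction inside the rank
    pooled = [sum(p[d] for p in partials) for d in range(dim)]  # host-side sum
    return pooled, partials

# Toy table: 8 rows of dimension 2; row i is [i, 10*i].
table = [[float(i), 10.0 * i] for i in range(8)]
indices = [0, 3, 5, 3, 7]
pooled, partials = rank_parallel_gnr(table, indices, num_ranks=2)
```

Because element-wise addition is associative, the rank-parallel result equals the baseline GnR output ([18.0, 180.0] here). The per-rank partial sums also show how unevenly lookups can land on ranks: in this toy batch, rank 1 reduces four of the five rows.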
“…When N_lookup is 40 or 80, the relative EDP of TRiM-B is slightly better than that of TRiM-G. However, considering that TRiM-B incurs 4× more area overhead than TRiM-G as it populates a PE per bank, not a bank group, TRiM-G is a better option compared to TRiM-B in the range of N_lookup (between 20 and 80) covered by DLRM [9]. Hereafter, we detail the microarchitecture for TRiM-G. Mitigating load imbalances through replication: At a given N_lookup, a memory node with a PE receives fewer embedding vectors to reduce when TRiM exploits finer-grained parallelism, potentially experiencing load imbalance problems.…”
Section: TRiM Architecture (mentioning)
confidence: 99%
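The second excerpt notes that finer-grained parallelism leaves each processing element (PE) with fewer embedding vectors to reduce, which makes it more exposed to load imbalance, and that TRiM mitigates this through replication. The sketch below is a minimal illustration under a hypothetical policy: each non-replicated row lives only on PE `row % num_pes`, while a replicated (hot) row has a copy on every PE, so each of its lookups can go to the least-loaded PE. The functions `pe_loads` and `imbalance` are illustrative names, and the actual TRiM replication scheme may differ.

```python
from typing import AbstractSet, List, Sequence

def pe_loads(indices: Sequence[int], num_pes: int,
             replicated: AbstractSet[int] = frozenset()) -> List[int]:
    """Count how many embedding lookups each PE must reduce.

    Assumption (hypothetical): a non-replicated row is stored only on
    PE `row % num_pes`; a replicated row has a copy on every PE, so each
    of its lookups is sent to the currently least-loaded PE.
    """
    loads = [0] * num_pes
    deferred = 0
    for idx in indices:
        if idx in replicated:
            deferred += 1              # schedule after the fixed placements
        else:
            loads[idx % num_pes] += 1  # served by the row's home PE
    for _ in range(deferred):
        target = min(range(num_pes), key=loads.__getitem__)
        loads[target] += 1             # least-loaded PE serves this lookup
    return loads

def imbalance(loads: Sequence[int]) -> float:
    """Max load divided by mean load; 1.0 means perfectly balanced."""
    return max(loads) / (sum(loads) / len(loads))

# One hot row (5) dominates a batch of 10 lookups spread over 4 PEs.
lookups = [5] * 6 + [0, 1, 2, 3]
baseline = pe_loads(lookups, num_pes=4)
with_replication = pe_loads(lookups, num_pes=4, replicated={5})
```

In this toy batch the home-PE mapping yields loads [1, 7, 1, 1] (imbalance 2.8), while replicating the hot row spreads the work to [3, 3, 2, 2] (imbalance 1.2).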