“…When $N_{\text{lookup}}$ is 40 or 80, the relative EDP of TRiM-B is slightly better than that of TRiM-G. However, TRiM-B incurs 4× more area overhead than TRiM-G because it populates a PE per bank rather than per bank group, so TRiM-G is the better option over the range of $N_{\text{lookup}}$ (between 20 and 80) covered by DLRM [9]. Hereafter, we detail the microarchitecture of TRiM-G.

Mitigating load imbalances through replication: At a given $N_{\text{lookup}}$, a memory node with a PE receives fewer embedding vectors to reduce when TRiM exploits finer-grained parallelism, and it can therefore experience load imbalance.…”
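
The load-imbalance argument in the excerpt can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's methodology: lookups are assumed to land on memory nodes uniformly at random (real embedding accesses are typically skewed), and the node counts 4, 16, and 64 are hypothetical stand-ins for progressively finer-grained parallelism. The function name `imbalance` and its parameters are illustrative.

```python
import random
from collections import Counter


def imbalance(n_lookup, n_nodes, trials=2000, seed=0):
    """Average ratio of the busiest node's load to the mean load.

    Assumption: each of the n_lookup embedding lookups is mapped to one of
    n_nodes memory nodes uniformly at random. A ratio of 1.0 means the work
    is perfectly balanced; larger values mean the busiest node (which sets
    the completion time) holds more than its fair share.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    mean_load = n_lookup / n_nodes
    total = 0.0
    for _ in range(trials):
        # Count how many lookups each node receives in this trial.
        loads = Counter(rng.randrange(n_nodes) for _ in range(n_lookup))
        total += max(loads.values()) / mean_load
    return total / trials


if __name__ == "__main__":
    # Hypothetical node counts; 80 is the upper end of the lookup range
    # mentioned in the excerpt.
    for nodes in (4, 16, 64):
        print(nodes, round(imbalance(80, nodes), 2))
```

Under these assumptions, the ratio grows as the same number of lookups is spread over more nodes, because each node receives fewer vectors and random variation becomes large relative to the mean. This is the effect that replication is meant to mitigate.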