Recent advancements in deep neural networks for graph-structured data have led to state-of-the-art performance on recommender system benchmarks. However, making these methods practical and scalable to web-scale recommendation tasks with billions of items and hundreds of millions of users remains a challenge.Here we describe a large-scale deep recommendation engine that we developed and deployed at Pinterest. We develop a dataefficient Graph Convolutional Network (GCN) algorithm PinSage, which combines efficient random walks and graph convolutions to generate embeddings of nodes (i.e., items) that incorporate both graph structure as well as node feature information. Compared to prior GCN approaches, we develop a novel method based on highly efficient random walks to structure the convolutions and design a novel training strategy that relies on harder-and-harder training examples to improve robustness and convergence of the model.We deploy PinSage at Pinterest and train it on 7.5 billion examples on a graph with 3 billion nodes representing pins and boards, and 18 billion edges. According to offline metrics, user studies and A/B tests, PinSage generates higher-quality recommendations than comparable deep learning and graph-based alternatives. To our knowledge, this is the largest application of deep graph embeddings to date and paves the way for a new generation of web-scale recommender systems based on graph convolutional architectures.
Dynamic Partially Reconfiguration (DPR) on FPGA has attracted significant research interest in recent years since it provides benefits such as reduced area and flexible functionality. However, due to the lack of supporting synthesis tools in the current DPR design flow, leveraging benefits from DPR requires specific design expertise with laborious manual design effort. Considering the complicated concurrency relations among various functions, it is challenging to select appropriate Partial Reconfiguration Modules (PR Modules) and cluster them into proper groups with a proper reconfiguration schedule so that the hardware modules can be swapped in and out correctly during the run time. Furthermore, the design of PR Modules also impacts reconfiguration latency and resource utilization greatly. In this paper, we propose a Maximum-Weight Independent Set model to formulate the PR Module selection and clustering problem so that the original manual exploration can be solved efficiently and automatically. We also propose a step-wise adjustment configuration prefetching strategy incorporated in our model to generate optimized reconfiguration schedules. Our proposed approach not only supports various design constraints but also can consider multiple objectives such as area and reconfiguration delay. Experimental results show that our approach can optimize resource utilization and reduce reconfiguration delay with good scalability. Especially, the implementation of the real design case shows that our approach can be embedded in Xilinx's DPR design flow successfully.
Dynamic Partially Reconfiguration (DPR) designs provide additional benefits compared to traditional FPGA application. However, due to the lack of support from automatic design tools in current design flow, designers have to manually define the dimensions and positions of Partially Reconfigurable Regions (PR Regions). The following fine-grained placement for system modules is also limited because it takes the floorplanning result as a rigid region constraint. Therefore, the manual floorplanning is laborious and may lead to inferior fine-grained placement results. In this paper, we propose to integrate PR Region floorplanning with fine-grained placement to achieve the global optimization of the whole DPR system. Effective strategies for tuning PR Region floorplanning and apposite analytical evaluation models are customized for DPR designs to handle the co-optimization for both PR Regions and static region. Not only practical reconfiguration cost and specific reconfiguration constraints for DPR system are considered, but also the congestion estimation can be relaxed by our approach. Especially, we established a two-stage stochastic optimization framework which handles different objectives in different optimization stages so that automated floorplanning and global optimization can be achieved in reasonable time. Experimental results demonstrate that due to the flexibility benefit from the unification of PR Region floorplanning and fine-grained placement, our approach can improve 20.9% on critical path delay, 24% on reconfiguration delay, 12% on congestion, and 8.7% on wire length compared to current DPR design method.
No abstract
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.