Fast search and high-throughput applications often require hardware support, and in such designs the performance of search algorithms is limited by storage bandwidth. The search algorithm must therefore be optimized with that limit in mind. We propose a CostCounter (CC) algorithm based on Cuckoo hashing, together with an Improved CostCounter (ICC) algorithm. Both use a cost counter that records kick-out history, so that a better path can be selected when collisions occur. Our simulation results indicate that CC and ICC achieve larger performance improvements than Random Walk (RW), Breadth First Search (BFS), and MinCounter (MC). With two buckets and two slots per bucket, at a memory load of 95% of the maximum load rate, CC and ICC improve read-write times by over 20% compared with MC and by over 80% compared with BFS. CC and ICC also achieve a slight improvement in storage efficiency over MC. In addition, we implement RW, MC, and the proposed algorithms with fine-grained locking to support high throughput. Tests on an FPGA confirm the simulation results: at 95% of memory capacity, our algorithms raise the maximum throughput by over 23% compared with RW and by over 9% compared with MC. The test results indicate that CC and ICC achieve better performance in terms of hardware bandwidth and memory load efficiency without incurring a significant resource cost.
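The abstract does not spell out the CC or ICC update rules, so the following is only a minimal sketch of the general idea of counter-guided eviction in a two-choice Cuckoo table with two slots per bucket. The class name `CounterCuckoo`, the salted use of Python's built-in `hash`, the per-slot counter layout, and the "evict the slot with the fewest recorded kick-outs" rule are all assumptions made for illustration; they are not the authors' algorithm.

```python
import random


class CounterCuckoo:
    """Toy two-choice cuckoo table with per-slot kick-out counters (illustrative only)."""

    def __init__(self, num_buckets=8, slots=2, max_kicks=50, seed=0):
        self.n = num_buckets
        self.slots = slots
        self.max_kicks = max_kicks
        self.keys = [[None] * slots for _ in range(num_buckets)]
        # Assumed layout: one counter per slot, counting how often it was evicted.
        self.kicks = [[0] * slots for _ in range(num_buckets)]
        rng = random.Random(seed)
        self.salts = (rng.getrandbits(32), rng.getrandbits(32))

    def _buckets(self, key):
        # Two candidate buckets, one per salted hash.
        return [hash((salt, key)) % self.n for salt in self.salts]

    def lookup(self, key):
        return any(key in self.keys[b] for b in self._buckets(key))

    def insert(self, key):
        if self.lookup(key):
            return True
        for _ in range(self.max_kicks):
            candidates = [(b, s) for b in self._buckets(key)
                          for s in range(self.slots)]
            # Prefer any empty candidate slot.
            for b, s in candidates:
                if self.keys[b][s] is None:
                    self.keys[b][s] = key
                    return True
            # All candidates are full: evict the occupant of the slot with the
            # fewest recorded kick-outs (an assumed, MinCounter-like rule).
            b, s = min(candidates, key=lambda c: self.kicks[c[0]][c[1]])
            self.kicks[b][s] += 1
            key, self.keys[b][s] = self.keys[b][s], key
        # Gave up: the key currently in hand is dropped. A real design would
        # rehash or use a stash instead.
        return False
```

A table is created with `CounterCuckoo(num_buckets=8, slots=2)` and filled with `insert(key)`. A `False` return means the kick-out chain hit `max_kicks` and one key was dropped, which is the situation a better path-selection rule tries to make rarer at high load.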
Revealing the details of the internal flow in a centrifugal pump requires a large-scale mesh. However, a serial grid algorithm takes too long to generate such a mesh and cannot meet the calculation requirements. This paper presents a large-scale parallel mesh generation algorithm for centrifugal pumps on high-performance computers. First, a grid point set for the 3D Delaunay triangular mesh on the surface of the centrifugal pump is generated. Then, the S-H (Sutherland–Hodgman) algorithm is employed to clip and segment these grid point sets on the surface. A uniform boundary mesh is generated and divided into subregions. In addition, to ensure the consistency of the interface mesh and to avoid intersection and overlap errors in the boundary mesh, a parallel constrained Delaunay mesh generation algorithm based on region numbering is proposed, which improves the quality and efficiency of the generated parallel mesh. Finally, the parallel mesh algorithm is verified on a centrifugal pump using the Tianhe-2 supercomputer, and a PIV (particle image velocimetry) internal flow experiment is compared with the numerical simulation on the large-scale mesh. The results show that the algorithm can generate 10^8 3D unstructured grid elements in 5 minutes, with a parallel efficiency of 80%. The proposed algorithm not only maintains grid quality comparable to that of the serial grid algorithm but also accurately simulates the flow in the centrifugal pump. The double-vortex structure observed in the PIV experiment is captured by the large-scale mesh.
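For reference, the Sutherland–Hodgman step mentioned above can be sketched in its textbook 2D form, which clips a polygon against a convex clip polygon one edge at a time. How the paper applies it to surface grid point sets is not described in the abstract, so the function name `clip_polygon` and the tuple-based point representation below are illustrative assumptions only.

```python
def clip_polygon(subject, clip):
    """Clip `subject` against a convex, counter-clockwise `clip` polygon.

    Both polygons are lists of (x, y) tuples. Returns the clipped vertex list,
    or an empty list when nothing remains.
    """

    def inside(p, a, b):
        # True when p lies on or to the left of the directed edge a -> b.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersection(p, q, a, b):
        # Point where segment p-q crosses the infinite line through a and b.
        dx1, dy1 = q[0] - p[0], q[1] - p[1]
        dx2, dy2 = b[0] - a[0], b[1] - a[1]
        denom = dx1 * dy2 - dy1 * dx2
        t = ((a[0] - p[0]) * dy2 - (a[1] - p[1]) * dx2) / denom
        return (p[0] + t * dx1, p[1] + t * dy1)

    output = list(subject)
    for i in range(len(clip)):
        if not output:
            break  # the polygon was clipped away entirely
        a, b = clip[i], clip[(i + 1) % len(clip)]
        input_list, output = output, []
        prev = input_list[-1]
        for cur in input_list:
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    # Entering the clip half-plane: add the crossing point.
                    output.append(intersection(prev, cur, a, b))
                output.append(cur)
            elif inside(prev, a, b):
                # Leaving the clip half-plane: add the crossing point only.
                output.append(intersection(prev, cur, a, b))
            prev = cur
    return output
```

Clipping the square with corners (0, 0) and (2, 2) against the square with corners (1, 1) and (3, 3) leaves the unit square between (1, 1) and (2, 2). The clip polygon must be convex and listed counter-clockwise for the `inside` test to be valid.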