In this paper, we present BrkgaCuda 2.0, a new version of BrkgaCuda that supports the design and execution of Biased Random-Key Genetic Algorithms (BRKGA) on CUDA/GPU-enabled computing platforms and employs new techniques to accelerate execution. We compare the performance of our implementation against the standard CPU implementation, BrkgaAPI, developed by Toso and Resende (2015), and against the recently proposed GPU-BRKGA, developed by Alves et al. (2021). In the same spirit as the standard implementation, our framework handles all central aspects of the BRKGA logic, so little effort is required to reuse it on another problem. The user may implement the decoder either on the CPU in C++ or on the GPU in CUDA. Moreover, BrkgaCuda provides a decoder that receives a permutation created by sorting the gene indices of each chromosome using the gene values as keys. To evaluate our framework, we use a total of 54 instances of the Traveling Salesman Problem (TSP), the Set Cover Problem (SCP), and the Capacitated Vehicle Routing Problem (CVRP), using a greedy and an optimal decoder on the CVRP. We show that our framework is faster than both the standard BrkgaAPI and the GPU-BRKGA while keeping the same solution quality. Furthermore, when bb-segsort is used to create the permutations, our framework achieves even higher speedups over the other two implementations.
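
As an illustration of the permutation decoder mentioned above, the following minimal CPU sketch in C++ shows how a chromosome of random keys can be turned into a permutation by sorting gene indices with the gene values as keys. The function name `keysToPermutation` and the use of `float` genes are assumptions made for this example and are not taken from the BrkgaCuda API; the sketch uses `std::stable_sort` on a single chromosome, whereas the framework described here sorts on the GPU (for example with bb-segsort).

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of the BrkgaCuda API): returns the gene
// indices of one chromosome ordered by increasing gene value.
std::vector<unsigned> keysToPermutation(const std::vector<float>& genes) {
  // Start from the identity permutation 0, 1, ..., n-1.
  std::vector<unsigned> perm(genes.size());
  std::iota(perm.begin(), perm.end(), 0u);

  // Sort the indices using the genes as keys. A stable sort keeps equal
  // keys in index order, which makes the result deterministic.
  std::stable_sort(perm.begin(), perm.end(),
                   [&genes](unsigned a, unsigned b) {
                     return genes[a] < genes[b];
                   });
  return perm;
}
```

For a TSP instance, for example, the resulting permutation can be read directly as the order in which the cities are visited, so a decoder only needs to sum the corresponding edge costs.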