Efficient parallelization of genetic algorithms (GAs) on state-of-the-art multi-threading or manythreading platforms is a challenge due to the difficulty of scheduling hardware resources regarding the concurrency of threads. In this paper, for resolving the problem, a novel method is proposed, which parallelizes the GA by designing three concurrent kernels, each of which are running some dependent effective operators of GA. The proposed method can be straightforwardly adapted to run on many-core and multi-core processors by using Compute Unified Device Architecture (CUDA) and Threading Building Blocks (TBB) platforms. To efficiently use the valuable resources of such computing cores in concurrent execution of the GA, threads that run any of the triple kernels are synchronized by a considerably fast switching technique. The offered method was used for parallelizing a GA-based solution of Traveling Salesman Problem (TSP) over CUDA and TBB platforms with identical settings. The results confirm the superiority of the proposed method to state-of-the-art methods in effective parallelization of GAs on Graphics Processing Units (GPUs) as well as on multi-core Central Processing Units (CPUs). Also, for GA problems with a modest initial population, though the switching time among GPU kernels is negligible, the TBB-based parallel GA exploits the resources more efficiently.