In this paper, we present an enhancement of Particle Swarm Optimization (PSO) performance using CUDA and a tree reduction algorithm. PSO is a widely used metaheuristic algorithm that has been adapted into a CUDA version known as CPSO. The tree reduction algorithm is employed to compute the global best position efficiently. To evaluate our approach, we compared our CUDA version against the standard version of PSO and observed a maximum speedup of 37x. Additionally, we identified a linear relationship between swarm size and execution time: as the number of particles increases, so does the computational load, which highlights the value of parallel implementations in reducing execution time. Our proposed parallel PSOs demonstrated significant reductions in execution time, along with improvements in convergence speed and local optimization performance, which is particularly beneficial for solving large-scale problems with high computational loads.
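To illustrate the idea behind using a tree reduction to find the global best, the following is a minimal sequential sketch in Python, not the CUDA kernel used in this work. It assumes a minimization problem in which each particle already has a fitness value; the function name `tree_reduce_argmin` and the padding with infinity are illustrative choices. Each pass of the outer loop halves the number of active candidates, so the best particle is found in about log2(N) steps rather than N - 1 sequential comparisons. On a GPU, every iteration of the inner loop would be handled by a separate thread.

```python
import math


def tree_reduce_argmin(fitness):
    """Return (best_value, best_index) using a pairwise tree reduction.

    Hypothetical helper for illustration: assumes lower fitness is better.
    """
    n = len(fitness)
    if n == 0:
        raise ValueError("fitness must not be empty")

    # Pad to the next power of two with +inf so every stride pairs cleanly.
    size = 1
    while size < n:
        size *= 2
    vals = list(fitness) + [math.inf] * (size - n)
    idx = list(range(size))

    stride = size // 2
    while stride > 0:
        # In a parallel version, each value of t would map to one thread,
        # with a barrier between successive strides.
        for t in range(stride):
            if vals[t + stride] < vals[t]:
                vals[t] = vals[t + stride]
                idx[t] = idx[t + stride]
        stride //= 2

    return vals[0], idx[0]


if __name__ == "__main__":
    # Five particles with made-up fitness values.
    best_value, best_index = tree_reduce_argmin([3.0, 1.5, 4.0, 0.2, 2.2])
    print(best_value, best_index)
```

The index is carried alongside the value because the swarm needs the position of the best particle, not only its fitness.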