Monte Carlo (MC) simulation is commonly considered the most accurate dose calculation method for proton therapy. To achieve fast MC dose calculations for clinical applications, we previously developed a GPU-based MC tool, gPMC. In this paper, we report recent updates to gPMC in terms of accuracy, portability, and functionality, together with comprehensive tests of the tool. The new version, gPMC v2.0, was developed in the OpenCL environment to enable portability across different computational platforms. Physics models of nuclear interactions were refined to improve calculation accuracy. The scoring functions of gPMC were expanded to tally particle fluence, dose deposited by different particle types, and dose-averaged linear energy transfer (LETd). A multiple-counter approach was employed to improve efficiency by reducing the frequency of memory-write conflicts during scoring. For dose calculation, accuracy improvements over gPMC v1.0 were observed in both water phantom cases and a patient case. For a prostate cancer case planned using high-energy proton beams, the dose discrepancies in the beam entrance and target regions seen in gPMC v1.0 relative to results from TOPAS, the gold-standard tool for proton MC simulation, were substantially reduced, and the gamma test passing rate (1%/1 mm) improved from 82.7% to 93.1%. The average relative difference in LETd between gPMC and TOPAS was 1.7%. The average relative differences in dose deposited by primary, secondary, and other heavier particles were within 2.3%, 0.4%, and 0.2%, respectively. Depending on source proton energy and phantom complexity, it took 8 to 17 seconds on an AMD Radeon R9 290x GPU to simulate 10⁷ source protons, achieving less than 1% average statistical uncertainty. As the beam size was reduced from 10×10 cm² to 1×1 cm², the scoring time increased by only 4.8% with eight counters, in contrast to a 40% increase with a single counter.
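The multiple-counter idea can be illustrated with a small CPU-side sketch. This is not the gPMC OpenCL kernel: the function names, the round-robin assignment of deposits to counters, and the use of fixed-size batches as a stand-in for concurrently executing threads are all assumptions made for illustration. The sketch shows that spreading writes over several copies of the scoring grid leaves the summed dose unchanged while reducing the number of writes that target the same memory address at the same time.

```python
import numpy as np

def score_with_counters(voxel_idx, energy, n_voxels, n_counters):
    """Accumulate energy deposits into n_counters copies of the scoring grid.

    Deposit i is routed to counter (i % n_counters), a hypothetical
    round-robin rule; the copies are summed into one grid at the end.
    """
    counters = np.zeros((n_counters, n_voxels))
    counter_id = np.arange(len(voxel_idx)) % n_counters
    # np.add.at is unbuffered, so repeated (counter, voxel) pairs accumulate.
    np.add.at(counters, (counter_id, voxel_idx), energy)
    return counters.sum(axis=0)

def count_write_conflicts(voxel_idx, n_voxels, n_counters, batch_size):
    """Count writes that hit an address already targeted in the same batch.

    A batch stands in for a group of threads writing simultaneously; two
    writes conflict when they share the same (counter, voxel) address.
    """
    conflicts = 0
    for start in range(0, len(voxel_idx), batch_size):
        batch = voxel_idx[start:start + batch_size]
        counter_id = (start + np.arange(len(batch))) % n_counters
        addresses = counter_id * n_voxels + batch
        conflicts += len(addresses) - len(np.unique(addresses))
    return conflicts

# A narrow beam concentrates deposits in few voxels (here 4), which is the
# situation where write conflicts are most frequent.
rng = np.random.default_rng(0)
voxel_idx = rng.integers(0, 4, size=1024)
energy = rng.random(1024)

dose_one = score_with_counters(voxel_idx, energy, n_voxels=4, n_counters=1)
dose_eight = score_with_counters(voxel_idx, energy, n_voxels=4, n_counters=8)
conflicts_one = count_write_conflicts(voxel_idx, 4, n_counters=1, batch_size=32)
conflicts_eight = count_write_conflicts(voxel_idx, 4, n_counters=8, batch_size=32)
```

In this toy setting `dose_one` and `dose_eight` agree to floating-point precision, and `conflicts_eight` is smaller than `conflicts_one`. The cost of the approach is memory, since each additional counter is another full copy of the scoring grid.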
With the OpenCL environment, the portability of gPMC v2.0 was enhanced. It was successfully executed on different CPUs and GPUs, and its performance varied across devices depending on processing power and hardware architecture.
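For readers unfamiliar with the LETd tally mentioned above, dose-averaged LET in a voxel is conventionally the average of each event's LET weighted by the energy it deposits. The sketch below applies that conventional definition per voxel; the function name, the argument layout, and the choice to report zero for voxels with no deposits are assumptions for illustration and do not describe how gPMC implements its scoring.

```python
import numpy as np

def dose_averaged_let(voxel_idx, energy_dep, let, n_voxels):
    """Per-voxel LETd = sum(energy_dep * let) / sum(energy_dep).

    voxel_idx  : integer voxel index of each energy-deposition event
    energy_dep : energy deposited by each event (e.g. MeV)
    let        : LET of the depositing particle at each event (e.g. keV/um)
    Voxels that receive no deposits are reported as 0 (an assumed convention).
    """
    weighted = np.bincount(voxel_idx, weights=energy_dep * let, minlength=n_voxels)
    total = np.bincount(voxel_idx, weights=energy_dep, minlength=n_voxels)
    letd = np.zeros(n_voxels)
    # Divide only where energy was deposited to avoid 0/0.
    np.divide(weighted, total, out=letd, where=total > 0)
    return letd

# Voxel 0 receives 1.0 at LET 2 and 3.0 at LET 4: (1*2 + 3*4) / (1 + 3) = 3.5
# Voxel 1 receives nothing; voxel 2 receives a single deposit at LET 5.
letd = dose_averaged_let(
    voxel_idx=np.array([0, 0, 2]),
    energy_dep=np.array([1.0, 3.0, 2.0]),
    let=np.array([2.0, 4.0, 5.0]),
    n_voxels=3,
)
# letd -> [3.5, 0.0, 5.0]
```

Because the two sums are accumulated separately and divided at the end, they can be tallied across batches or devices and combined before the final division.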