Commercial off-the-shelf (COTS) systems-on-chip (SoCs) are becoming widespread in embedded systems. Many of them include a multicore CPU and a high-end GPU, combining high computational performance with low power consumption and flexible multilevel parallelism. Such devices are also being considered for radiation environments in which large amounts of data must be processed or compute-intensive applications must be executed. In this paper we compare three different strategies for performing matrix multiplication on the GPU of a Tegra TK1 SoC. Our aim is to analyze how different uses of the GPU resources influence not only the computational performance of the algorithm but also its radiation sensitivity. Radiation experiments with protons were performed to compare the behaviour of the three strategies. The experimental results show that most of the errors force a reboot of the platform. The number of errors is directly related to how the algorithms use the internal memories of the GPU, and it increases with the matrix size. It is also related to the number of transactions with the global memory; the global memory itself was not affected by the radiation in our experiments. The results show that the smallest cross-section is obtained with the fastest algorithm, even though it uses the cores of the GPU more intensively.
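
As a reminder of the quantity being compared, the cross-section in radiation testing is conventionally the number of observed errors normalized by the particle fluence delivered to the device. The symbols below are illustrative notation, assumed here for clarity and not taken from the reported experiments:

\[
\sigma = \frac{N_{\mathrm{err}}}{\Phi}
\]

where $N_{\mathrm{err}}$ is the number of errors observed during irradiation, $\Phi$ is the proton fluence (protons/cm$^2$), and $\sigma$ is therefore expressed in cm$^2$. Under this definition, an algorithm that finishes sooner accumulates less fluence per execution, which is one plausible reason why a faster strategy may show fewer errors per run.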