Some applications require fast queries on large tables with as many as one billion rows. If the table data is stored on disk without compression, merely loading it into host memory takes a long time. To speed up queries, we resort to data compression: table data is compressed before it is saved to disk, and in the query stage the compressed data is loaded into host memory, decompressed, and then accessed. To speed up decompression, we exploit the massive parallelism of GPU devices. To make full use of GPU computing resources, GPU kernels should avoid divergent execution as much as possible and should use GPU local memory efficiently. Guided by these criteria, we designed a GPU decompression algorithm. Its basic idea is to decompose the decompression task into a few basic operations that are performed in sequence, and to carry out each basic operation in parallel across GPU threads. Experiments showed that the decompression algorithm, implemented in OpenCL, reached an average throughput of 13.12 GB/s on an AMD RX 6600. The reduced loading and decompression times significantly improved query speed in the query stage.
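
The idea of splitting decompression into a short sequence of basic operations, each of which is parallel on its own, can be illustrated with a small sketch. The text above does not name the compression scheme, so run-length encoding is used here purely as an assumed example, and plain Python stands in for OpenCL kernels. The function name `rle_decode_by_primitives` and the four-step breakdown are hypothetical and are not taken from the described implementation.

```python
from itertools import accumulate


def rle_decode_by_primitives(values, lengths):
    """Decode run-length data as a chain of scan, scatter, scan, gather steps.

    Illustrative assumption: the input is run-length encoded. On a GPU each
    step would map to its own kernel; here each step is ordinary Python.
    """
    if not values:
        return []

    # Step 1 (scan): a prefix sum over run lengths gives each run's start offset.
    ends = list(accumulate(lengths))
    starts = [0] + ends[:-1]
    total = ends[-1]

    # Step 2 (scatter): mark the first output slot of each run with the run index.
    heads = [0] * total
    for run, start in enumerate(starts):
        if lengths[run] > 0:  # empty runs own no output slot
            heads[start] = run

    # Step 3 (scan): an inclusive max-scan spreads the run index over the run.
    run_of = list(accumulate(heads, max))

    # Step 4 (gather): each output slot reads its value independently.
    return [values[r] for r in run_of]


if __name__ == "__main__":
    print(rle_decode_by_primitives([7, 3, 9], [2, 1, 3]))
```

In this sketch every output element does the same work in the gather step regardless of run length, which is one way such a decomposition can avoid divergent execution; whether the described algorithm uses these particular primitives is not stated in the text.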