Filtering is an essential task in surface metrology: it isolates the surface components that are relevant for automated quality control. Computing surface features in real time is crucial for determining the mechanical and physical properties of the inspected product. The computational efficiency of filtering operations is a major challenge, because current sensors deliver massive volumes of data at very high acquisition rates. To address this challenge, this work presents several real-time filtering solutions and compares their performance on the CPU and on the GPU using modern hardware. The proposed framework focuses on filtering techniques that can be expressed with a finite impulse response (FIR) kernel. This class includes the Gaussian kernel, the most common filtering technique recommended by ISO and ASME standards. This work proposes variations of the double FIFO and double circular filters. The filters are transformed into a series of general matrix-matrix multiplications, which run extremely efficiently on different architectures. The proposed filtering approach outperforms previous works. Additionally, tests quantify the data transfer and computation capabilities of the GPU, with the aim of reducing the penalty imposed by transferring data from main memory to the GPU in real-time operation. Based on the results, an efficient batch filtering technique is proposed that runs faster on the GPU than on the CPU even for small profile and kernel sizes, offloading this task from the host CPU for optimal system and application response.
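
As a rough illustration of how an FIR filter can be cast as a matrix-matrix multiplication, the sketch below filters a batch of profiles with one product of a banded filter matrix and a profile matrix. It is not the double FIFO or double circular formulation proposed in this work. The function names, the sampling parameters, and the pure-Python `matmul` (a stand-in for an optimised GEMM routine) are assumptions made for the example. The Gaussian weighting function uses the constant alpha = sqrt(ln 2 / pi) commonly given for the Gaussian profile filter.

```python
import math


def gaussian_kernel(cutoff, spacing, half_width):
    """Sampled Gaussian weighting function, normalised to unit sum.

    cutoff, spacing and half_width are illustrative parameters:
    cutoff wavelength, sample spacing (same unit), and taps per side.
    """
    alpha = math.sqrt(math.log(2.0) / math.pi)  # about 0.4697
    taps = [
        math.exp(-math.pi * ((k * spacing) / (alpha * cutoff)) ** 2)
        for k in range(-half_width, half_width + 1)
    ]
    total = sum(taps)
    return [t / total for t in taps]


def direct_fir(profile, kernel):
    """Reference FIR filter: one output per fully covered window.

    The kernel is symmetric, so correlation and convolution coincide.
    """
    m = len(kernel)
    return [
        sum(kernel[j] * profile[i + j] for j in range(m))
        for i in range(len(profile) - m + 1)
    ]


def filter_matrix(kernel, n):
    """Banded matrix W of shape (n - m + 1) x n with W[i][i + j] = kernel[j]."""
    m = len(kernel)
    rows = []
    for i in range(n - m + 1):
        row = [0.0] * n
        row[i:i + m] = kernel  # same-length slice assignment keeps len(row) == n
        rows.append(row)
    return rows


def matmul(a, b):
    """Plain matrix product; stands in for an optimised GEMM call."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


# Three synthetic profiles of 64 samples each (hypothetical test data).
n = 64
profiles = [
    [math.sin(0.05 * i) + 0.1 * math.sin(2.0 * i + b) for i in range(n)]
    for b in range(3)
]
kernel = gaussian_kernel(cutoff=0.8, spacing=0.05, half_width=16)

W = filter_matrix(kernel, n)               # (n - m + 1) x n
P = [list(col) for col in zip(*profiles)]  # n x batch, one profile per column
filtered = matmul(W, P)                    # (n - m + 1) x batch
```

Each column of `filtered` holds one filtered profile, so the whole batch is processed by a single product. In this dense form most entries of `W` are zero, so the sketch shows only the algebraic equivalence and says nothing about the performance of the proposed filters.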