Applications that need to process large volumes of data to make informed decisions, such as stream processing systems or those dealing with social networks, typically impose strict requirements on resources such as memory, processing time, and disk or network latency. One way to address this problem is to reduce the amount of data to be processed, at the cost of lower confidence in the results. Estimating the errors of such approximate solutions therefore becomes a critical issue. In this paper, we explore different approximation strategies depending on how the data is organized and what information needs to be obtained from it. We propose an approach to estimate the accuracy of these approximate solutions in the context of data processing systems, in terms of the precision and recall of the results obtained. A case study is used to validate the proposal and to evaluate the performance of different types of approximations.
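As a minimal illustration of the accuracy measure the abstract refers to, the following sketch compares an approximate result set against the exact one using the standard definitions of precision and recall. The names `exact_result` and `approximate_result` are hypothetical placeholders and do not correspond to any interface defined in the paper.

```python
# Illustrative sketch: measuring the accuracy of an approximate result
# against the exact result, using standard precision/recall definitions.

def precision_recall(exact_result: set, approximate_result: set) -> tuple[float, float]:
    """Precision: fraction of approximate items that are correct.
    Recall: fraction of exact items that the approximation retrieved."""
    if not exact_result or not approximate_result:
        return 0.0, 0.0
    true_positives = len(exact_result & approximate_result)
    precision = true_positives / len(approximate_result)
    recall = true_positives / len(exact_result)
    return precision, recall

# Hypothetical example: an approximation over sampled data returns part of
# the exact answer plus one spurious item.
exact = {"a", "b", "c", "d"}
approx = {"a", "b", "x"}
p, r = precision_recall(exact, approx)
print(f"precision={p:.2f}, recall={r:.2f}")  # precision=0.67, recall=0.50
```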