Data quality assessment aims to evaluate the suitability of a dataset for an intended task. The extensive literature on data quality describes various methodologies for assessing data quality by means of data profiling techniques applied to whole datasets. Our investigations aim to address the need to automatically assess the level of quality of the individual records of a dataset, where data profiling tools do not provide an adequate level of information. Since it is often easier to describe when a record has sufficient quality than to compute a quality indicator, we propose a semi-automatic, business-rule-guided data quality assessment methodology applied to every record. This involves first listing the business rules that describe the data (data requirements), then those describing how to produce measures (business rules for data quality measurements), and finally those defining how to assess the level of data quality of a dataset (business rules for data quality assessment). The main contribution of this paper is the adoption of the OMG standard DMN (Decision Model and Notation) to support the description of data quality requirements and their automatic assessment using existing DMN engines.
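As a rough illustration of the three rule layers described above (data requirements, measurement rules, assessment rules), the following Python sketch evaluates a decision table with a FIRST hit policy (the first matching row wins) over per-record quality measures. It does not use a DMN engine or DMN XML; the field names (`age`, `email`), the measures (`completeness`, `validity`), the thresholds, and all helper names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]


@dataclass
class Rule:
    """One row of a decision table: a condition per input and an output value."""
    conditions: Dict[str, Callable[[Any], bool]]
    output: Any


def evaluate_first(rules: List[Rule], inputs: Dict[str, Any]) -> Optional[Any]:
    """Evaluate a decision table with a FIRST-style hit policy:
    return the output of the first row whose conditions all hold."""
    for rule in rules:
        if all(cond(inputs[name]) for name, cond in rule.conditions.items()):
            return rule.output
    return None


# Layer 1 (hypothetical data requirements): a validity rule per field.
REQUIREMENTS: Dict[str, Callable[[Any], bool]] = {
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
    "email": lambda v: isinstance(v, str) and "@" in v,
}


# Layer 2 (hypothetical measurement rules): per-record DQ measures in [0, 1].
def measure(record: Record) -> Dict[str, float]:
    present = [f for f in REQUIREMENTS if record.get(f) is not None]
    valid = [f for f in present if REQUIREMENTS[f](record[f])]
    n = len(REQUIREMENTS)
    return {"completeness": len(present) / n, "validity": len(valid) / n}


# Layer 3 (hypothetical assessment rules): map measures to a quality level.
ASSESSMENT = [
    Rule({"completeness": lambda c: c == 1.0, "validity": lambda v: v == 1.0}, "high"),
    Rule({"completeness": lambda c: c >= 0.5, "validity": lambda v: v >= 0.5}, "medium"),
    Rule({"completeness": lambda c: True, "validity": lambda v: True}, "low"),  # catch-all
]


def assess(record: Record) -> Optional[str]:
    """Assess one record by measuring it and evaluating the assessment table."""
    return evaluate_first(ASSESSMENT, measure(record))
```

For example, `assess({"age": 34, "email": "a@b.org"})` yields `"high"`, while `assess({"age": 200})` yields `"low"` because the only present field fails its validity rule. Keeping each layer as a separate table mirrors the separation of requirements, measurements, and assessment; in a DMN-based setup these tables would instead be modeled in DMN and evaluated by a DMN engine.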
Current Internet of Things (IoT) scenarios face many challenges, especially in data curation when a large number of heterogeneous data sources are integrated. In this respect, the use of poor-quality data (i.e., data with problems) can have serious consequences, ranging from incorrect decision-making to degraded operational performance. Therefore, using data with an acceptable level of usability has become essential to success. In this article, we propose an IoT-big data pipeline architecture that enables data acquisition and data curation in any IoT context. We have customized the pipeline by including the DMN4DQ approach, which enables us to measure and evaluate the quality of the data produced by IoT sensors. Further, we have chosen a real dataset from sensors in an agricultural IoT context and defined a decision model that enables the automatic measurement and assessment of data quality with regard to the usability of the data in that context.
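The following is a minimal sketch, assuming a plain Python generator pipeline rather than any particular big data framework, of where a data quality measurement-and-assessment step could sit between sensor acquisition and downstream consumers. The `Reading` fields, the plausible ranges, the 600-second timeliness threshold, and the three usability levels are hypothetical and are not taken from the article's dataset or decision model; `assess` is an if-chain standing in for a decision table with a FIRST hit policy.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional, Tuple


@dataclass
class Reading:
    sensor_id: str
    timestamp: float                # seconds since epoch when the value was measured
    soil_moisture: Optional[float]  # percent (hypothetical field)
    air_temp: Optional[float]       # degrees Celsius (hypothetical field)


# Hypothetical plausible ranges for an agricultural deployment.
RANGES = {"soil_moisture": (0.0, 100.0), "air_temp": (-30.0, 60.0)}
MAX_DELAY_S = 600.0  # hypothetical timeliness threshold in seconds


def measure(r: Reading, now: float) -> dict:
    """Compute per-reading DQ measures: share of in-range values and delay."""
    in_range = 0
    for field, (low, high) in RANGES.items():
        v = getattr(r, field)
        if v is not None and low <= v <= high:
            in_range += 1
    return {"accuracy": in_range / len(RANGES), "delay": now - r.timestamp}


def assess(m: dict) -> str:
    """Decision-table-like assessment: the first matching row wins."""
    if m["accuracy"] == 1.0 and m["delay"] <= MAX_DELAY_S:
        return "usable"
    if m["accuracy"] >= 0.5:
        return "usable_with_caution"
    return "not_usable"


def curate(readings: Iterable[Reading], now: float) -> Iterator[Tuple[Reading, str]]:
    """Curation stage: tag each incoming reading with its usability level."""
    for r in readings:
        yield r, assess(measure(r, now))


def usable_only(tagged: Iterable[Tuple[Reading, str]]) -> List[Reading]:
    """Downstream stage: keep only readings assessed as usable."""
    return [r for r, level in tagged if level == "usable"]
```

Tagging readings instead of dropping them in the curation stage lets each downstream consumer apply its own usability policy; `usable_only` is one such hypothetical policy.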