Abstract-This paper presents a new approach, Dynamic Data Cleaning (DDC), which aims to improve the consistency of incomplete datasets by identifying, reconstructing and removing inconsistent data objects before further data analysis. The proposed DDC approach consists of three methods: Identify Normal Object (INO), Reconstruct Normal Object (RNO) and Dataset Quality Measure (DQM). First, the INO method divides the incomplete dataset into normal objects and abnormal objects (outliers) based on the degree of missing attribute values in each individual object. Second, the RNO method reconstructs the missing attribute values in the normal objects from the closest object according to a distance metric, and removes the inconsistent data objects (outliers) that have a higher degree of missing data. Finally, the DQM method measures the consistency and inconsistency among the objects in the improved dataset, with and without outliers. Experimental results show that the proposed DDC approach is suitable for identifying and reconstructing incomplete data objects, improving dataset consistency from a lower to a higher level without user knowledge.
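As a rough illustration of the first two steps, the following minimal Python sketch splits objects by their fraction of missing values and then fills the gaps in the normal objects from their nearest neighbour. It is not the paper's implementation. The abstract does not specify the threshold, the distance metric, or the DQM formula, so the `max_missing` cutoff, the Euclidean distance over shared observed attributes, and all function names here are assumptions, and the DQM step is omitted.

```python
import math

def missing_ratio(obj):
    """Fraction of attributes that are missing (NaN) in one object."""
    return sum(math.isnan(v) for v in obj) / len(obj)

def identify_normal_objects(data, max_missing=0.5):
    """INO-like step: split objects by degree of missingness.

    max_missing is an assumed cutoff, not a value taken from the paper.
    """
    normal, outliers = [], []
    for obj in data:
        target = normal if missing_ratio(obj) <= max_missing else outliers
        target.append(list(obj))
    return normal, outliers

def distance(a, b):
    """Assumed metric: root-mean-square difference over attributes
    observed in both objects; infinite if they share none."""
    shared = [(x, y) for x, y in zip(a, b)
              if not math.isnan(x) and not math.isnan(y)]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

def reconstruct_normal_objects(normal):
    """RNO-like step: fill each missing value from the closest normal
    object that has that attribute observed."""
    repaired = []
    for i, obj in enumerate(normal):
        filled = list(obj)
        for j, value in enumerate(obj):
            if not math.isnan(value):
                continue
            donors = [other for k, other in enumerate(normal)
                      if k != i and not math.isnan(other[j])]
            if donors:  # leave the gap if no object can supply a value
                closest = min(donors, key=lambda other: distance(obj, other))
                filled[j] = closest[j]
        repaired.append(filled)
    return repaired

nan = math.nan
data = [
    [1.0, 2.0, 3.0],
    [1.1, nan, 3.1],   # one missing value: treated as normal, repaired
    [9.0, 8.0, 7.0],
    [nan, nan, 5.0],   # two of three missing: treated as an outlier
]
normal, outliers = identify_normal_objects(data)
cleaned = reconstruct_normal_objects(normal)
```

In this toy dataset the second object borrows its missing value from the first object, which is its closest neighbour on the two attributes they share, while the fourth object is set aside as an outlier rather than repaired.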