Data Cleaning as an essential phase to enhance the overall quality used for decades with different data models, the majority handled a relational dataset as the most dominant data model. However, the XML data model, besides the relational data model considered the most data model commonly used for storing, retrieving, and querying valuable data. In this paper, we introduce a model for detecting and repairing XML data inconsistencies using a set of conditional dependencies. Detecting inconsistencies will be done by joining the existed data source with a set of patterns tableaus as conditional dependencies and then update these values to match the proper patterns using a set of SQL statements. This research considered the final phase for a cleaning model introduced for XML datasets by firstly mapping the XML document to a set of related tables then discovering a set of conditional dependencies (Functional and Inclusions) and finally then applying the following algorithms as a closing step of quality enhancement.
Extensible Markup Language (XML) is emerging as the primary standard for representing and exchanging data, with more than 60% of the total; XML considered the most dominant document type over the web; nevertheless, their quality is not as expected. XML integrity constraint especially XFD plays an important role in keeping the XML dataset as consistent as possible, but their ability to solve data quality issues is still intangible. The main reason is that old-fashioned data dependencies were basically introduced to maintain the consistency of the schema rather than that of the data. The purpose of this study is to introduce a method for discovering pattern tableaus for XML conditional dependencies to be used for enhancing XML document consistency as a part of data quality improvement phases. The notations of the conditional dependencies as new rules are designed mainly for improving data instance and extended traditional XML dependencies by enforcing pattern tableaus of semantically related constants. Subsequent to this, a set of minimal approximate conditional dependencies (XCFD, XCIND) is discovered and learned from the XML tree using a set of mining algorithms. The discovered patterns can be used as a Master data in order to detect inconsistencies that don’t respect the majority of the dataset.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.