A noisy dataset can contain contradictory data. Contradictory data is synonymous with incorrect data, so it is important that such data be investigated and evaluated when analysing a noisy dataset. Researchers have proposed different approaches to dealing with contradictory data. For example, [1, 2] proposed methods for identifying and removing contradictory data in noisy datasets. However, removing contradictory data from a noisy dataset increases the incompleteness of the dataset, thereby reducing the soundness of any information derived from it. It is therefore important to identify and evaluate contradictory instances, rather than discard them, when analysing a large and noisy dataset. This will improve the soundness of the analysis of such a dataset. The analysis of big data has been identified as the next frontier for innovation and the advancement of technology [3, 4]. There is therefore a need to identify appropriate approaches to dealing with contradictions in a large and noisy dataset. Contradictions take different forms. For example, they can arise from the use of modal words, from structure, from subtle lexical contrasts, and from world knowledge.
The integration of data from different data sources can result in inconsistent or incomplete data (IID). IID can undermine the validity of information retrieved from an integrated dataset. There is therefore a need to identify these anomalies. This work presents SPARQL queries that retrieve inconsistent or incomplete information from an EMAGE dataset. It also shows how Formal Concept Analysis (FCA) tools, notably FcaBedrock and Concept Explorer, can be applied to identify and visualise IID in the retrieved information. Although instances of IID can exist in most data formats, the investigation focuses on RDF datasets.
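The kind of check such queries perform can be sketched in plain Python over a handful of triples. This is only an illustration and not the SPARQL used in the work: the triples, the predicate names (`ex:expressionLevel`, `ex:stage`), and the helper `find_iid` are all hypothetical, and the rule that each predicate should carry exactly one value per subject is an assumption made for the example.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, predicate, object); not real EMAGE data.
TRIPLES = [
    ("ex:gene1_tissueA", "ex:expressionLevel", "detected"),
    ("ex:gene1_tissueA", "ex:expressionLevel", "not detected"),
    ("ex:gene1_tissueA", "ex:stage", "TS17"),
    ("ex:gene2_tissueB", "ex:expressionLevel", "detected"),
    ("ex:gene3_tissueC", "ex:expressionLevel", "detected"),
    ("ex:gene3_tissueC", "ex:stage", "TS20"),
]

# Assumption: every subject is expected to have one value for each of these.
REQUIRED = ("ex:expressionLevel", "ex:stage")


def find_iid(triples, required):
    """Return (inconsistent, incomplete) findings for a list of triples.

    inconsistent: {(subject, predicate): sorted conflicting values}
    incomplete:   {subject: list of required predicates with no value}
    """
    values = defaultdict(set)
    subjects = set()
    for s, p, o in triples:
        values[(s, p)].add(o)
        subjects.add(s)

    # More than one distinct value for a single-valued predicate is a conflict.
    inconsistent = {
        key: sorted(vals) for key, vals in values.items() if len(vals) > 1
    }

    # A required predicate with no value at all marks the subject as incomplete.
    incomplete = {}
    for s in sorted(subjects):
        missing = [p for p in required if (s, p) not in values]
        if missing:
            incomplete[s] = missing
    return inconsistent, incomplete


inconsistent, incomplete = find_iid(TRIPLES, REQUIRED)
```

In a SPARQL setting, the first check would roughly correspond to grouping by subject and predicate and keeping groups with more than one distinct object, and the second to filtering on the absence of a pattern.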
Visual analysis has gained growing acceptance as a method of scientific inquiry in the research community. It is used in qualitative and mixed research methods. Even so, visual data analysis is likely to produce biased results when applied to a large and noisy dataset. This happens when a data analyst is unable to explore holistically all the values associated with the objects of interest in a dataset. Consequently, the analyst may assess inconsistent data as consistent because the contradiction associated with the data is not visualised. This work identifies incomplete analysis as a challenge in the visual data analysis of a large and noisy dataset. It considers Formal Concept Analysis (FCA) tools and techniques and prescribes the mining and visualisation of Incomplete or Inconsistent Data (IID) when dealing with such a dataset. It presents an automated approach for transforming IID from a noisy context, whose objects are associated with mutually exclusive many-valued attributes, into a formal context.
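The abstract does not spell out the transformation, so the following is only a sketch of one standard FCA technique that fits the description: nominal scaling, in which each attribute-value pair of a many-valued context becomes a binary attribute of a formal context. The data, the function names `scale_nominally` and `contradictory_objects`, and the `attribute=value` naming scheme are assumptions for illustration.

```python
# Hypothetical many-valued context: object -> attribute -> set of recorded values.
# The values of one attribute are assumed to be mutually exclusive, so an
# object holding more than one of them is contradictory.
MANY_VALUED = {
    "g1": {"level": {"strong", "weak"}, "stage": {"early"}},
    "g2": {"level": {"strong"}, "stage": {"early"}},
    "g3": {"level": {"weak"}, "stage": {"late"}},
}


def scale_nominally(many_valued):
    """Map each object to the set of binary attributes 'attribute=value' it has."""
    return {
        obj: {
            f"{attr}={value}"
            for attr, values in attrs.items()
            for value in values
        }
        for obj, attrs in many_valued.items()
    }


def contradictory_objects(many_valued):
    """List objects that hold several values of one mutually exclusive attribute."""
    return sorted(
        obj
        for obj, attrs in many_valued.items()
        if any(len(values) > 1 for values in attrs.values())
    )


context = scale_nominally(MANY_VALUED)
contradictory = contradictory_objects(MANY_VALUED)

# Print the formal context as a cross table, one row per object.
columns = sorted(set().union(*context.values()))
print("obj", *columns, sep="\t")
for obj in sorted(context):
    print(obj, *("X" if c in context[obj] else "." for c in columns), sep="\t")
```

Because the contradictory object keeps both of its conflicting values as separate binary attributes, the conflict remains visible in the resulting cross table and in any concept lattice built from it, instead of being silently dropped.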