Fisher, Chengalur-Smith, and Ballou. Data Quality Information in Decision Making

Data Quality Information (DQI) is metadata that can be included with data to provide the user with information regarding the quality of that data. As users are increasingly removed from any personal experience with the data, knowledge that would help them judge whether the data are appropriate for the decision at hand has been lost. Data tags could supply this missing information. However, generating and maintaining such information is generally expensive, so doing so is worthwhile only if DQI is actually used and affects the decision made. This work focuses on how the experience of the decision maker and the available processing time influence the use of DQI in decision making. It also explores other potential issues regarding the use of DQI, such as task complexity and demographic characteristics. Our results indicate that the use of DQI increases as experience levels progress through the stages from novice to professional. The overall conclusion is that DQI should be made available to managers without domain-specific experience. It follows that DQI should be incorporated into data warehouses that managers use on an ad hoc basis.

Introduction

It has long been recognized that the effectiveness of decision making is influenced by many factors, among them the time available before the decision must be rendered, the experience of the decision maker, and the quality of the data needed for the decision. Although the data used should ideally be of high quality, in practice this is often not the case, for reasons that range from the cost of obtaining quality data to the inherent difficulty, or even impossibility, of doing so for certain data types. Nevertheless, experienced decision makers, especially those who have worked in a particular milieu for a sufficient period of time, develop a feel for the nuances and eccentricities of the data they use and intuitively compensate for them. As organizations increasingly move to stored repositories such as data warehouses, this intuitive feel is not preserved for many of those who extract data from such sources to support their particular needs.

One solution would be to capture some of the knowledge regarding the data's quality along with the actual data values. Data tagging to provide information about the data has long been proposed (Wang and Madnick 1990); however, it is not clear how, or whether, decision makers would use this data quality information. […] data quality information (DQI) to be metadata that addresses the data's quality. Clearly, any benefits that accrue from providing information about the quality of the data must outweigh the cost of obtaining and maintaining this metadata. Although logic dictates that DQI would be beneficial, it is also plausible that the benefit of such information varies considerably with the circumstances. The effect of providing...
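The data-tagging idea can be made concrete by attaching DQI directly to individual values. The following is a minimal Python sketch under stated assumptions. The `TaggedValue` type, its `source` and `est_accuracy` fields, and the `flag_low_quality` helper are hypothetical illustrations. They are not the tagging scheme of Wang and Madnick (1990), nor the DQI format used in this study.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class TaggedValue:
    """A data value stored together with hypothetical data quality information (DQI)."""
    value: Any
    # Hypothetical DQI fields: where the value came from and an estimated
    # probability (between 0 and 1) that the value is correct.
    source: str
    est_accuracy: float
    notes: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject accuracy estimates that are not valid probabilities.
        if not 0.0 <= self.est_accuracy <= 1.0:
            raise ValueError("est_accuracy must lie in [0, 1]")


def flag_low_quality(values: list[TaggedValue], threshold: float) -> list[TaggedValue]:
    """Return the tagged values whose estimated accuracy is below `threshold`."""
    return [v for v in values if v.est_accuracy < threshold]


if __name__ == "__main__":
    sales = [
        TaggedValue(1200, source="POS system", est_accuracy=0.98),
        TaggedValue(950, source="manual entry", est_accuracy=0.80,
                    notes={"caveat": "keyed from paper forms"}),
    ]
    # A user without domain experience can see which figures warrant caution.
    for v in flag_low_quality(sales, threshold=0.9):
        print(v.value, v.source, v.notes)
```

Keeping the tag on the value means the quality knowledge travels with the data into a warehouse extract. The cost concern raised above applies directly: every tag must be produced and kept current.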
Practitioners and researchers regularly refer to the error rates or accuracy percentages of databases. The former is the number of cells in error divided by the total number of cells; the latter is the number of correct cells divided by the total number of cells. However, databases may have similar error rates (or accuracy percentages) yet differ drastically in the complexity of their accuracy problems. A simple percentage does not indicate whether the errors are systematic or randomly distributed throughout the database. We extend the accuracy metric to include a randomness measure and a probability distribution value. The proposed randomness check is based on the Lempel-Ziv (LZ) complexity measure. Through two simulation studies we show that the LZ complexity measure clearly distinguishes random errors from systematic ones. This determination is a significant first step and a major departure from the percentage-alone technique. Once the errors are determined to be random, a Poisson distribution is used to help address various managerial questions.
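The metrics described above can be sketched in Python. The sketch assumes that each database cell is coded 1 if it is in error and 0 if it is correct, and that cells are read in a fixed order.

* `lz76_complexity` computes LZ76 complexity with the Kaspar-Schuster scanning algorithm.
* The hypothetical `normalized_lz` divides that complexity by the value expected for independent errors at the observed rate. Random errors then score near 1 and clustered ones near 0. This normalization is one common choice and an assumption here: the abstract does not state the authors' exact normalization or decision threshold.
* `poisson_pmf` answers a hypothetical managerial question of the kind the abstract mentions.

```python
import math
import random


def error_rate(errors: list[int]) -> float:
    """Cells in error divided by total cells (errors[i] == 1 marks a bad cell)."""
    return sum(errors) / len(errors)


def accuracy(errors: list[int]) -> float:
    """Correct cells divided by total cells, i.e. 1 - error rate."""
    return 1.0 - error_rate(errors)


def lz76_complexity(s: str) -> int:
    """Number of phrases in the Lempel-Ziv (1976) parsing of s (Kaspar-Schuster scan)."""
    n = len(s)
    if n < 2:
        return n
    i, k, l = 0, 1, 1
    c, k_max = 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:          # copied phrase runs to the end of the string
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:             # every earlier start tried: close phrase of length k_max
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def normalized_lz(errors: list[int]) -> float:
    """Assumed normalization: c * log2(n) / (n * H(p)); near 1 for independent errors."""
    n = len(errors)
    p = error_rate(errors)
    if n < 2 or p in (0.0, 1.0):
        return 0.0
    h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)   # binary entropy of the error rate
    c = lz76_complexity("".join(map(str, errors)))
    return c * math.log2(n) / (n * h)


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 2000
    # Two hypothetical layouts of bad cells with exactly the same error rate.
    scattered = [1 if rng.random() < 0.05 else 0 for _ in range(n)]
    k = sum(scattered)
    clustered = [1] * k + [0] * (n - k)      # all errors in one contiguous block
    for name, e in (("scattered", scattered), ("clustered", clustered)):
        print(name, round(error_rate(e), 3), round(normalized_lz(e), 2))
    # Hypothetical managerial question: chance that a 100-cell extract is error-free.
    lam = error_rate(scattered) * 100
    print("P(no errors in 100 cells) =", round(poisson_pmf(0, lam), 3))
```

Both layouts in the demo share the same error rate, so the percentage alone cannot tell them apart. The normalized complexity, by contrast, separates them sharply. The Poisson step assumes independent errors, so it is applied only after the randomness check, matching the order described above.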