Interestingness measures play an important role in data mining, regardless of the kind of patterns being mined. These measures are intended for selecting and ranking patterns according to their potential interest to the user. Good measures also allow the time and space costs of the mining process to be reduced. This survey reviews the interestingness measures for rules and summaries, classifies them from several perspectives, compares their properties, identifies their roles in the data mining process, gives strategies for selecting appropriate measures for applications, and identifies opportunities for future research in this area.
Most approaches to mining association rules implicitly consider the utilities of the itemsets to be equal. We assume that the utilities of itemsets may differ, and identify the high utility itemsets based on information in the transaction database and external information about utilities. Our theoretical analysis of the resulting problem lays the foundation for future utility mining algorithms.
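To make the idea concrete, here is a minimal brute-force sketch of identifying high utility itemsets. The transaction database, the per-unit utility table, and the threshold are all made-up illustrations; the abstract above does not specify a particular algorithm, and a real utility miner would prune the search space rather than enumerate every itemset.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps item -> quantity, and an
# external table gives each item's per-unit utility (e.g., profit).
transactions = [
    {"A": 2, "B": 1},
    {"A": 1, "C": 3},
    {"B": 2, "C": 1},
]
unit_utility = {"A": 5, "B": 2, "C": 1}

def itemset_utility(itemset, txn):
    """Utility of an itemset in one transaction: sum of
    quantity * unit utility, or 0 if any item is absent."""
    if not all(i in txn for i in itemset):
        return 0
    return sum(txn[i] * unit_utility[i] for i in itemset)

def total_utility(itemset):
    """Utility of an itemset over the whole transaction database."""
    return sum(itemset_utility(itemset, t) for t in transactions)

# Brute-force enumeration of all itemsets meeting a utility threshold.
items = sorted(unit_utility)
min_util = 10  # illustrative threshold
high_utility = [
    frozenset(s)
    for r in range(1, len(items) + 1)
    for s in combinations(items, r)
    if total_utility(s) >= min_util
]
```

Note that, unlike support, utility is not anti-monotone: a superset of a low-utility itemset can still be high utility, which is what makes efficient pruning the central challenge the abstract refers to.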
Library of Congress Cataloging-in-Publication Data: Hilderman, Robert J. Knowledge discovery and measures of interest / by Robert J. Hilderman, Howard J. Hamilton. p. cm. (The Kluwer international series in engineering and computer science; SECS 638). Includes bibliographical references and index. ISBN 978-1-4419-4913-4; ISBN 978-1-4757-3283-2 (eBook).

Data mining algorithms can be broadly classified into two general areas: summarization and anomaly detection [71]. Summarization algorithms find concise descriptions of input data. For example, classification algorithms partition input data into disjoint groups. The results of such classification might be represented as a high-level summary, a decision tree, or a set of characteristic rules, as with C4.5 [112], DBLearn [58], and KID3 [110]. Anomaly-detection algorithms identify unusual features of data, such as combinations that occur with greater or lesser frequency than might be expected. For example, association algorithms find, from transaction records, sets of items that frequently appear together.

A summary generated from the cross-product domain for the compound attribute Shape-Size-Colour corresponds to a unique combination of nodes from the DGGs associated with the individual attributes, where one node is selected from the DGG associated with each attribute. For example, given the sales transaction database shown in Table 1.1 (assume the Shape, Size, and Colour attributes have been selected for generalization) and the associated DGGs shown in Figure 1.3, one of the many possible summaries that can be generated is shown in Table 1.5.
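The generalization step described above can be sketched in a few lines: each record's attribute values are replaced by ancestor nodes from their generalization hierarchies (paths in a DGG), and identical generalized tuples are then merged with a count. The hierarchies and records below are invented for illustration and do not reproduce Table 1.1 or Figure 1.3.

```python
from collections import Counter

# Made-up generalization hierarchies: Shape generalizes to the ANY node,
# Size generalizes to a Package-level node, Colour stays ungeneralized.
def shape_to_any(value):
    return "ANY"

size_to_package = {"small": "Retail", "medium": "Retail", "bulk": "Wholesale"}

# Invented sales records: (Shape, Size, Colour).
records = [
    ("round", "small", "red"),
    ("square", "small", "red"),
    ("round", "bulk", "blue"),
]

# Generalize each record, then merge duplicates into (tuple, count) pairs,
# producing a summary analogous to Table 1.5.
summary = Counter(
    (shape_to_any(shape), size_to_package[size], colour)
    for shape, size, colour in records
)
# The first two records collapse into the single generalized tuple
# ("ANY", "Retail", "red") with count 2.
```

Choosing a different node from each attribute's DGG yields a different summary, which is why the number of possible summaries grows with DGG complexity, as the next passage notes.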
The summary in Table 1.5 is obtained by generalizing the Shape attribute to the ANY node and the Size attribute to the Package node, while the Colour attribute remains ungeneralized. The complexity of the DGGs is a primary factor determining the number of summaries that can be generated, and depends only upon the number of […]

The rule-interest measure is RI = |X ∪ Y| − |X||Y|/N, where |X ∪ Y| is the number of tuples satisfying X → Y, and |X||Y|/N is the number of tuples expected if X and Y were independent (i.e., not associated). When RI = 0, X and Y are statistically independent and the rule is not interesting. When RI > 0 (RI < 0), X is positively (negatively) correlated with Y. The significance of the correlation between X and Y can be determined using the chi-square test for a 2 × 2 contingency table. Rules that exceed a predetermined minimum significance threshold are considered the most interesting.
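The rule-interest measure and its chi-square significance test can be sketched directly from the counts in the passage above. The helper names and the example numbers are illustrative; the counts are n_xy tuples satisfying both X and Y, n_x satisfying X, n_y satisfying Y, and n tuples in total.

```python
def rule_interest(n_xy, n_x, n_y, n):
    """RI = |X u Y| - |X||Y|/N: 0 means X and Y are independent,
    positive (negative) values mean positive (negative) correlation."""
    return n_xy - n_x * n_y / n

def chi_square(n_xy, n_x, n_y, n):
    """Chi-square statistic for the 2x2 contingency table of X vs Y."""
    observed = {
        (1, 1): n_xy,
        (1, 0): n_x - n_xy,
        (0, 1): n_y - n_xy,
        (0, 0): n - n_x - n_y + n_xy,
    }
    row = {1: n_x, 0: n - n_x}   # marginal counts for X
    col = {1: n_y, 0: n - n_y}   # marginal counts for Y
    chi2 = 0.0
    for (i, j), obs in observed.items():
        expected = row[i] * col[j] / n
        chi2 += (obs - expected) ** 2 / expected
    return chi2

# Illustrative counts: 100 tuples, 40 satisfy X, 50 satisfy Y, 30 both.
ri = rule_interest(30, 40, 50, 100)     # 30 - 40*50/100 = 10
chi2 = chi_square(30, 40, 50, 100)      # compare against the 1-df
                                        # critical value 3.84 at alpha=0.05
```

With these numbers RI is positive and the chi-square statistic exceeds 3.84, so in this toy example the rule would pass the significance threshold.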
In this paper, we propose an efficient rule discovery algorithm, called FD_Mine, for mining functional dependencies from data. By exploiting Armstrong's Axioms for functional dependencies, we identify equivalences among attributes, which can be used to reduce both the size of the dataset and the number of functional dependencies to be checked. We first describe four effective pruning rules that reduce the size of the search space. In particular, the number of functional dependencies to be checked is reduced by skipping the search for FDs that are logically implied by already discovered FDs. Then, we present the FD_Mine algorithm, which incorporates the four pruning rules into the mining process. We prove the correctness of FD_Mine, that is, we show that the pruning does not lead to the loss of useful information. We report the results of a series of experiments. These experiments show that the proposed algorithm is effective on 15 UCI datasets and synthetic data.
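The core test such a miner performs for each candidate dependency is easy to state: X → Y holds exactly when any two rows that agree on X also agree on Y. Below is a minimal sketch of that check over a table of dictionaries; it illustrates only the validation step, not FD_Mine's equivalence detection or pruning rules, and the table is an invented example.

```python
def fd_holds(rows, lhs, rhs):
    """Return True iff the functional dependency lhs -> rhs holds:
    rows agreeing on all attributes in lhs also agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value;
        # any later mismatch is a counterexample to the dependency.
        if seen.setdefault(key, val) != val:
            return False
    return True

# Invented relation: employees with departments and managers.
rows = [
    {"emp": 1, "dept": "A", "mgr": "Ann"},
    {"emp": 2, "dept": "A", "mgr": "Ann"},
    {"emp": 3, "dept": "B", "mgr": "Bob"},
]
# dept -> mgr holds; mgr -> emp does not (Ann manages two employees).
```

A naive miner would run this check for every candidate pair of attribute sets; the pruning rules in the abstract exist precisely to skip candidates whose status is already implied by Armstrong's Axioms and previously discovered dependencies.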