Robert J. Hilderman scite author profile

Robert J. Hilderman

5 Publications

193 Citation Statements Received

140 Citation Statements Given

Affiliations

University of Regina, Saskatchewan Health

Publications

Knowledge Discovery and Measures of Interest

2001

Library of Congress Cataloging-in-Publication Data: Hilderman, Robert J. Knowledge discovery and measures of interest / by Robert J. Hilderman, Howard J. Hamilton. p. cm. - (The Kluwer international series in engineering and computer science; SECS 638). Includes bibliographical references and index. ISBN 978-1-4419-4913-4. ISBN 978-1-4757-3283-2 (eBook).

Data mining algorithms can be broadly classified into two general areas: summarization and anomaly detection [71]. Summarization algorithms find concise descriptions of input data. For example, classificatory algorithms partition input data into disjoint groups. The results of such classification might be represented as a high-level summary, a decision tree, or a set of characteristic rules, as with C4.5 [112], DBLearn [58], and KID3 [110]. Anomaly-detection algorithms identify unusual features of data, such as combinations that occur with greater or lesser frequency than might be expected. For example, association algorithms find, from transaction records, sets of items that appear with ...

A summary generated from the cross-product domain for the compound attribute Shape-Size-Colour corresponds to a unique combination of nodes from the DGGs associated with the individual attributes, where one node is selected from the DGG associated with each attribute. For example, given the sales transaction database shown in Table 1.1 (assume the Shape, Size, and Colour attributes have been selected for generalization) and the associated DGGs shown in Figure 1.3, one of the many possible summaries that can be generated is shown in Table 1.5. The summary in Table 1.5 is obtained by generalizing the Shape attribute to the ANY node and the Size attribute to the Package node, while the Colour attribute remains ungeneralized.

The complexity of the DGGs is a primary factor determining the number of summaries that can be generated, and depends only upon the number of ...

... satisfying X → Y, and |X||Y| / N is the number of tuples expected if X and Y were independent (i.e., not associated). When RI = 0, then X and Y are statistically independent and the rule is not interesting. When RI > 0 (RI < 0), then X is positively (negatively) correlated to Y. The significance of the correlation between X and Y can be determined using the chi-square test for a 2 × 2 contingency table. Those rules which do not exceed a predetermined minimum significance threshold are determined to be the most interesting.
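A minimal sketch of the rule-interest (RI) computation the last excerpt describes. The excerpt is cut off before the formula itself, so the form used here, RI = |X ∩ Y| - |X||Y| / N (the commonly cited Piatetsky-Shapiro definition), is an assumption, as are the function name and the example counts.

```python
def rule_interest(n_xy: int, n_x: int, n_y: int, n_total: int) -> float:
    """Observed co-occurrence count minus the count expected under independence.

    n_xy    -- tuples satisfying both X and Y (hypothetical input)
    n_x     -- tuples satisfying X
    n_y     -- tuples satisfying Y
    n_total -- total number of tuples, N
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    expected = n_x * n_y / n_total  # |X||Y| / N
    return n_xy - expected


# Hypothetical counts: 100 tuples, 50 satisfy X, 40 satisfy Y.
positive = rule_interest(n_xy=30, n_x=50, n_y=40, n_total=100)     # 30 - 20
independent = rule_interest(n_xy=20, n_x=50, n_y=40, n_total=100)  # 20 - 20
negative = rule_interest(n_xy=10, n_x=50, n_y=40, n_total=100)     # 10 - 20
print(positive, independent, negative)
```

A positive value means X and Y co-occur more often than independence would predict, zero means no association, and a negative value means they co-occur less often. Whether the association is statistically significant is a separate question that this sketch does not address.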

Evaluation of Interestingness Measures for Ranking Discovered Knowledge

2001

Applying Objective Interestingness Measures in Data Mining Systems

2000

One of the most important steps in any knowledge discovery task is the interpretation and evaluation of discovered patterns. To address this problem, various techniques, such as the chi-square test for independence, have been suggested to reduce the number of patterns presented to the user and to focus attention on those that are truly statistically significant. However, when mining a large database, the number of patterns discovered can remain large even after adjusting significance thresholds to eliminate spurious patterns. What is needed, then, is an effective measure to further assist in the interpretation and evaluation step that ranks the interestingness of the remaining patterns prior to presenting them to the user. In this paper, we describe a two-step process for ranking the interestingness of discovered patterns that utilizes the chi-square test for independence in the first step and objective measures of interestingness in the second step. We show how this two-step process can be applied to ranking characterized/generalized association rules and data cubes.
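The two-step idea in this abstract can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the authors' implementation: each pattern is reduced to a hypothetical 2 × 2 contingency table, step one keeps patterns whose chi-square statistic exceeds the 0.05 critical value for one degree of freedom, and step two ranks the survivors with a stand-in measure (observed minus expected co-occurrence count) in place of the paper's own interestingness measures. All names and counts are invented for the example.

```python
from typing import Callable, List, NamedTuple, Sequence

# Chi-square critical value for 1 degree of freedom at alpha = 0.05.
CHI2_CRITICAL_1DF_05 = 3.841


class Pattern(NamedTuple):
    """A pattern summarized as a 2 x 2 contingency table (hypothetical shape)."""
    name: str
    a: int  # X and Y
    b: int  # X and not Y
    c: int  # not X and Y
    d: int  # neither X nor Y


def chi_square_2x2(p: Pattern) -> float:
    """Pearson chi-square statistic for a 2 x 2 table, without continuity correction."""
    n = p.a + p.b + p.c + p.d
    denom = (p.a + p.b) * (p.c + p.d) * (p.a + p.c) * (p.b + p.d)
    if denom == 0:
        return 0.0  # a degenerate margin gives no evidence of association
    return n * (p.a * p.d - p.b * p.c) ** 2 / denom


def deviation(p: Pattern) -> float:
    """Stand-in objective measure: observed minus expected co-occurrence count."""
    n = p.a + p.b + p.c + p.d
    return p.a - (p.a + p.b) * (p.a + p.c) / n


def rank_patterns(
    patterns: Sequence[Pattern],
    measure: Callable[[Pattern], float],
    critical: float = CHI2_CRITICAL_1DF_05,
) -> List[Pattern]:
    # Step 1: drop patterns that are not statistically significant.
    significant = [p for p in patterns if chi_square_2x2(p) > critical]
    # Step 2: rank the remaining patterns by the objective measure.
    return sorted(significant, key=measure, reverse=True)


patterns = [
    Pattern("P1", 30, 20, 10, 40),
    Pattern("P2", 25, 25, 25, 25),  # perfectly independent, filtered in step 1
    Pattern("P3", 45, 5, 15, 35),
]
ranked = rank_patterns(patterns, deviation)
print([p.name for p in ranked])
```

Passing the measure in as a function keeps the significance filter independent of whichever interestingness measure is used for the final ranking.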

Mining market basket data using share measures and characterized itemsets

et al. 1998

Heuristics for Ranking the Interestingness of Discovered Knowledge

1999
