Mining itemset utilities from transaction databases

Yao, Hong; Hamilton, Howard J.

doi:10.1016/j.datak.2005.10.004

Cited by 346 publications (196 citation statements, 186 of them classified as mentioning)

References 21 publications

“…On the other hand, predictive approaches generally cannot ensure that the mining result contains the complete set of high utility itemsets [5,6,34,35]. To address this urgent problem, Li et al. proposed the FSM algorithm, a non-exhaustive search method, to discover all SH-frequent itemsets [22].…”

Section: Existing Algorithms (mentioning; confidence: 99%)

“…However, such a method is too time-consuming for a large dataset environment. Several heuristic methods have been proposed to accelerate the discovery of high utility (or SH-frequent) itemsets, such as the MEU (UMining_H) [27,28,34,35], SIP, CAC, and IAB [4,6] methods. Nevertheless, these predictive methods may not discover some high utility itemsets.…”

Section: Tid (mentioning; confidence: 99%)

“…Yao et al. [34,35] generalize the share-confidence model [6] to develop the conventional utility mining model. This model can be used to measure the utility of an itemset in terms of net profit, total cost, or time spent [27,28,34,35].…”

Section: Introduction (mentioning; confidence: 99%)
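To make the utility model concrete, here is a minimal Python sketch. It assumes the common product form, in which an item's utility in a transaction is its purchased quantity multiplied by a per-item unit profit, and an itemset's utility is summed over the transactions that contain all of its items. The data and names (`UNIT_PROFIT`, `DB`, `itemset_utility`) are hypothetical; replacing unit profit with unit cost or unit time gives the other measures mentioned in the snippet.

```python
from typing import Dict, FrozenSet, List

# Hypothetical toy data. Each transaction maps an item to the quantity bought
# (internal utility); UNIT_PROFIT holds each item's external utility.
Transaction = Dict[str, int]

UNIT_PROFIT: Dict[str, float] = {"A": 3.0, "B": 10.0, "C": 1.0}

DB: List[Transaction] = [
    {"A": 1, "C": 10},
    {"A": 2, "B": 1},
    {"B": 1, "C": 5},
    {"A": 1, "B": 1, "C": 2},
]


def item_utility(item: str, t: Transaction, profit: Dict[str, float]) -> float:
    """Utility of one item in one transaction: quantity x unit profit (assumed product form)."""
    return t[item] * profit[item]


def itemset_utility_in_transaction(
    itemset: FrozenSet[str], t: Transaction, profit: Dict[str, float]
) -> float:
    """Sum of the item utilities, or 0 if t does not contain every item of the itemset."""
    if not itemset.issubset(t):
        return 0.0
    return sum(item_utility(i, t, profit) for i in itemset)


def itemset_utility(
    itemset: FrozenSet[str], db: List[Transaction], profit: Dict[str, float]
) -> float:
    """Utility of an itemset in the whole database."""
    return sum(itemset_utility_in_transaction(itemset, t, profit) for t in db)


if __name__ == "__main__":
    print(itemset_utility(frozenset({"A", "B"}), DB, UNIT_PROFIT))  # 29.0
```

Here {A, B} occurs in the second transaction (2 x 3 + 1 x 10 = 16) and the fourth (3 + 10 = 13), for a total utility of 29.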

“…Several other methods have since been proposed to efficiently discover share-frequent (SH-frequent) itemsets with infrequent subsets [4–6,17,18,22–24]. Yao et al. [34,35] generalize the share-confidence model [6] to develop the conventional utility mining model. This model can be used to measure the utility of an itemset in terms of net profit, total cost, or time spent [27,28,34,35].…”

Section: Introduction (mentioning; confidence: 99%)

“…Recently, Yao et al. [36] attempted to build a unified framework for utility-based measures [11,27,28,32,34–36] that allows the user to select a suitable utility mining tool for a specific application; however, this framework only employs existing tools. Thus, to effectively discover high utility itemsets, the need for efficient algorithms remains urgent.…”

Section: Introduction (mentioning; confidence: 99%)


Isolated items discarding strategy for discovering high utility itemsets

Yeh, Chang (2008). Data & Knowledge Engineering

Traditional methods of association rule mining treat the appearance of an item in a transaction, whether or not it is purchased, as a binary variable. However, customers may purchase more than one unit of the same item, and the unit cost may vary among items. Utility mining, a generalized form of the share mining model, attempts to overcome this problem. Because utility does not satisfy the downward closure property, the Apriori pruning strategy cannot be used to find high utility itemsets, so developing an efficient algorithm is crucial for utility mining. This study proposes the Isolated Items Discarding Strategy (IIDS), which can be applied to any existing level-wise utility mining method to reduce the number of candidates and improve performance. The most efficient known algorithms for share mining are ShFSM and DCG, which also work adequately for utility mining. Applying IIDS to ShFSM and DCG yields two methods, FUM and DCG+, respectively. On both synthetic and real datasets, experimental results show that FUM and DCG+ are more efficient than ShFSM and DCG, respectively. Therefore, IIDS is an effective strategy for utility mining.
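Why Apriori pruning fails for utility can be seen on a small example. The Python sketch below compares a complete but exponential search over all 2^n - 1 itemsets with a level-wise search that prunes candidates the way Apriori does for support. It assumes the quantity x unit profit utility form, and the items, quantities, unit profits, and threshold `MIN_UTIL` are hypothetical. It only illustrates the pruning problem; it is not an implementation of IIDS, ShFSM, DCG, FUM, or DCG+.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

Transaction = Dict[str, int]  # item -> quantity purchased

# Hypothetical toy data, chosen so that {A, B} is a high utility itemset
# even though neither {A} nor {B} is.
UNIT_PROFIT: Dict[str, int] = {"A": 1, "B": 5, "C": 2}
DB: List[Transaction] = [
    {"A": 5, "B": 1},
    {"A": 1, "C": 1},
    {"C": 2},
]
MIN_UTIL = 8  # hypothetical minimum utility threshold


def utility(itemset: FrozenSet[str], db: List[Transaction], profit: Dict[str, int]) -> int:
    """Sum of quantity x unit profit over the transactions containing the itemset."""
    return sum(
        sum(t[i] * profit[i] for i in itemset)
        for t in db
        if itemset.issubset(t)
    )


def exhaustive_search(db, profit, min_util) -> Dict[FrozenSet[str], int]:
    """Complete but exponential: evaluates every non-empty itemset (2^n - 1 of them)."""
    items = sorted({i for t in db for i in t})
    found = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            u = utility(s, db, profit)
            if u >= min_util:
                found[s] = u
    return found


def apriori_style_search(db, profit, min_util) -> Dict[FrozenSet[str], int]:
    """Level-wise search that keeps a k-candidate only if all its (k-1)-subsets
    met the threshold, as Apriori does for support. Unsafe for utility."""
    items = sorted({i for t in db for i in t})
    level = [frozenset([i]) for i in items
             if utility(frozenset([i]), db, profit) >= min_util]
    found = {s: utility(s, db, profit) for s in level}
    k = 2
    while level:
        prev = set(level)
        # Join step: union pairs of surviving (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: drop candidates with any (k-1)-subset below the threshold.
        candidates = [c for c in joined if all(c - {x} in prev for x in c)]
        level = [c for c in candidates if utility(c, db, profit) >= min_util]
        found.update({s: utility(s, db, profit) for s in level})
        k += 1
    return found


if __name__ == "__main__":
    print("exhaustive:", exhaustive_search(DB, UNIT_PROFIT, MIN_UTIL))
    print("apriori-style:", apriori_style_search(DB, UNIT_PROFIT, MIN_UTIL))
```

Here {A} and {B} have utilities 6 and 5, both below the threshold of 8, so the level-wise search discards them at the first level and never generates {A, B}, whose utility is 10. The exhaustive search finds it, but its cost grows exponentially with the number of distinct items.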



Galois Closure Based Association Rule Mining From Biological Data

Mondal, Pasquier (2013). Biological Knowledge Discovery Handbook

Foundations of Imbalanced Learning

Ma (2013). Imbalanced Learning

Many important learning problems, from a wide variety of domains, involve learning from imbalanced data. Because this learning task is quite challenging, there has been a tremendous amount of research on this topic over the past fifteen years. However, much of this research has focused on methods for dealing with imbalanced data, without discussing exactly how or why such methods work, or what underlying issues they address. This is a significant oversight, which this chapter helps to address. This chapter begins by describing what is meant by imbalanced data, and by showing the effects of such data on learning. It then describes the fundamental learning issues that arise when learning from imbalanced data, and categorizes these issues as either problem-definition-level issues, data-level issues, or algorithm-level issues. The chapter then describes the methods for addressing these issues and organizes these methods using the same three categories. As one example, the data-level issue of "absolute rarity" (i.e., not having sufficient numbers of minority-class examples to properly learn the decision boundaries for the minority class) can best be addressed using a data-level method that acquires additional minority-class training examples. But as we shall see in this chapter, sometimes such a direct solution is not available and less direct methods must be utilized. Common misconceptions are also discussed and explained. Overall, this chapter provides an understanding of the foundations of imbalanced learning by providing a clear description of the relevant issues, and a clear mapping from these issues to the methods that can be used to address them.

INTRODUCTION

Many of the machine learning and data mining problems that we study, whether they are in business, science, medicine, or engineering, involve some form of data imbalance. The imbalance is often an integral part of the problem, and in virtually every case the less frequently occurring entity is the one that we are most interested in. For example, those working on fraud detection will focus on identifying the fraudulent transactions rather than the more common legitimate transactions [1], a telecommunications engineer will be far more interested in identifying equipment about to fail than equipment that will remain operational [2], and an industrial engineer will be more likely to focus on weld flaws than on welds that are completed satisfactorily [3].

In all of these situations it is far more important to accurately predict or identify the rarer case than the more common case, and this is reflected in the costs associated with errors in the predictions and classifications. For example, if we predict that telecommunication equipment is going to fail and it does not, we may incur some modest inconvenience and cost if the equipment is …

2.2.1 What is an Imbalanced Data Set and what is its Impact on Learning?

We be...
