Association rule mining (ARM) is the data-mining process of finding all association rules in a dataset that satisfy user-defined measures of interest such as support and confidence. Usually, ARM proceeds by mining all frequent itemsets (a step known to be very computationally intensive), from which rules are then derived in a straightforward manner. In general, mining all frequent itemsets prunes the search space by using the downward closure (or anti-monotonicity) property of support, which states that no itemset can be frequent unless all of its subsets are frequent. A large number of papers have addressed ARM, but few have focused on scalability over very large datasets (i.e., datasets containing a very large number of transactions). In this paper, we propose a new model for representing data and mining frequent itemsets, based on P-tree technology for compression and fast logical operations over vertically structured data, and on set-enumeration trees for fast itemset enumeration. Experimental results presented hereinafter show substantial improvements for our approach over large datasets when compared to other contemporary approaches in the literature.
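The paper mines frequent itemsets over vertically structured P-trees; the horizontal, Apriori-style sketch below is only meant to illustrate the downward-closure pruning the abstract refers to, and is not the proposed method. Names such as `frequent_itemsets` and `min_support` are illustrative assumptions, not from the paper.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise search: a size-k candidate is kept only if all of its
    (k-1)-subsets are frequent (downward closure / anti-monotonicity)."""
    n = len(transactions)
    support = lambda s: sum(1 for t in transactions if s <= t) / n

    items = {i for t in transactions for i in t}
    frequent = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    result = set(frequent)
    k = 2
    while frequent:
        # Join frequent (k-1)-itemsets into size-k candidates, then prune any
        # candidate that has an infrequent (k-1)-subset before counting support.
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        frequent = {c for c in candidates if support(c) >= min_support}
        result |= frequent
        k += 1
    return result

# Example usage on a toy transaction database.
transactions = [frozenset(t) for t in
                [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]]
print(frequent_itemsets(transactions, min_support=0.6))
```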
"One person's noise is another person's signal." Outlier detection is used to clean up datasets and also to discover useful anomalies, such as criminal activities in electronic commerce, computer intrusion attacks, terrorist threats, agricultural pest infestations, and so on. Outlier detection is therefore critically important in an information-based society. This paper focuses on finding outliers in large datasets using distance-based methods. First, to speed up outlier detection, we revise Knorr and Ng's distance-based outlier definition; second, a vertical data structure, instead of the traditional horizontal structures, is adopted to make outlier detection more efficient. We tested our methods on the National Hockey League dataset and observed an order-of-magnitude speed improvement compared to contemporary distance-based outlier detection approaches.
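For context, Knorr and Ng's original DB(p, D)-outlier definition says a point is an outlier if at least a fraction p of the dataset lies farther than distance D from it. The naive horizontal-scan sketch below illustrates that baseline definition only; it does not reflect the paper's revised definition or its vertical (P-tree) implementation, and the function names are assumptions made for illustration.

```python
import math

def is_db_outlier(point, data, p, D):
    """Knorr and Ng's DB(p, D)-outlier test, checked by a brute-force scan:
    `point` is an outlier if at least fraction p of `data` is farther than D."""
    far = sum(1 for q in data if math.dist(point, q) > D)
    return far / len(data) >= p

def db_outliers(data, p=0.95, D=10.0):
    # O(n^2) baseline; the paper's contribution is to avoid exactly this cost.
    return [x for x in data if is_db_outlier(x, data, p, D)]

# Example usage on 2-D points.
points = [(0.0, 0.0), (0.5, 0.2), (0.3, 0.1), (50.0, 40.0)]
print(db_outliers(points, p=0.7, D=5.0))
```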
Outlier detection can lead to discovering unexpected and interesting knowledge, which is critically important in areas such as monitoring criminal activities in electronic commerce, credit card fraud, and the like. In this paper, we propose an efficient outlier detection method, with clusters as a by-product, that works efficiently for large datasets. Our contributions are: a) we introduce a Local Connective Factor (LCF); b) based on LCF, we propose an outlier detection method that can efficiently detect outliers and group data into clusters in a single pass; our method does not require an up-front clustering step, which is the first step in other state-of-the-art clustering-based outlier detection methods; c) the performance of our method is further improved by means of a vertical data representation, P-trees. We tested our method on a real dataset; it shows roughly five-fold speed improvements compared to other contemporary clustering-based outlier detection approaches.
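The abstract does not define LCF, so the sketch below is not the paper's method. It is only a generic, hypothetical illustration of the broader idea of producing clusters and outlier candidates in a single scan, using simple leader clustering with assumed parameters `radius` and `min_size`.

```python
import math

def one_pass_cluster_outliers(data, radius, min_size):
    """Generic single-pass (leader) clustering; points in clusters smaller than
    `min_size` are flagged as outlier candidates. Illustrative only; this is
    not the LCF-based method described in the paper."""
    centers, members = [], []
    for x in data:
        # Assign x to the nearest existing cluster within `radius`,
        # otherwise start a new cluster with x as its leader.
        best, best_d = None, None
        for i, c in enumerate(centers):
            d = math.dist(x, c)
            if d <= radius and (best_d is None or d < best_d):
                best, best_d = i, d
        if best is None:
            centers.append(x)
            members.append([x])
        else:
            members[best].append(x)
    outliers = [x for group in members if len(group) < min_size for x in group]
    return members, outliers

# Example usage: a dense group plus one isolated point.
clusters, outliers = one_pass_cluster_outliers(
    [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (9.0, 9.0)], radius=1.0, min_size=2)
print(outliers)
```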