Arun Swami scite author profile

Mining association rules between sets of items in large databases

We are given a large database of customer transactions. Each transaction consists of items purchased by a customer in a visit. We present an efficient algorithm that generates all significant association rules between items in the database. The algorithm incorporates buffer management and novel estimation and pruning techniques. We also present results of applying this algorithm to sales data obtained from a large retailing company, which show the effectiveness of the algorithm.
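To make the notion of a "significant" rule concrete, here is a minimal brute-force sketch. It assumes significance is measured by support (the fraction of transactions containing all items in the rule) and confidence (the fraction of transactions containing the antecedent that also contain the consequent). The function name, thresholds, and sample baskets are hypothetical, and the sketch omits the buffer management, estimation, and pruning techniques the abstract mentions.

```python
from itertools import combinations

def association_rules(transactions, min_support, min_confidence):
    """Brute-force rules X -> y with a single-item consequent (illustration only)."""
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]
    items = sorted(set().union(*baskets))

    def support(itemset):
        # Fraction of transactions that contain every item in itemset.
        return sum(1 for b in baskets if itemset <= b) / n

    rules = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s = support(itemset)
            if s == 0 or s < min_support:
                continue
            for consequent in combo:
                antecedent = itemset - {consequent}
                conf = s / support(antecedent)
                if conf >= min_confidence:
                    rules.append((tuple(sorted(antecedent)), consequent, s, conf))
    return rules

# Hypothetical sample data.
baskets = [["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"], ["milk"]]
rules = association_rules(baskets, min_support=0.5, min_confidence=0.9)
```

Enumerating every itemset is exponential in the number of items, which is why a practical algorithm needs pruning.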

Efficient similarity search in sequence databases

Agrawal

1993

We propose an indexing method for time sequences for processing similarity queries. We use the Discrete Fourier Transform (DFT) to map time sequences to the frequency domain, the crucial observation being that, for most sequences of practical interest, only the first few frequencies are strong. Another important observation is Parseval's theorem, which specifies that the Fourier transform preserves the Euclidean distance in the time or frequency domain. Having thus mapped sequences to a lower-dimensionality space by using only the first few Fourier coefficients, we use R-trees to index the sequences and efficiently answer similarity queries. We provide experimental results which show that our method is superior to search based on sequential scanning. Our experiments show that a few coefficients (1-3) are adequate to provide good performance. The performance gain of our method increases with the number and length of sequences.
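The role of Parseval's theorem can be checked with a small sketch. It assumes a unitary DFT (scaled by 1/sqrt(n)), under which Euclidean distance is the same in the time and frequency domains, so the distance computed on only the first k coefficients can never exceed the true distance. The function names and sample sequences are hypothetical, and the index structure itself is not shown.

```python
import cmath
import math

def dft(seq):
    """Unitary DFT: the 1/sqrt(n) scaling makes the transform distance-preserving."""
    n = len(seq)
    scale = 1 / math.sqrt(n)
    return [
        scale * sum(x * cmath.exp(-2j * math.pi * f * t / n) for t, x in enumerate(seq))
        for f in range(n)
    ]

def distance(a, b):
    """Euclidean distance; works for real or complex coordinates."""
    return math.sqrt(sum(abs(x - y) ** 2 for x, y in zip(a, b)))

def feature(seq, k):
    """Keep only the first k Fourier coefficients as the index key."""
    return dft(seq)[:k]

# Hypothetical sample sequences.
a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [2, 2, 4, 4, 6, 6, 8, 9]

d_time = distance(a, b)
d_freq = distance(dft(a), dft(b))
d_feat = distance(feature(a, 2), feature(b, 2))
```

Because `d_feat` is a lower bound on `d_time`, a range query in the reduced feature space may return false alarms that need a follow-up check, but it cannot miss a true match.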

Database mining: a performance perspective

Agrawal

Imieliński

Swami

1993

IEEE Trans. Knowl. Data Eng.

1,204

473

We present our perspective of database mining as the confluence of machine learning techniques and the performance emphasis of database technology. We describe three classes of database mining problems involving classification, associations, and sequences, and argue that these problems can be uniformly viewed as requiring discovery of rules embedded in massive data. We describe a model and some basic operations for the process of rule discovery. We show how the database mining problems we consider map to this model and how they can be solved by using the basic operations we propose. We give an example of an algorithm for classification obtained by combining the basic rule discovery operations. This algorithm not only is efficient in discovering classification rules but also has accuracy comparable to ID3, one of the current best classifiers.

Set-oriented mining for association rules in relational databases

We describe set-oriented algorithms for mining association rules. Such algorithms imply performing multiple joins and may appear to be inherently less efficient than special-purpose algorithms. We develop new algorithms that can be expressed as SQL queries, and discuss optimization of these algorithms. After analytical evaluation, an algorithm named SETM emerges as the algorithm of choice. Algorithm SETM uses only simple database primitives, viz., sorting and merge-scan join. Algorithm SETM is simple, fast, and stable over the range of parameter values. The major contribution of this paper is that it shows that at least some aspects of data mining can be carried out by using general query languages such as SQL, rather than by developing specialized black box algorithms. The set-oriented nature of Algorithm SETM facilitates the development of extensions.
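The set-oriented idea can be illustrated with a loose sketch that is not the published SETM algorithm. It assumes a sales relation of (trans_id, item) rows and a minimum support given as an absolute transaction count. Sorting plus grouping stands in for the sort step, and a per-transaction lookup stands in for the merge-scan join on trans_id. All names and the sample data are hypothetical.

```python
from itertools import groupby

def frequent_patterns(rows, min_count):
    """Sort rows (tid, items) on the item columns, count each pattern,
    and keep only rows whose pattern occurs in at least min_count transactions."""
    rows = sorted(rows, key=lambda r: r[1])
    kept, counts = [], {}
    for items, group in groupby(rows, key=lambda r: r[1]):
        group = list(group)
        if len(group) >= min_count:
            counts[items] = len(group)
            kept.extend(group)
    return kept, counts

def setm_sketch(sales, min_count):
    """Level-wise pattern counting over (tid, item) rows. Returns {item tuple: count}."""
    sales = sorted(set(sales))  # sorted on tid, then item
    items_of = {tid: [item for _, item in g]
                for tid, g in groupby(sales, key=lambda r: r[0])}
    level, result = frequent_patterns([(tid, (item,)) for tid, item in sales], min_count)
    while level:
        # Join on tid: extend each surviving pattern with a later item
        # bought in the same transaction.
        extended = [(tid, items + (nxt,))
                    for tid, items in level
                    for nxt in items_of[tid] if nxt > items[-1]]
        level, counts = frequent_patterns(extended, min_count)
        result.update(counts)
    return result

# Hypothetical sample data.
sales = [(1, "A"), (1, "B"), (1, "C"), (2, "A"), (2, "B"), (3, "A"), (3, "C"), (4, "B")]
patterns = setm_sketch(sales, min_count=2)
```

Each pass only sorts, groups, and joins rows, which is the kind of work a relational engine already does for SQL queries.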

Clustering association rules
