Arbee L. Chen scite author profile

Arbee L. Chen

5 Publications

70 Citation Statements Received

110 Citation Statements Given


Affiliations

National Chengchi University

Publications


A novel hash-based approach for mining frequent itemsets over data streams requiring less memory space

Wang

Chen

2009

Data Min Knowl Disc


In recent times, data are generated as a form of continuous data streams in many applications. Since handling data streams is necessary and discovering knowledge behind data streams can often yield substantial benefits, mining over data streams has become one of the most important issues. Many approaches for mining frequent itemsets over data streams have been proposed. These approaches often consist of two procedures including continuously maintaining synopses for data streams and finding frequent itemsets from the synopses. However, most of the approaches assume that the synopses of data streams can be saved in memory and ignore the fact that the information of the non-frequent itemsets kept in the synopses may cause memory utilization to be significantly degraded. In this paper, we consider compressing the information of all the itemsets into a structure with a fixed size using a hash-based technique. This hash-based approach skillfully summarizes the information of the whole data stream by using a hash table, provides a novel technique to estimate the support counts of the non-frequent itemsets, and keeps only the frequent itemsets for speeding up the mining process. Therefore, the goal of optimizing memory space utilization can be achieved. The correctness guarantee, error analysis, and parameter setting of this approach are presented and a series of experiments is performed to show the effectiveness and the efficiency of this approach. Responsible editor: M.J. Zaki.
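The general idea of summarizing itemset counts in a fixed-size hash table can be sketched as follows. This is a generic illustration, not the algorithm from the paper: the class name `HashedItemsetCounter`, the `max_size` cap on itemset length, and the use of CRC32 as the hash function are all assumptions made for the example. Items are assumed to be strings.

```python
import zlib
from itertools import combinations


class HashedItemsetCounter:
    """Fixed-size table of counters; itemsets that collide share a counter."""

    def __init__(self, num_buckets, max_size=2):
        self.buckets = [0] * num_buckets  # memory use is fixed up front
        self.max_size = max_size          # assumed cap on itemset length

    def _bucket(self, itemset):
        # Sort so that ("a", "b") and ("b", "a") map to the same bucket.
        key = "\x1f".join(sorted(itemset)).encode("utf-8")
        return zlib.crc32(key) % len(self.buckets)

    def add_transaction(self, items):
        """Count every itemset of size 1..max_size in one transaction."""
        unique = sorted(set(items))
        for size in range(1, self.max_size + 1):
            for itemset in combinations(unique, size):
                self.buckets[self._bucket(itemset)] += 1

    def estimate(self, itemset):
        """Upper bound on the true support count (collisions only add)."""
        return self.buckets[self._bucket(itemset)]
```

Because colliding itemsets share a counter, `estimate` can overcount but never undercount, so a threshold test on it yields no false negatives in this simplified setting.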


Finding k most favorite products based on reverse top-t queries

2013


On-line rule matching for event prediction

Cho

Yen

et al.

2010

The VLDB Journal


Efficient kNN search in polyphonic music databases using a lower bounding mechanism

Liu

Chen

2005

Multimedia Systems


Querying polyphonic music from a large data collection is an interesting and challenging topic. Recently, researchers have attempted to provide efficient techniques for content-based retrieval in polyphonic music databases where queries can also be polyphonic. However, most of these techniques do not perform approximate matching well. In this paper, we present a novel method to efficiently retrieve the k music works that contain segments most similar to the user query based on the edit distance. A list-based index structure is first constructed using the feature of the polyphony. A set of candidate approximate answers is then generated for the user query. A lower bounding mechanism is proposed to prune these candidates such that the k answers can be obtained efficiently. The efficiency of the proposed method is evaluated on a real data set and a synthetic data set, showing significant improvement in response time over existing approaches.
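Pruning kNN candidates with a lower bound on the edit distance can be illustrated with a small sketch. It is not the paper's list-based index or its specific bound: it assumes whole sequences rather than segments, and it swaps in the simple fact that the edit distance between two sequences is at least the difference of their lengths. The function names are hypothetical.

```python
import heapq


def edit_distance(a, b):
    """Classic dynamic-programming edit distance (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


def knn_by_edit_distance(query, candidates, k):
    """Return (k nearest as (distance, candidate) pairs, exact computations)."""
    # Visit candidates in order of increasing lower bound |len diff|.
    ordered = sorted(candidates, key=lambda c: abs(len(c) - len(query)))
    best = []      # max-heap via negated distance: (-distance, candidate)
    computed = 0
    for cand in ordered:
        lower = abs(len(cand) - len(query))
        if len(best) == k and lower >= -best[0][0]:
            break  # no remaining candidate can beat the current k-th best
        d = edit_distance(query, cand)
        computed += 1
        if len(best) < k:
            heapq.heappush(best, (-d, cand))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, cand))
    return sorted((-nd, c) for nd, c in best), computed
```

The second return value counts how many exact edit-distance computations were needed, which makes the effect of the pruning visible when it is compared with `len(candidates)`.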


Mining frequent itemsets over distributed data streams by continuously maintaining a global synopsis

Wang

Chen

2010

Data Min Knowl Disc


Mining frequent itemsets over data streams has attracted much research attention in recent years. In the past, we had developed a hash-based approach for mining frequent itemsets over a single data stream. In this paper, we extend that approach to mine global frequent itemsets from a collection of data streams distributed at distinct remote sites. To speed up the mining process, we make the first attempt to address a new problem on continuously maintaining a global synopsis for the union of all the distributed streams. The mining results therefore can be yielded on demand by directly processing the maintained global synopsis. Instead of collecting and processing all the data in a central server, which may waste the computation resources of remote sites, distributed computations over the data streams are performed. A distributed computation framework is proposed in this paper, including two communication strategies and one merging operation. These communication strategies are designed according to an accuracy guarantee of the mining results, determining when and what the remote sites should transmit to the central server (named coordinator). On the other hand, the merging operation is exploited to merge the information received from the remote sites into the global synopsis maintained at the coordinator. By the strategies and operation, the goal of continuously maintaining the global synopsis can be achieved. Rooted in the continuously maintained global synopsis, we propose a mining algorithm for finding global frequent itemsets. Moreover, the correctness guarantees of the communication strategies and merging operation, and the accuracy guarantee analysis of the mining algorithm are provided. Finally, a series of experiments on synthetic datasets and a real dataset are performed to show the effectiveness and efficiency of the distributed computation framework. Responsible editor: M.J. Zaki.
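The coordinator pattern described in this abstract, where remote sites decide when to transmit and a merge step folds their messages into a global synopsis, might look roughly like the sketch below. It is an assumption-laden toy, not the paper's framework: the `RemoteSite` and `Coordinator` classes, the per-itemset `threshold` rule for when to transmit, and the plain additive merge are all hypothetical stand-ins.

```python
from collections import Counter


class RemoteSite:
    """Buffers local counts and emits a delta once a count reaches a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold  # hypothetical: unsent count that triggers a send
        self.pending = Counter()

    def observe(self, itemset):
        """Record one occurrence; return a delta to transmit, or None."""
        key = frozenset(itemset)
        self.pending[key] += 1
        if self.pending[key] >= self.threshold:
            return {key: self.pending.pop(key)}
        return None


class Coordinator:
    """Maintains the global synopsis by merging deltas from remote sites."""

    def __init__(self):
        self.global_synopsis = Counter()

    def merge(self, delta):
        self.global_synopsis.update(delta)  # adds counts key by key

    def frequent(self, min_count):
        return {k: v for k, v in self.global_synopsis.items() if v >= min_count}
```

Under this toy rule each site withholds at most `threshold - 1` occurrences of any itemset, so the global count undershoots the true count by at most `(threshold - 1)` times the number of sites.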


