Due to copyright restrictions, the access to the full text of this article is only available via subscription.
Due to the prevalent use of sensors and network monitoring tools, large volumes of data, or "big data", now traverse enterprise data processing pipelines in a streaming fashion. While some companies prefer to deploy their data processing infrastructures and services as private clouds, others outsource these services completely to public clouds. In either case, storing the data first for subsequent analysis creates additional resource costs and unwanted delays in obtaining actionable information. As a result, enterprises increasingly employ data or event stream processing systems and want to extend them with complex online analytic and mining capabilities. In this paper, we present implementation details for performing both correlation analysis and association rule mining (ARM) over streams. Specifically, we implement the Pearson product-moment correlation for analytics and the Apriori and FPGrowth algorithms for stream mining inside a popular event stream processing engine called Esper. As a unique contribution, we conduct experiments and present performance results for these new tools with different tumbling and sliding time windows over two different stream types: one for moving bus trajectories and another for web logs from a music site. We find that while tumbling windows may be preferable for performance in certain applications, sliding windows can provide additional benefits for rule mining. We hope that our findings can shed light on the design of other cloud analytics systems.
Avea Labs ; TÜBİTAK ; European Commission ; IBM Shared University Research Progra
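The difference between tumbling and sliding windows for correlation analysis can be illustrated with a minimal Python sketch. It is not the paper's Esper implementation: the function names are hypothetical, the windows are count-based rather than time-based for brevity, and the input pairs are made-up values.

```python
from collections import deque
from math import sqrt


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences.

    Returns None when the correlation is undefined (fewer than two
    points, or zero variance in either sequence).
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def tumbling_windows(events, size):
    """Yield consecutive, non-overlapping batches of `size` events."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == size:
            yield batch
            batch = []


def sliding_windows(events, size):
    """Yield the last `size` events after each arrival (overlapping)."""
    window = deque(maxlen=size)
    for event in events:
        window.append(event)
        if len(window) == size:
            yield list(window)


def correlate(windows):
    """Compute one correlation value per window of (x, y) pairs."""
    return [pearson([x for x, _ in w], [y for _, y in w]) for w in windows]


# Hypothetical (x, y) readings, for illustration only.
stream = [(1, 2), (2, 4), (3, 6), (4, 8), (5, 10), (6, 12)]
tumbling_results = correlate(tumbling_windows(stream, 3))
sliding_results = correlate(sliding_windows(stream, 3))
```

With six events and a window of three, the tumbling variant evaluates the correlation twice, while the sliding variant evaluates it four times. The sliding variant reports more often but repeats work on overlapping data, which is the kind of performance trade-off the abstract refers to.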
Word embedding approaches represent data sequences so that their contextual meaning can be handled in NLP tasks. There is an emerging need to understand user behavior patterns in navigational clickstream data. However, representing URL sequences with existing embedding approaches in order to cluster users' behavior through unsupervised machine learning is challenging. This study introduces the Pattern2Vec embedding approach, which uses a representation vector to construct contextual, precise, and interpretable clusters over hidden and popular navigational patterns. To test the usability of the proposed representation in clustering tasks, we conduct an experimental study, which indicates that Pattern2Vec outperforms existing embedding approaches.
Today, the computing world is trying to cope with "big data" problems (the volume, velocity, variety, and inconsistency of data) on the way to reaching useful information. In this paper, we share the implementation details and performance findings of Association Rule Mining (ARM) carried out "online" over big data streams, in a way not previously done in the literature. For stream mining, the Apriori and FP-Growth algorithms were added to the event stream engine called Esper. On the resulting system, these two algorithms were compared using sliding windows and data from the LastFM social music site. FPGrowth, which performed better, was selected to build a real-time, rule-based recommendation engine. Our most important findings show that, with online rule mining, (1) many more rules than with offline rule mining can be computed (2) much faster and more efficiently and (3) much earlier. In addition, many interesting and realistic rules that match musical tastes, such as "George Harrison⇒The Beatles", were found. We hope that our results will shed light on the design and implementation of other big data analytics systems in the future.
TÜBİTAK ; European Commissio