Kayhan Dursun scite author profile

Kayhan Dursun

4 Publications

73 Citation Statements Received

83 Citation Statements Given

Affiliations

John Brown University, Brown University, Providence College

Publications

An architecture for compiling UDF-centric workflows

et al. 2015

Data analytics has recently grown to include increasingly sophisticated techniques, such as machine learning and advanced statistics. Users frequently express these complex analytics tasks as workflows of user-defined functions (UDFs) that specify each algorithmic step. However, given typical hardware configurations and dataset sizes, the core challenge of complex analytics is no longer sheer data volume but rather the computation itself, and the next generation of analytics frameworks must focus on optimizing for this computation bottleneck. While query compilation has gained widespread popularity as a way to tackle the computation bottleneck for traditional SQL workloads, relatively little work addresses UDF-centric workflows in the domain of complex analytics. In this paper, we describe a novel architecture for automatically compiling workflows of UDFs. We also propose several optimizations that consider properties of the data, UDFs, and hardware together in order to generate different code on a case-by-case basis. To evaluate our approach, we implemented these techniques in TUPLEWARE, a new high-performance distributed analytics system, and our benchmarks show performance improvements of up to three orders of magnitude compared to alternative systems.

Revisiting Reuse in Main Memory Database Systems

Dursun

Binnig

Çetintemel

et al. 2017

Reusing intermediates in databases to speed up analytical query processing has been studied in the past. Existing solutions typically require intermediate results of individual operators to be materialized into temporary tables to be considered for reuse in subsequent queries. However, these approaches are fundamentally ill-suited for use in modern main memory databases. The reason is that modern main memory DBMSs are typically limited by the bandwidth of the memory bus, so query execution is heavily optimized to keep tuples in the CPU caches and registers. Adding materialization operations to a query plan therefore not only adds traffic to the memory bus but, more importantly, prevents the important cache- and register-locality opportunities, resulting in high performance penalties. In this paper we study a novel reuse model for intermediates, which caches internal physical data structures materialized during query processing (due to pipeline breakers) and externalizes them so that they become reusable for upcoming operations. We focus on hash tables, the most commonly used internal data structure in main memory databases to perform join and aggregation operations. As queries arrive, our reuse-aware optimizer reasons about the reuse opportunities for hash tables, employing cost models that take into account hash table statistics together with the CPU and data movement costs within the cache hierarchy. Experimental results based on our HashStash prototype demonstrate performance gains of 2× for typical analytical workloads with no additional overhead for materializing intermediates.

Ensemble of Multi-objective Clustering Unified with H-Confidence Metric as Validity Metric

Sert

Dursun

Özyer

2011

Multi-objective clustering is one focus area of multi-objective optimization, which has attracted many researchers in several areas for over a decade. Multi-objective clustering considers multiple objectives simultaneously and yields several natural clustering solutions. The resulting solution set offers different points of view on the clustering problem. This paper treats each potential solution as the opinion of a different expert and uses an ensemble of these solutions to find the final natural clustering. We tested the approach on a categorical dataset and then on a mixed credit card dataset with different objectives, and compared the results against single-objective clustering in terms of purity.
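The abstract does not define purity, so the following Python sketch assumes the standard definition: each cluster is credited with its most frequent ground-truth class, and purity is the fraction of points covered by those majority classes. The function name `purity` and the example labels are illustrative only and do not come from the paper.

```python
from collections import Counter


def purity(cluster_labels, class_labels):
    """Fraction of points that belong to the majority class of their cluster."""
    if len(cluster_labels) != len(class_labels):
        raise ValueError("label sequences must have the same length")
    if len(cluster_labels) == 0:
        raise ValueError("purity is undefined for an empty dataset")
    # Count ground-truth classes inside each cluster.
    per_cluster = {}
    for cluster, cls in zip(cluster_labels, class_labels):
        per_cluster.setdefault(cluster, Counter())[cls] += 1
    # Credit each cluster with the size of its largest class.
    majority_total = sum(max(counts.values()) for counts in per_cluster.values())
    return majority_total / len(cluster_labels)


# Two clusters of three points each; each cluster has a 2-to-1 majority class.
mixed = purity([0, 0, 0, 1, 1, 1], ["a", "a", "b", "b", "b", "a"])
# Clusters that match the classes exactly.
perfect = purity([0, 0, 1, 1], ["x", "x", "y", "y"])
```

Under this definition a higher value means clusters align more closely with the known classes, which is how an ensemble result could be compared against a single-objective clustering.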

Systematic investigation of the effects of unidirectional links on the lifetime of wireless sensor networks

Özyer

Tavlı

Dursun

et al. 2013

Computer Standards & Interfaces
