Can Berker Cikis scite author profile

Can Berker Cikis

3Publications

0Citation Statements Received

20Citation Statements Given

How they've been cited

How they cite others

Affiliations

Publications

Order By: Most citations

Rumble

et al. 2020

View full text Add to dashboard Cite

This paper introduces Rumble, a query execution engine for large, heterogeneous, and nested collections of JSON objects built on top of Apache Spark. While data sets of this type are more and more wide-spread, most existing tools are built around a tabular data model, creating an impedance mismatch for both the engine and the query interface. In contrast, Rumble uses JSONiq, a standardized language specifically designed for querying JSON documents. The key challenge in the design and implementation of Rumble is mapping the recursive structure of JSON documents and JSONiq queries onto Spark's execution primitives based on tabular data frames. Our solution is to translate a JSONiq expression into a tree of iterators that dynamically switch between local and distributed execution modes depending on the nesting level. By overcoming the impedance mismatch in the engine , Rumble frees the user from solving the same problem for every single query, thus increasing their productivity considerably. As we show in extensive experiments, Rumble is able to scale to large and complex data sets in the terabyte range with a similar or better performance than other engines. The results also illustrate that Codd's concept of data independence makes as much sense for heterogeneous, nested data sets as it does on highly structured tables.

show abstract

RumbleML: program the lakehouse with JSONiq

Fourny¹,

Dao²,

Cikis³

et al. 2021

Preprint

View full text Add to dashboard Cite

Lakehouse systems have reached in the past few years unprecedented size and heterogeneity and have been embraced by many industry players. However, they are often difficult to use as they lack the declarative language and optimization possibilities of relational engines. This paper introduces RumbleML, a high-level, declarative library integrated into the RumbleDB engine and with the JSONiq language. RumbleML allows using a single platform for data cleaning, data preparation, training, and inference, as well as management of models and results. It does it using a purely declarative language (JSONiq) for all these tasks and without any performance loss over existing platforms (e.g. Spark). The key insights of the design of RumbleML are that training sets, evaluation sets, and test sets can be represented as homogeneous sequences of flat objects; that models can be seamlessly embodied in function items mapping input test sets into prediction-augmented result sets; and that estimators can be seamlessly embodied in function items mapping input training sets to models. We argue that this makes JSONiq a viable and seamless programming language for data lakehouses across all their features, whether database-related or machine-learning-related. While lakehouses bring Machine Learning and Data Wrangling on the same platform, RumbleML also brings them to the same language, JSONiq. In the paper, we present the first prototype and compare its performance to Spark showing the benefit of a huge functionality and productivity gain for cleaning up, normalizing, validating data, feeding it into Machine Learning pipelines, and analyzing the output, all within the same system and language and at scale.

show abstract

Machine Learning with JSONiq

Cikis¹

2020

View full text Add to dashboard Cite

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Can Berker Cikis

Rumble

RumbleML: program the lakehouse with JSONiq

Machine Learning with JSONiq

Contact Info

Product

Resources

About