Craig Chasseur scite author profile

Craig Chasseur

3 Publications

30 Citation Statements Received

121 Citation Statements Given

Affiliations

Google (United States), University of Wisconsin–Madison

Publications

Design and evaluation of storage organizations for read-optimized main memory databases

Chasseur

Patel

2013

Proc. VLDB Endow.

Existing main memory data processing systems employ a variety of storage organizations and make a number of storage-related design choices. The focus of this paper is on systematically evaluating a number of these key storage design choices for main memory analytical (i.e., read-optimized) database settings. Our evaluation produces a number of key insights: First, it is always beneficial to organize data into self-contained memory blocks rather than large files. Second, both column-stores and row-stores display performance advantages for different types of queries, and for high performance both should be implemented as options for the tuple-storage layout. Third, cache-sensitive B+-tree indices can play a major role in accelerating query performance, especially when used in a block-oriented organization. Finally, compression can also play a role in accelerating query performance depending on data distribution and query selectivity.

F1 Query

et al. 2018

F1 Query is a stand-alone, federated query processing platform that executes SQL queries against data stored in different file-based formats as well as different storage systems at Google (e.g., Bigtable, Spanner, Google Spreadsheets, etc.). F1 Query eliminates the need to maintain the traditional distinction between different types of data processing workloads by simultaneously supporting: (i) OLTP-style point queries that affect only a few records; (ii) low-latency OLAP querying of large amounts of data; and (iii) large ETL pipelines. F1 Query has also significantly reduced the need for developing hard-coded data processing pipelines by enabling declarative queries integrated with custom business logic. F1 Query satisfies key requirements that are highly desirable within Google: (i) it provides a unified view over data that is fragmented and distributed over multiple data sources; (ii) it leverages datacenter resources for performant query processing with high throughput and low latency; (iii) it provides high scalability for large data sizes by increasing computational parallelism; and (iv) it is extensible and uses innovative approaches to integrate complex business logic in declarative query processing. This paper presents the end-to-end design of F1 Query. Evolved out of F1, the distributed database originally built to manage Google's advertising data, F1 Query has been in production for multiple years at Google and serves the querying needs of a large number of users and systems.

A Padded Encoding Scheme to Accelerate Scans by Leveraging Skew

Chasseur

Patel

2015

In-memory data analytic systems that use vertical bit-parallel scan methods generally use encoding techniques. We observe that in such environments, there is an opportunity to turn skew in both the data and predicate distributions (usually a problem for query processing) into a benefit that can be leveraged to encode the column values. This paper proposes a padded encoding scheme to address this opportunity. The proposed scheme creates encodings that map common attribute values to codes that can easily be distinguished from other codes by only examining a few bits in the full code. Consequently, scans on columns stored using the padded encoding scheme can safely prune the computation without examining all the bits in the code, thereby reducing the memory bandwidth and CPU cycles that are consumed when evaluating scan queries. Our padded encoding method results in a fixed-length encoding, as fixed-length encodings are easier to manage. However, the proposed padded encoding may produce longer (fixed-length) codes than those produced by popular order-preserving encoding methods, such as dictionary-based encoding. This additional space overhead has the potential to negate the gains from early pruning of the scan computation. However, as we demonstrate empirically, the additional space overhead is generally small, and the padded encoding scheme provides significant performance improvements.
