2022
DOI: 10.48550/arxiv.2204.06074
Preprint

Skyhook: Towards an Arrow-Native Storage System

Cited by 1 publication (1 citation statement)

“…Data processing and exchange can be implemented with a number of building blocks that include the Parquet file format [19], the Flight framework for efficient data interchange between processes [20], the Gandiva LLVM-based JIT computation for executing analytical expressions by leveraging modern CPU SIMD instructions to process Arrow data [21], and Awkward Array for restructuring computation on columnar and nested data [22]. On top of these building blocks exist a number of Arrow integration frameworks, including the Fletcher framework that integrates FPGAs with Apache Arrow [23], NVIDIA's RAPIDS cuDF framework that does similar for GPUs [24], [25], the Plasma high-performance shared-memory object store [26], the Skyhook distributed storage plug-in to embed Arrow processing engines within Ceph storage objects [27], [28], and the Substrait effort to standardize an open format for query plans between query optimizers and processing engines [29]. There are many more projects that are adopting the Apache Arrow in-memory representation and the Dataset Interface that abstracts over a variety of file formats and other data sources [30].…”
Section: A. Apache Arrow (mentioning, confidence: 99%)