Practical Hadoop Ecosystem (2016)
DOI: 10.1007/978-1-4842-2199-0_8

Apache Parquet

Cited by 68 publications (39 citation statements) | References 0 publications
“…This was recognized as early as 1989 when column-wise ntuples were added to PAW and in 1997 when "splitting" was incorporated in the ROOT file format [1]. In the past decade, with the Google Dremel paper [2], the Parquet file format [3], the Arrow memory interchange format [4], and the inclusion of "ragged tensors" in TensorFlow [5], the significance of hierarchical columnar data structures has been recognized beyond particle physics.…”
Section: Introduction (mentioning, confidence: 99%)
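The excerpt above refers to hierarchical columnar data structures such as those in Dremel, Parquet, and Arrow. A minimal sketch of the underlying idea, using only plain Python (the function names here are illustrative, not any library's actual API): a "ragged" list-of-lists is stored as one contiguous values buffer plus an offsets array, which is essentially Arrow's variable-length list layout.

```python
# Sketch: columnar storage of nested ("ragged") lists, in the spirit of
# the Dremel/Parquet/Arrow list layouts. Illustrative names, not a real API.

def to_columnar(ragged):
    """Flatten a list of variable-length lists into a contiguous values
    buffer plus an offsets array; sublist i spans values[offsets[i]:offsets[i+1]]."""
    values, offsets = [], [0]
    for sublist in ragged:
        values.extend(sublist)
        offsets.append(len(values))
    return values, offsets

def from_columnar(values, offsets):
    """Reconstruct the ragged structure from the values buffer and offsets."""
    return [values[offsets[i]:offsets[i + 1]] for i in range(len(offsets) - 1)]

data = [[1, 2], [], [3, 4, 5]]
vals, offs = to_columnar(data)
# vals == [1, 2, 3, 4, 5]; offs == [0, 2, 2, 5]
assert from_columnar(vals, offs) == data
```

Storing the values contiguously is what makes columnar scans and compression effective; the offsets array alone preserves the nesting, including empty sublists.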
“…Wildfire [103] is an IBM Research project that produces an HTAP engine in which both analytical and ongoing requests go through the same columnar data format, i.e. Parquet [104], a non-proprietary storage format open to any reader for all data. Wildfire also uses the Spark ecosystem to enable large-scale distributed analytics.…”
Section: HTAP (mentioning, confidence: 99%)
“…Table 6 displays the cluster system configuration, which uses Linux as its operating system. This study's Spark file format is Parquet [24], designed to deliver better performance than the default text format. Total cluster resource usage was fixed at 70%, the Spark driver's memory at 2 GB, and the executor's memory at 50% of resources.…”
Section: The Experiments (mentioning, confidence: 99%)
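A setup like the one in the excerpt above can be sketched as a PySpark configuration fragment. This is a hedged sketch only: it assumes a local PySpark installation, the app name and executor-memory value are illustrative, and only the 2 GB driver memory mirrors the study's stated setting.

```python
# Sketch (assumes PySpark is installed): configure driver memory and
# write a DataFrame as Parquet instead of the default text format.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("parquet-demo")                  # illustrative name
         .config("spark.driver.memory", "2g")      # 2 GB driver memory, as in the study
         .config("spark.executor.memory", "4g")    # illustrative value
         .getOrCreate())

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])
df.write.mode("overwrite").parquet("/tmp/demo_parquet")  # columnar, compressed on disk
spark.read.parquet("/tmp/demo_parquet").show()
```

Writing Parquet rather than delimited text lets Spark read only the columns a query touches and apply columnar compression, which is the performance effect the study reports.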