Practical Hadoop Ecosystem (2016)
DOI: 10.1007/978-1-4842-2199-0_8

Apache Parquet

Cited by 68 publications (39 citation statements) | References 0 publications
“…This was recognized as early as 1989 when column-wise ntuples were added to PAW and in 1997 when "splitting" was incorporated in the ROOT file format [1]. In the past decade, with the Google Dremel paper [2], the Parquet file format [3], the Arrow memory interchange format [4], and the inclusion of "ragged tensors" in TensorFlow [5], the significance of hierarchical columnar data structures has been recognized beyond particle physics.…”
Section: Introduction (mentioning, confidence: 99%)
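The excerpt above refers to hierarchical columnar data structures such as those in Dremel, Parquet, and Arrow. A minimal sketch of the underlying idea, using only plain Python (the function names here are illustrative, not any library's actual API): a "ragged" list-of-lists is stored as one contiguous values buffer plus an offsets array, which is essentially Arrow's variable-length list layout.

```python
# Sketch: columnar storage of nested ("ragged") lists, in the spirit of
# the Dremel/Parquet/Arrow list layouts. Illustrative names, not a real API.

def to_columnar(ragged):
    """Flatten a list of variable-length lists into a contiguous values
    buffer plus an offsets array; sublist i spans values[offsets[i]:offsets[i+1]]."""
    values, offsets = [], [0]
    for sublist in ragged:
        values.extend(sublist)
        offsets.append(len(values))
    return values, offsets

def from_columnar(values, offsets):
    """Reconstruct the ragged structure from the values buffer and offsets."""
    return [values[offsets[i]:offsets[i + 1]] for i in range(len(offsets) - 1)]

data = [[1, 2], [], [3, 4, 5]]
vals, offs = to_columnar(data)
# vals == [1, 2, 3, 4, 5]; offs == [0, 2, 2, 5]
assert from_columnar(vals, offs) == data
```

Storing the values contiguously is what makes columnar scans and compression effective; the offsets array alone preserves the nesting, including empty sublists.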
“…Wildfire [103] is an IBM Research project that produces an HTAP engine in which both analytical and ongoing requests go through the same columnar data format, i.e. Parquet [104], a non-proprietary storage format open to any reader for all data. Wildfire also uses the Spark ecosystem to enable large-scale distributed analytics.…”
Section: HTAP (mentioning, confidence: 99%)
“…Table 6 displays the cluster system configuration, which uses Linux as its operating system. This study's Spark file format is Parquet [24], designed to deliver better performance than the default text format. Total cluster resource usage was fixed at 70%, the Spark driver's memory at 2 GB, and the executor's memory at 50% of resources.…”
Section: The Experiments (mentioning, confidence: 99%)
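A setup like the one in the excerpt above can be sketched as a PySpark configuration fragment. This is a hedged sketch only: it assumes a local PySpark installation, the app name and executor-memory value are illustrative, and only the 2 GB driver memory mirrors the study's stated setting.

```python
# Sketch (assumes PySpark is installed): configure driver memory and
# write a DataFrame as Parquet instead of the default text format.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("parquet-demo")                  # illustrative name
         .config("spark.driver.memory", "2g")      # 2 GB driver memory, as in the study
         .config("spark.executor.memory", "4g")    # illustrative value
         .getOrCreate())

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])
df.write.mode("overwrite").parquet("/tmp/demo_parquet")  # columnar, compressed on disk
spark.read.parquet("/tmp/demo_parquet").show()
```

Writing Parquet rather than delimited text lets Spark read only the columns a query touches and apply columnar compression, which is the performance effect the study reports.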