2021
DOI: 10.14778/3489496.3489498

Evaluating query languages and systems for high-energy physics data

Abstract: In the domain of high-energy physics (HEP), query languages in general and SQL in particular have found limited acceptance. This is surprising since HEP data analysis matches the SQL model well: the data is fully structured and queried using mostly standard operators. To gain insights on why this is the case, we perform a comprehensive analysis of six diverse, general-purpose data processing platforms using an HEP benchmark. The result of the evaluation is an interesting and rather complex picture of existing …

Cited by 10 publications (28 citation statements)
References 47 publications
“…Alternative data formats continue to be used in HEP, although it is unclear whether their adoption is increasing. As analyses are probably the most agile part of HEP's software environment, they are expected to continue to try alternatives (HDF5, Parquet, etc), also in function of the tools and libraries used by analyses [6]. ROOT's goal is to preempt such re-formatting, which has consistently proven as a bottleneck for analyses' agility, preventing smooth integration of optimizations in the ROOT-part of the analyses, and increasing the storage needs for analyses.…”
Section: Data Format (mentioning)
confidence: 99%
“…Although many tools exist to address data analysis needs of industries and academia, it should not be taken for granted that they can all work just as well in any other field. Particularly, it has been shown that for the HEP data analysis requirements a tailor-made tool like ROOT with its RDataFrame data analysis interface still has a major advantage over other industry frameworks [24]. The HEP field is not new to the investigation of large-scale distributed execution engines.…”
Section: Related Work (mentioning)
confidence: 99%
“…In other systems, this requires different tools [33] [12], because trying to do all of this in the same program is very cumbersome due to the many impedance mismatches. A benchmark [17] was also performed to show the limitations and shortcomings of other query languages and APIs, with a focus on nestedness and a use case in high-energy physics.…”
Section: Data Preparation With JSONiq (mentioning)
confidence: 99%