2020
DOI: 10.1051/epjconf/202024505023
|View full text |Cite
|
Sign up to set email alerts
|

Awkward Arrays in Python, C++, and Numba

Abstract: The Awkward Array library has been an important tool for physics analysis in Python since September 2018. However, some interface and implementation issues have been raised in Awkward Array’s first year that argue for a reimplementation in C++ and Numba. We describe those issues, the new architecture, and present some examples of how the new interface will look to users. Of particular importance is the separation of kernel functions from data structure management, which allows a C++ implementation and a Numba … Show more

Help me understand this report
View preprint versions

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
1
1
1

Citation Types

3
19
0

Year Published

2021
2021
2023
2023

Publication Types

Select...
5

Relationship

1
4

Authors

Journals

citations
Cited by 18 publications
(22 citation statements)
references
References 6 publications
3
19
0
Order By: Relevance
“…This confirms the findings shown in Ref. [5]. Notably, the loading time is always lower than reading with ROOT's TTree::Draw function.…”
Section: I/o and Storage Formatssupporting
confidence: 91%
See 1 more Smart Citation
“…This confirms the findings shown in Ref. [5]. Notably, the loading time is always lower than reading with ROOT's TTree::Draw function.…”
Section: I/o and Storage Formatssupporting
confidence: 91%
“…Many of them are collected under the umbrella of the Scikit-HEP [3] project. Currently, the Coffea framework [4], together with the Awkward Array package [5], provide the most complete set of tools for columnar data analysis in HEP.…”
Section: Introductionmentioning
confidence: 99%
“…Previously, we quantified this slow-down [5] using ROOT TTrees containing float, std::vector<float>, and vectors of vectors up to three levels deep, reading them with the then-current Python codebase and with custom C++ code, which represents what is possible now that Awkward Array is implemented in C++. The read performance of float data is identical for Python and C++, the C++ is several times faster for std::vector<float>, and the gap widens to factors of hundreds for the doubly-nested and triply-nested cases.…”
Section: Motivationmentioning
confidence: 99%
“…Even the TypedArrayBuilder use-case works by "wiring" its fixed suite of commands to algorithmically generated Forth subroutines. Some TypedArrayBuilder commands must change the state of its finite-state machine: for instance, when filling an array of doubly nested lists of integers like [ [1,2], [3]], [], [[4], [5]], the first begin_list ([) puts it into a state that expects another begin_list ([) or end_list (]); the second puts it into a state that expects integer or end_list (]). And yet, each of these commands must return control-flow to its caller and remember its state for the next call.…”
Section: Awkwardforth Virtual Machinementioning
confidence: 99%
“…Many scientists are taking advantage of Jupyter notebooks [6], which provide a straightforward way of documenting a data analysis. Column-wise data analysis [7], in which a single operation on a vector of events replaces calculations on individual events serially, is seen as a way to for the field to take advantage of vector processing units in modern CPUs, leading to significant speed-ups in throughput. Also, declarative programming paradigms [8] can make it simpler for physicists to intuitively code their analysis.…”
Section: Introductionmentioning
confidence: 99%