2021
DOI: 10.26434/chemrxiv-2021-34p7f
Preprint

DBgen: A Python Library for Defining Scalable, Maintainable, Accessible, Reconfigurable, Transparent (SMART) Data Pipelines

Abstract: In this work, we present DBgen, a Python library that provides a framework for defining extract-transform-load (ETL) pipelines to create and populate SQL databases. DBgen is most useful when the underlying data has complex relationships, requires multi-step analysis, is large-scale, and the type of data being collected changes frequently. Scientific data often fits this description. With current tooling, defining ETL pipelines for this particularly difficult-to-manage data is so onerous that a great deal of i…
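
To make the extract-transform-load pattern named in the abstract concrete, here is a minimal sketch using only Python's standard `sqlite3` module. It does not use DBgen and is not meant to reflect DBgen's API; the table name `measurement`, the sample records, and the function names are all hypothetical, chosen only for illustration.

```python
import sqlite3

# Hypothetical raw records; in practice these might come from files or instruments.
RAW_ROWS = [
    {"sample": "A1", "temp_c": "25.0"},
    {"sample": "A2", "temp_c": "30.5"},
]


def extract():
    """Extract: return the raw records unchanged."""
    return list(RAW_ROWS)


def transform(rows):
    """Transform: parse the string temperatures and convert Celsius to Kelvin."""
    return [(r["sample"], float(r["temp_c"]) + 273.15) for r in rows]


def load(conn, rows):
    """Load: create the table if needed and upsert the transformed rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS measurement ("
        "sample TEXT PRIMARY KEY, temp_k REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO measurement VALUES (?, ?)", rows)
    conn.commit()


def run_pipeline(conn):
    """Run the three steps in order and return the table contents."""
    load(conn, transform(extract()))
    return conn.execute(
        "SELECT sample, temp_k FROM measurement ORDER BY sample"
    ).fetchall()


conn = sqlite3.connect(":memory:")
result = run_pipeline(conn)
```

Because the load step keys on `sample` and uses `INSERT OR REPLACE`, rerunning the pipeline leaves the table unchanged rather than duplicating rows. That is one simple way to keep a pipeline safe to rerun, offered here as an assumption about good practice rather than a description of how DBgen works.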

Cited by 1 publication (1 citation statement)
References 18 publications
“…The general capabilities of these tools shown in Fig. 7 are to (1) provide a software ecosystem to support improved data and machine-learning predictions, 109,110 (2) facilitate the process of material discovery and synthesis, 111,112 and (3) automate and enhance analysis techniques used to identify active sites, map new phases, and assess catalyst stability. 113 In addition to the ongoing development of the tools themselves, we plan to begin using them specifically to address the durability challenges in ORR outlined in Section 3.…”
Section: New Frontiers: Accelerating ORR Catalysis Research With Arti... (mentioning)
confidence: 99%