2022
DOI: 10.14778/3551793.3551847

ConnectorX

Abstract: Data is often stored in a database management system (DBMS) but dataframe libraries are widely used among data scientists. An important but challenging problem is how to bridge the gap between databases and dataframes. To solve this problem, we present ConnectorX, a client library that enables fast and memory-efficient data loading from various databases to different dataframes. We first investigate why the loading process is slow and consumes large memory. We surprisingly find that the main overhead comes fro…

Cited by 4 publications (1 citation statement)
References 22 publications

“…Therefore, for netsDB, the testing datasets were stored natively. For other platforms, the testing datasets, except for Epsilon and Criteo, were stored in tabular format in a PostgreSQL database installed on the same machine, with the database connection accelerated using the state-of-art ConnectorX library [54]. Besides, Epsilon has 2000 features, and Criteo has 1 million features, but PostgreSQL only supports up to 1600 columns [8].…”
Section: Benchmark Workload Description (mentioning)
confidence: 99%