2019 IEEE International Conference on Big Data (Big Data) 2019
DOI: 10.1109/bigdata47090.2019.9006303

AFrame: Extending DataFrames for Large-Scale Modern Data Analysis

Abstract: Analyzing the increasingly large volumes of data that are available today, possibly including the application of custom machine learning models, requires the utilization of distributed frameworks. This can result in serious productivity issues for "normal" data scientists. This paper introduces AFrame, a new scalable data analysis package powered by a Big Data management system that extends the data scientists' familiar DataFrame operations to efficiently operate on managed data at scale. AFrame is implemented…

Cited by 11 publications (8 citation statements)
References 12 publications

“…The two presented approaches, SDBL via Python and SDB via PostgreSQL also differ with respect to their memory model. When using SDBL via Python, the primary data working object is the pandas dataframe, which albeit its strong resemblance to a relational table in terms of its structure, is nevertheless non-persistent [73]. In other words, it is up to the programmer to choose a suitable data format (such as the feather package which we used) in order to save the dataframe onto disk for later use.…”
Section: Discussion (mentioning)
confidence: 99%
“…In the research community, there are multiple notable papers that have tackled dataframe optimization through vastly different approaches. Sinthong et al propose AFrame, a dataframe system implemented on top of AsterixDB by translating dataframe APIs into SQL++ queries that are supported by AsterixDB [15]. Another work by Yan et al aims to accelerate EDA with dataframes by "auto-suggesting" data exploration operations [17].…”
Section: Related Work (mentioning)
confidence: 99%
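
The idea in the statement above, translating dataframe-style calls into a query string that the database then runs, can be illustrated with a toy sketch. This is not AFrame's API: the `LazyFrame` class, its method names, and the `Tweets` dataset are hypothetical, and the generated text only approximates the shape of a simple SQL++ `SELECT` statement.

```python
import json
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class LazyFrame:
    """Hypothetical dataframe-like object that records operations
    instead of executing them (an illustration, not AFrame itself)."""
    dataset: str
    columns: Optional[Tuple[str, ...]] = None
    predicates: Tuple[str, ...] = ()
    limit: Optional[int] = None

    def select(self, *cols: str) -> "LazyFrame":
        # Record a projection; nothing is evaluated yet.
        return replace(self, columns=tuple(cols))

    def where(self, column: str, op: str, value) -> "LazyFrame":
        # Record a filter; json.dumps renders the literal (assumed
        # acceptable as a query literal for strings and numbers).
        predicate = f"t.{column} {op} {json.dumps(value)}"
        return replace(self, predicates=self.predicates + (predicate,))

    def head(self, n: int = 5) -> "LazyFrame":
        # Record a row limit.
        return replace(self, limit=n)

    def to_query(self) -> str:
        # Assemble the recorded operations into a single query string.
        if self.columns:
            projection = ", ".join(f"t.{c}" for c in self.columns)
        else:
            projection = "VALUE t"
        parts = [f"SELECT {projection}", f"FROM {self.dataset} t"]
        if self.predicates:
            parts.append("WHERE " + " AND ".join(self.predicates))
        if self.limit is not None:
            parts.append(f"LIMIT {self.limit}")
        return " ".join(parts) + ";"
```

Chaining `LazyFrame("Tweets").where("lang", "=", "en").select("id", "text").head(3)` only builds up state; calling `to_query()` on the result produces the query text that a backing system would execute.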
“…AFrame [24,25] is a library that provides a Pandas DataFrame [19] based syntax to interact with data in Apache AsterixDB. AFrame targets data scientists who are already familiar with Pandas DataFrames.…”
Section: B. AFrame (mentioning)
confidence: 99%
“…One is the total runtime, which includes both the DataFrame creation time and the expression runtime, and the other is the expression-only runtime. This is done to reflect the impact of the schema inferencing process which can be time-consuming for some DataFrame libraries [24,25]. Also, depending on the nature of a given analysis, the DataFrame creation time can dominate the actual expression evaluation time.…”
Section: A Dataframe Benchmark (mentioning)
confidence: 99%
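
The two measurements described in the statement above (total runtime versus expression-only runtime) can be separated with a small timing helper. This is a minimal sketch, not the cited benchmark's harness: the function name `time_benchmark` and its two callback parameters are assumptions made for illustration.

```python
import time
from typing import Any, Callable, Dict


def time_benchmark(
    create_frame: Callable[[], Any],
    expression: Callable[[Any], Any],
) -> Dict[str, float]:
    """Time DataFrame creation and expression evaluation separately.

    Both callbacks are hypothetical stand-ins: create_frame builds the
    data object, expression runs the analysis on it.
    """
    start = time.perf_counter()
    frame = create_frame()          # includes any schema inference
    created = time.perf_counter()
    expression(frame)               # the expression under test
    end = time.perf_counter()
    return {
        "creation": created - start,
        "expression_only": end - created,
        "total": end - start,
    }
```

For a library that evaluates lazily, the `expression` callback would need to force evaluation (for example by fetching the result), otherwise the expression-only figure would measure little more than query construction.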