Proceedings of the 2021 International Conference on Management of Data 2021
DOI: 10.1145/3448016.3457330

DataPrep.EDA: Task-Centric Exploratory Data Analysis for Statistical Modeling in Python

Abstract: Exploratory Data Analysis (EDA) is a crucial step in any data science project. However, existing Python libraries fall short in supporting data scientists to complete common EDA tasks for statistical modeling. Their API design is either too low level, which is optimized for plotting rather than EDA, or too high level, which is hard to specify more fine-grained EDA tasks. In response, we propose DataPrep.EDA, a novel task-centric EDA system in Python. DataPrep.EDA allows data scientists to declaratively specif…

Citations: cited by 21 publications (8 citation statements)
References: 23 publications
“…Upon examination of the dataset, insights provided by a Python Data Prep library [64] shows three of the numerical features (namely, BTC, USD, and Netflow Bytes) exhibiting significant skewness in their distributions (Figure 5). A sequence of mathematical adjustments/transformations [65] was applied to these characteristics to address their skewed distributions.…”
Section: A Data Pre-processing
Citation type: mentioning
confidence: 99%
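The statement above does not say which transformations were applied. As a minimal sketch, assuming a log transform (a common choice for right-skewed, non-negative features), the effect on skewness could be checked as follows. The sample values and the `skewness` helper are hypothetical; they are not taken from the cited dataset or from the DataPrep API.

```python
import math


def skewness(values):
    """Population skewness: third central moment divided by std**3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical right-skewed sample (a few very large values dominate).
amounts = [1, 2, 2, 3, 3, 4, 5, 8, 40, 500]

# Assumed transformation: log(1 + x), which compresses the long right tail.
log_amounts = [math.log1p(v) for v in amounts]

raw_skew = skewness(amounts)
log_skew = skewness(log_amounts)
print(f"skewness before: {raw_skew:.2f}, after log1p: {log_skew:.2f}")
```

For this sample the transformed feature is still right-skewed, but less so than the raw one; other transformations (for example square root or Box-Cox) might be preferred depending on the data.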
“…Data preprocessing: This stage prepares the data before it is analyzed with a correlation heatmap and scatterplot. Data preprocessing includes cleaning the data, converting the data into a form that suits the needs of the analysis, and removing invalid data [7]. 4.…”
Section: Research Methods
Citation type: unclassified
“…Based on feedback from our domain expert interviewees, we model three of these dimensions (completeness, free-of-error, and objectivity) to guide users about potential data quality concerns. While data cleaning (e.g., imputing missing values) [18,20,47,60,72,73,76,88,94,104,125] is deferred to future work, DataPilot currently provides novel GUI interaction affordances to support subset selection.…”
Section: Data
Citation type: mentioning
confidence: 99%