2024
DOI: 10.1109/tvcg.2023.3234337

Tasks and Visualizations Used for Data Profiling: A Survey and Interview Study

Abstract: The use of good-quality data to inform decision making is entirely dependent on robust processes to ensure it is fit for purpose. Such processes vary between organisations, and between those tasked with designing and following them. In this paper we report on a survey of 53 data analysts from many industry sectors, 24 of whom also participated in in-depth interviews, about computational and visual methods for characterizing data and investigating data quality. The paper makes contributions in two key areas. Th…

citations
Cited by 4 publications
(1 citation statement)
references
References 30 publications
“…On the other hand, a lot of efforts have been recently devoted to increasing data quality with the help of automated pipelines, data engineering frameworks, and prototypes. The implementation of technical solutions like data lakes [105], low-latency data infrastructure [106], feature stores [107], data warehouses [108,109], data branching [110], AutoML for data management [111], data strew-ships [112], data fusion techniques [113], data taxonomies [114], data-quality enhancement pipelines [115], data mesh and fabric [116], addressing imbalances in data [117], smart bots for data quality enhancement [118], data ontologies [119], data quality evaluation metrics [120], synthetic data generation tools [120], data profiling [121], reference stores for data quality [122], and data validation pipelines [123,124], to name a few, are vastly contributing in the feasibility and affordability of DC-AI-based solutions. In the future, more developments are expected in data quality enhancement, leading to the realization of DC-AI across many enterprises.…”
Section: Analysis Of the Feasibility And Affordability Of Dc-ai-based...
Citation type: mentioning
confidence: 99%