Proceedings of the 24th ACM International on Conference on Information and Knowledge Management 2015
DOI: 10.1145/2806416.2806538

Semi-Automated Exploration of Data Warehouses

Abstract: Exploratory data analysis tries to discover novel dependencies and unexpected patterns in large databases. Traditionally, this process is manual and hypothesis-driven. However, analysts can come short of patience and imagination. In this paper, we introduce Claude, a hypothesis generator for data warehouses. Claude follows a 2-step approach: (1) It detects interesting views, by exploiting non-linear statistical dependencies between the dimensions and the measure. (2) To explain its findings, it detects local pa…

Cited by 8 publications (10 citation statements)
References 25 publications
“…There are, in general, two ways to perform multidimensional conjunctive selections in column stores. (1) We perform the selection on each column individually, creating an intermediate result per column as (candidate) list of IDs (or as bit-vector). Later, intersecting all lists (or and-ing all bitvectors) to yield the final result.…”
Section: A Adaptive Kd-tree (mentioning)
confidence: 99%
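
The first strategy in the statement above (select on each column separately, then intersect the intermediate results) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the citing paper's implementation: the helper names `select_bitvector` and `conjunctive_select`, the column names, and the sample data are all hypothetical. Bit-vectors are modeled as Python integers, where bit i is set when row i satisfies the predicate.

```python
from typing import Callable, Dict, List, Sequence


def select_bitvector(column: Sequence[float],
                     predicate: Callable[[float], bool]) -> int:
    """Scan a single column and return a bit-vector (stored as an int).

    Bit i is set when the value in row i satisfies the predicate.
    """
    bits = 0
    for row_id, value in enumerate(column):
        if predicate(value):
            bits |= 1 << row_id
    return bits


def conjunctive_select(columns: Dict[str, Sequence[float]],
                       predicates: Dict[str, Callable[[float], bool]]) -> List[int]:
    """Evaluate one predicate per column, then AND the bit-vectors together."""
    n_rows = len(next(iter(columns.values())))
    result = (1 << n_rows) - 1  # start with every row as a candidate
    for name, predicate in predicates.items():
        # One intermediate bit-vector per column, combined by bitwise AND.
        result &= select_bitvector(columns[name], predicate)
    # Decode the final bit-vector back into a list of row IDs.
    return [row_id for row_id in range(n_rows) if (result >> row_id) & 1]


# Hypothetical two-column table, stored column by column.
columns = {
    "price": [5.0, 12.0, 7.5, 20.0, 9.0],
    "quantity": [3, 10, 2, 1, 6],
}
# Conjunctive selection: 6 <= price <= 15 AND quantity >= 5.
predicates = {
    "price": lambda v: 6.0 <= v <= 15.0,
    "quantity": lambda v: v >= 5,
}
matching_rows = conjunctive_select(columns, predicates)
print(matching_rows)  # [1, 4]
```

Each column is scanned independently here, so every predicate produces a full intermediate result before any intersection happens. The quoted passage is cut off before it describes the second strategy, so only the first is sketched.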
“…Their workflow has three main steps. (1) They generate a hypothesis about the data; (2) they validate it on a trial-and-error basis by issuing highly selective queries to check small portions of the data; (3) they adjust their hypothesis based on step (2) and repeat until satisfied [1]. This process can require many time-consuming trial-and-error iterations until the data scientist can extract the desired information.…”
Section: Introduction (mentioning)
confidence: 99%
“…Data scientists perform exploratory data analysis to discover unexpected patterns in large collections of data. This process is done with a hypothesis-driven trial-and-error approach [26]. They query segments that could potentially provide insights, test their hypothesis, and either zoom in on the same segment or move to a different one depending on the insights gained.…”
Section: Introduction (mentioning)
confidence: 99%