Proceedings of the 26th International Conference on Scientific and Statistical Database Management 2014
DOI: 10.1145/2618243.2618263

Helping scientists reconnect their datasets

Cited by 12 publications
(9 citation statements)
References 18 publications
“…Helping data scientist to match and explore heterogeneous datasets, even when their scheme is unknown or unfamiliar, is an active and interesting area of research with multiple ramifications [10,14], one of which is schema matching [1]. To the best of our knowledge, there has been no detailed discussions on how this can be achieved on multidimensional spaces when uncertainty is unavoidable.…”
Section: Discussion; citation type: mentioning
confidence: 99%
“…In [13], Pochampally et al propose to model correlations between different data sources using joint precision (portion of correct outputs over entire outputs) and joint recall (portion of all correct triples that are output by all sources) as indicators. In comparison, the work in [14] relies on history and schema of data sets to map and link them together. In [15], Roy et al use the concept of intervention (i.e, changes in the values of inputs affect the outputs) to look for causal explanation for the answers of SQL queries.…”
Section: Related Work; citation type: mentioning
confidence: 99%
“…It is often desirable to reconstruct a human-interpretable lineage for such various versions. As demonstrated in a user study from prior work [111], detecting the relationship among datasets can enable users to recall transformations from one dataset version to another, and subsequently help users identify the best dataset for a given task. As revealed in Example 8.1, a real workflow written by some data scientist, feature engineering and data quality play a critical role in the performance of a machine learning task.…”
Section: Additional Related Work; citation type: mentioning
confidence: 99%
“…ReConnect [111] attempts to discover the relationship for a given dataset pair. It first defines a space of relevant relationships, generates the conditions for each relationship based on row and column statistics, and then suggests a relationship for a given dataset pair by examining the conditions.…”
Section: Additional Related Work; citation type: mentioning
confidence: 99%