2017
DOI: 10.1016/j.future.2017.01.016

Raw data queries during data-intensive parallel workflow execution

Abstract: Computer simulations consume and produce huge amounts of raw data files presented in different formats, e.g., HDF5 in computational fluid dynamics simulations. Users often need to analyze domain-specific data based on related data elements from multiple files during the execution of computer simulations. In a raw data analysis, one should identify regions of interest in the data space and retrieve the content of specific related raw data files. Existing solutions, such as FastBit and RAW, are limited to a sing…

Cited by 24 publications (28 citation statements)
References 35 publications

“…We use paths that point to these data files on disk in the attributes of the data elements composing the datasets 1 ∈ whenever a file is read or written by an activity [7]. Some other domain-specific data values within those files can be extracted or generated based on the content of the file and represented as attributes in the data elements, also increasing the expressivity of the file descriptions [13,14]. These domain-specific quantities can be decisive for the domain so they need to be tracked.…”
Section: Scientific Domain Data Management Using the Algebraic Approach (mentioning)
confidence: 99%
“…Data management in scientific workflows is critical due to the inherent complexity of the scientific domain data and the HPC requirements, such as efficient exploitation of data parallelism. In this section, we provide the background for this work, which relies on a data-centric algebraic approach for scientific workflows [13,14]. It provides constructs, mechanisms, and conceptualizations which in essence aim at valorizing fine-grained elements of data flowing throughout the workflow activities, rather than just the chaining of tasks (i.e., chaining of programs or processes).…”
Section: Introduction (mentioning)
confidence: 99%
“…Many WMSs adopt a database to store lineage data [24]. Other work emphasises queries over the stored data, while these are generated [30]. We present details about storage and access of lineage information in a separate paper.…”
Section: Related Work (mentioning)
confidence: 99%
“…This section is sub-divided into three complementary areas related to optimising workflows performance: parallel computing [24], scheduling and planning [25,26,27,28], and data management [29,30].…”
Section: Performance and Optimisation (mentioning)
confidence: 99%