2014
DOI: 10.1007/978-3-642-54420-0_31

Transparent Incremental Updates for Genomics Data Analysis Pipelines

Abstract: A large, up-to-date compendium of integrated genomic data is often required for biological data analysis. The compendium can be tens of terabytes in size and must often be updated frequently with new experimental data or metadata. Updating a compendium manually is cumbersome, requires a lot of unnecessary computation, and may result in errors or inconsistencies in the compendium. We propose a transparent file-based approach for adding incremental update capabilities to unmodified genomics data analysis tools …

Cited by 6 publications (10 citation statements)
References 27 publications
“…We achieved up to 82% reduction in analysis time for compendium updates when using GeStore with an unmodified biological data analysis pipeline ([34] has additional experimental results). We found HBase to be well suited for the data management requirements of GeStore.…”
Section: GeStore; citation type: mentioning
confidence: 99%
“…GeStore [34] is a framework for adding transparent incremental updates to data processing pipelines. We use GeStore to incrementally update large-scale compendia such as the IMP compendia described in the previous section.…”
Section: GeStore; citation type: mentioning
confidence: 99%
“…GeStore [38] is a system for adding transparent incremental updates to biological data processing pipelines. We use GeStore to periodically update large-scale compendia, such as the IMP compendia described in the previous section.…”
Section: B GeStore; citation type: mentioning
confidence: 99%
“…We built GeStore since the processing time for a full compendium update can be several days even on a large computer cluster, making it impractical to frequently update large-scale compendia. We have achieved up to 82% reduction in analysis time for dataset updates when using GeStore with an unmodified biological data analysis pipeline [38]. GeStore also provides efficient meta-database management for large scale meta-databases.…”
Section: B GeStore; citation type: mentioning
confidence: 99%