2017 International Conference on High Performance Computing & Simulation (HPCS) 2017
DOI: 10.1109/hpcs.2017.19

Scalable Genomic Data Management System on the Cloud

Abstract: Thanks to the huge amount of sequenced data that is becoming available, building scalable solutions for supporting query processing and data analysis over genomics datasets is increasingly important. This paper presents GDMS, a scalable Genomic Data Management System for querying region-based genomic datasets; the focus of the paper is on the deployment of the system on a cluster hosted by CINECA.

Citations: cited by 4 publications (3 citation statements)
References: 9 publications
“…With an exponential growth in datasets, the real challenge will be to store and access haplotyping data in an efficient way, which can potentially be achieved by applying massive parallelism (detailed reviews in [ 141 , 142 ]). In addition, cloud-based strategies will be required for storing, accessing, and sharing data (for example, https://vgp.github.io/genomeark/ ).…”
Section: Remaining Challenges and Perspectives (mentioning)
confidence: 99%
“…With an exponential growth in datasets, the real challenge will be to store and access haplotyping data in an efficient way, which can potentially be achieved by applying massive parallelism (detailed reviews in 130,131 benefit from a public collection of high-quality benchmarks, for example in the form of a community-driven assessment initiative similar to the Critical Assessment of Metagenome Interpretation 83 (CAMI), Assemblathon 133,134 and Genome Assembly Gold-standard Evaluations 135 (GAGE). As the field advances to produce high-quality chromosome-scale phased sequences, the next critical step will be in the development of new gene annotation tools 136 to enable more precise downstream analyses in the coming decade.…”
Section: Scale (mentioning)
confidence: 99%
“…The BioMart implementation is based upon Structured Query Language (SQL) and supported by several SQL engines, including MySQL, PostgreSQL, Oracle, DB2 and MS SQL. The main difference of Federated GMQL with BioMart is that the former exposes the full power of GMQL, a high-level declarative language allowing the expression of queries over genomic regions and metadata, and is implemented on Spark, which scales better than relational databases on the cloud [25], [26].…”
Section: Federated Genomic Data Management System Comparison (mentioning)
confidence: 99%