2020
DOI: 10.1093/jamia/ocaa068

Scalability and cost-effectiveness analysis of whole genome-wide association studies on Google Cloud Platform and Amazon Web Services

Abstract: Objective: Advancements in human genomics have generated a surge of available data, fueling the growth and accessibility of databases for more comprehensive, in-depth genetic studies. Methods: We provide a straightforward and innovative methodology to optimize cloud configuration in order to conduct genome-wide association studies. We utilized Spark clusters on both Google Cloud Platform and Amazon Web Services, as well as Hail…
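The abstract's central idea is choosing a cloud cluster configuration by weighing runtime against cost. The small sketch below illustrates that trade-off. The `ClusterRun` type, the provider label, the prices, and the runtimes are hypothetical placeholders, not code or results from the paper. It assumes cost scales as workers × per-node hourly price × wall-clock runtime and ignores storage, data egress, master nodes, and billing granularity.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ClusterRun:
    """One benchmarked cluster configuration (all values are hypothetical)."""
    provider: str
    workers: int
    price_per_node_hour: float  # assumed USD per worker node per hour
    runtime_hours: float        # assumed wall-clock time of the GWAS job

    @property
    def cost(self) -> float:
        # Simplified cost model: nodes x hourly price x runtime
        return self.workers * self.price_per_node_hour * self.runtime_hours


def speedup_and_efficiency(baseline: ClusterRun, run: ClusterRun) -> tuple[float, float]:
    """Speedup over the baseline and parallel efficiency (speedup / scale factor)."""
    speedup = baseline.runtime_hours / run.runtime_hours
    scale = run.workers / baseline.workers
    return speedup, speedup / scale


def cheapest(runs: list[ClusterRun], deadline_hours: float) -> ClusterRun | None:
    """Cheapest configuration that finishes within the deadline, or None."""
    feasible = [r for r in runs if r.runtime_hours <= deadline_hours]
    return min(feasible, key=lambda r: r.cost, default=None)


# Hypothetical benchmark results for three cluster sizes
runs = [
    ClusterRun("provider-A", 2, 0.20, 4.0),
    ClusterRun("provider-A", 4, 0.20, 2.2),
    ClusterRun("provider-A", 8, 0.20, 1.3),
]
best = cheapest(runs, deadline_hours=2.5)
```

With these made-up numbers, the 8-worker cluster is fastest but its parallel efficiency is below 1, so under a 2.5-hour deadline the 4-worker configuration is the cheapest feasible choice. This is the kind of scalability versus cost comparison the title refers to.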

Cited by 18 publications (14 citation statements)
References 29 publications
“…Hence, we have used technologies like Terraform (for cloud infrastructure provisioning, Modi, 2021), Helm (for deploying applications on Kubernetes clusters, Shah and Dubaria, 2019) and Docker (for application code packaging and shipment, Boettiger, 2015). SeQuiLa has been successfully deployed to both popular managed Hadoop services like Google Dataproc (utilized also in Krissaane et al., 2020) and managed Kubernetes services like Google Kubernetes Engine (GKE) or Azure Kubernetes Service. Figure 3 presents an exemplary setup on GKE using the spark-on-k8s-operator and SeQuiLa application defined as a Kubernetes Custom Resource Definition.…”
Section: Methods (mentioning)
confidence: 99%
“…The UK Biobank shows how standard data use agreements that set minimum data management and usage requirements can be developed to build trust and facilitate data exchange between data owners and utilizers [14]. Cloud-based solutions can also facilitate greater access to and usage of data [15], and platforms and tools such as the National Institute of Allergy and Infectious Diseases TB Portals, GenTB (Translational Genomics platform for TB), and ReSeqTB (Relational Sequencing TB Data Platform) share data and tools that support translational clinical research [16-18]. Furthermore, digitalization should link already existing contributions, such as the Global TB Network (GTN) (active in both Brazil and South Africa), who go on to collect missing data on active TB drug safety monitoring (aDSM) [19,20], and to investigate possible interaction between COVID-19 and TB.…”
Section: Moving Ahead (mentioning)
confidence: 99%
“…Cloud equipment • Server to host the operating system and the PBX software, configured in the cloud using the Amazon Cloud AWS services available to UFPS [28]. The main advantage of the Asterisk PBX software is its low cost.…”
Section: Proposed Solution or Improvement (unclassified)