2020
DOI: 10.1371/journal.pone.0239741

Big Data in metagenomics: Apache Spark vs MPI

Abstract: The progress of next-generation sequencing has led to the availability of massive data sets used by a wide range of applications in biology and medicine. This has sparked significant interest in using modern Big Data technologies to process this large amount of information in distributed memory clusters of commodity hardware. Several approaches based on solutions such as Apache Hadoop or Apache Spark have been proposed. These solutions allow developers to focus on the problem while the need to deal with low …

Cited by 12 publications (8 citation statements)
References 33 publications
“…Thus, access to high performance computing (HPC) clusters or cloud-based environments would facilitates the processing of metagenomics data. 88 There is a continuous introduction of new technologies and data types expected to be added to the current omics data types which indicates the growing importance of HPC and cloud-based services. 82…”
Section: Integrated Multi-omics Analyses Of Microbial Communities (mentioning)
confidence: 99%
“…However, the efficiency of MPI-based parallel applications degrades when dealing with large data sets. Moreover, programming with MPI requires programmers to explicitly deal with the individual nodes' status and communication patterns [33]. Finally, failures in MPI are dealt with by using stop-and-restart checkpointing solutions [34].…”
Section: Related Work (mentioning)
confidence: 99%
“…As a general-purpose framework, Spark has been widely used for many scientific applications and algorithms. However, there are examples from different areas such as linear algebra [44], genomics [45] or even data science [46] where Spark does not obtain the expected performance.…”
Section: Spark and HPC Applications (mentioning)
confidence: 99%