2021
DOI: 10.1371/journal.pone.0255260

Distributed hybrid-indexing of compressed pan-genomes for scalable and fast sequence alignment

Abstract: Computational pan-genomics utilizes information from multiple individual genomes in large-scale comparative analysis. Genetic variation between case-controls, ethnic groups, or species can be discovered thoroughly using pan-genomes of such subpopulations. Whole-genome sequencing (WGS) data volumes are growing rapidly, making genomic data compression and indexing methods very important. Despite current space-efficient repetitive sequence compression and indexing methods, the deployed compression methods are oft…

Citations: cited by 1 publication (3 citation statements)
References: 39 publications (58 reference statements)
“…The table shows that apart from some tools that reports tests only on a multi-core workstation ( [16] , [17] , [18] , [19] ), Spark has been widely used to implement tools aimed at parallelizing the computation on a distributed computing environment. Most of these tools have been specifically devised for, or tested on, a cloud environment ( [20] , [21] , [22] , [23] , [24] , [25] , [26] , [27] , [28] [29] , [30] , [31] , [32] [33] , [34] , [35] , [36] , [37] ). Being the increasing availability of IaaS (Infrastructure as a Service) cloud computing services, it is desirable that the released tools are commonly designed to be supported also by such infrastructures.…”
Section: Apache Spark In Life Sciences (mentioning)
confidence: 99%
“…The Table 1 also highlights that Spark is also used with other frameworks. In particular, it is often used in conjunction with Hadoop to take advantage of its file system (i.e., HDFS) ( [16] , [22] , [23] , [26] , [27] , [30] , [31] , [34] , [35] , [38] , [39] , [40] , [41] [42] ) and of its cluster manager (i.e., YARN) ( [30] , [31] , [43] ).…”
Section: Apache Spark In Life Sciences (mentioning)
confidence: 99%