Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems 2022
DOI: 10.1145/3503222.3507702

GenStore: a high-performance in-storage processing system for genome sequence analysis

Abstract: … of read mapping processes of reads with different properties and degrees of genetic variation, we meticulously design low-cost hardware accelerators and data/computation flows inside a NAND flash-based solid-state drive (SSD). Our evaluation using a wide range of real genomic datasets shows that GenStore, when implemented in three modern NAND flash-based SSDs, significantly improves the read mapping performance of state-of-the-art software (hardware) baselines by 2.07-6.05× (1.52-3.32×) for read sets with high …

Citations: cited by 37 publications (13 citation statements)
References: 300 publications (849 reference statements)
“…We already provide the SIMD implementation to calculate the hash values BLEND. We encourage implementing our mechanism for the applications that use seeds to find sequence similarity using processing-in-memory and near-data processing [102-114], GPUs [115-117], and FPGAs and ASICs [118-123] to exploit the massive amount of embarrassingly parallel bitwise operations in BLEND to find fuzzy seed matches. Third, we believe it is possible to apply the hashing technique we use in BLEND for many seeding techniques with a proper design.…”
Section: Discussion (mentioning)
confidence: 99%
“…Sequence-to-Sequence Accelerators. Even though there are several hardware accelerators designed to alleviate bottlenecks in several steps of traditional sequence-to-sequence (S2S) mapping (e.g., pre-alignment filtering [72,73,75,76,94,140-148], sequence-to-sequence alignment [68-70, 129-132, 149-151]), none of these designs can be directly employed for the sequence-to-graph (S2G) mapping problem. This is because S2S mapping is a special case of S2G mapping, where all nodes have only one edge (Figure 3a).…”
Section: Accelerating Sequence-to-graph Mapping (mentioning)
confidence: 99%
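
The statement above, that S2S mapping is the special case of S2G mapping in which every node has a single edge, can be illustrated with a small sketch. This is not code from the citing paper: the helper names `linear_graph` and `spell_paths` and the one-base-per-node representation are assumptions made for illustration only.

```python
def linear_graph(seq):
    """Build a chain graph from a linear sequence (hypothetical helper).

    Each base becomes one node, and node i has exactly one outgoing
    edge to node i + 1 (the last node is a sink with no outgoing edge).
    """
    nodes = list(seq)
    edges = {i: [i + 1] for i in range(len(seq) - 1)}
    return nodes, edges


def spell_paths(nodes, edges, start=0):
    """Return every string spelled by a path from `start` to a sink node."""
    successors = edges.get(start, [])
    if not successors:
        return [nodes[start]]
    return [
        nodes[start] + tail
        for nxt in successors
        for tail in spell_paths(nodes, edges, nxt)
    ]


# A linear reference spells exactly one sequence.
ref_nodes, ref_edges = linear_graph("ACGT")
linear_paths = spell_paths(ref_nodes, ref_edges)

# A graph with one variant site (node 0 branches to node 1 or node 2)
# spells more than one sequence, which a linear-only mapper cannot represent.
var_nodes = ["A", "C", "G", "T"]
var_edges = {0: [1, 2], 1: [3], 2: [3]}
variant_paths = spell_paths(var_nodes, var_edges)
```

In the chain graph there is a single path, so mapping against it reduces to ordinary sequence-to-sequence mapping; the branching graph encodes two alternative sequences at once.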
“…Existing hardware accelerators for genome sequence analysis focus on accelerating only the traditional sequence-to-sequence mapping pipeline, and cannot support genome graphs as their inputs. For example, GenStore [142], ERT [144], GenCache [143], NEST [145], MEDAL [146], SaVI [147], SMEM++ [148], Shifted Hamming Distance [94], GateKeeper [72], MAGNET [140], Shouji [141], and SneakySnake [73,76] accelerate the seeding and/or filtering steps of sequence-to-sequence mapping.…”
Section: Related Work (mentioning)
confidence: 99%
“…Performing sequence alignment is still computationally expensive and it is an open research problem [106-110,113]. Due to the low sequencing error rates of Illumina sequencing machines, it is observed that a large fraction of short reads typically maps exactly or with a few mismatches to the reference genome [114-117]. For example, on average 80% of human short reads map exactly to the human reference genome [114].…”
Section: Handling Exactly-matching Short Reads (mentioning)
confidence: 99%
“…For example, on average 80% of human short reads map exactly to the human reference genome [114]. We employ a quick filter [116] that detects exactly-matching reads using SIMD instructions and outputs their alignment information directly to the SAM file without performing sequence alignment calculations for such reads.…”
Section: Handling Exactly-matching Short Reads (mentioning)
confidence: 99%
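
The exact-match filtering idea in the statement above can be sketched roughly as follows. This is a minimal illustration, not the cited filter: it uses plain Python string comparison rather than SIMD instructions, and the function name `exact_match_filter`, the `candidate_positions` input (assumed to come from an earlier seeding step), and the output tuple layout are all hypothetical.

```python
def exact_match_filter(reads, reference, candidate_positions):
    """Split reads into exact matches and reads that still need alignment.

    reads: list of (name, sequence) pairs.
    reference: reference sequence as a string.
    candidate_positions: dict mapping read name to 0-based candidate
        start positions (assumed to be produced by a seeding step).
    """
    exact, remaining = [], []
    for name, seq in reads:
        # str.startswith(prefix, start) compares the read against the
        # reference at position p without running any alignment.
        pos = next(
            (p for p in candidate_positions.get(name, [])
             if reference.startswith(seq, p)),
            None,
        )
        if pos is not None:
            # SAM uses 1-based positions; an exact match of length n
            # has the CIGAR string "<n>M".
            exact.append((name, pos + 1, f"{len(seq)}M"))
        else:
            remaining.append((name, seq))
    return exact, remaining


reference = "ACGTACGTTT"
reads = [("r1", "GTAC"), ("r2", "GTAG")]
candidates = {"r1": [2], "r2": [2]}
exact, remaining = exact_match_filter(reads, reference, candidates)
```

Only the reads left in `remaining` would be passed on to the expensive alignment step; the exact matches already carry the position and CIGAR fields needed for a SAM record.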