Near-data in-memory processing research has gained momentum in recent years. A typical processing-in-memory architecture places one or several processing elements next to a volatile memory, enabling processing without transferring data to the host CPU. The increased bandwidth to and from volatile memory leads to performance gains. However, processing-in-memory does not alleviate the von Neumann bottleneck for big data problems, where datasets are too large to fit in main memory.

We present a novel processing-in-storage system based on Resistive Content Addressable Memory (ReCAM). It functions simultaneously as mass storage and as a massively parallel associative processor. ReCAM processing-in-storage resolves the bandwidth wall by keeping computation inside the storage arrays, without transferring data up the memory hierarchy.

We show that a ReCAM-based processing-in-storage architecture may outperform existing processing-in-memory and accelerator-based designs. A ReCAM processing-in-storage implementation of Smith-Waterman DNA sequence alignment reaches a speedup of almost five over a GPU cluster. An implementation of in-storage inline data deduplication is presented and shown to achieve orders of magnitude higher throughput than traditional CPU- and DRAM-based systems.
Introduction

Until the breakdown of Dennard scaling, designers focused on improving the performance of a single core by increasing instruction-level parallelism. In recent years, as Dennard scaling slowed down while Moore's law endured, the focus has shifted to increasing parallelism through a growing number of cores in multicore processors [16]. However, memory bandwidth does not improve at the same rate, making the von Neumann bottleneck one of the main performance-limiting factors.

Data is typically fetched into the CPU's main memory from non-volatile storage such as hard disks or Flash SSDs. Consequently, storage bandwidth and access time pose a major constraint on performance improvement. The problem worsens in datacenter and cloud environments, where datasets are distributed among multiple nodes across the datacenter. In such cases, data transfer adds latency and reduces bandwidth even further, lowering the performance upper bound.

This challenge has motivated renewed interest in Near-Data Processing (NDP) [7]. The main premise of NDP is shifting computation closer to the data. NDP seeks to minimize data movement by computing at the most appropriate location in the memory hierarchy, which can be the cache, main memory, or persistent storage. With NDP, less data needs to be transferred through the levels of the hierarchy, thus alleviating the limited-bandwidth problem. Placing computing resources at the cache level or in main memory (also known as Processing-in-Memory, or PiM) does not address emerging big data problems, where datasets are too large to fit in main memory.

Resistive CAM (ReCAM), a storage device based on emerging resistive materials in the bitcell, combined with a novel non-von Neumann Processing-in-Storage (PRinS) compute paradigm, is proposed in order to mitigate the s...
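To make the associative-processing model underlying a content-addressable array more concrete, the sketch below is a minimal functional model of word-parallel compare and masked write over all stored rows at once. The AssociativeArray class, its compare/write primitives, and the use of NumPy are illustrative assumptions for exposition only; they are not the ReCAM bitcell operations or instruction set defined in this work.

```python
# Minimal sketch (illustrative only): word-parallel associative processing,
# the compute model a CAM-style array builds on. The primitives below are
# simplifying assumptions, not the ReCAM operations described in this paper.
import numpy as np

class AssociativeArray:
    """Toy model: each row is one stored word plus a one-bit tag."""

    def __init__(self, rows, bits):
        self.mem = np.zeros((rows, bits), dtype=np.uint8)  # storage rows
        self.tag = np.zeros(rows, dtype=bool)              # per-row tag bits

    def compare(self, pattern, mask):
        """Tag every row whose bits selected by `mask` equal `pattern`."""
        cols = [i for i, m in enumerate(mask) if m]
        key = np.array([pattern[i] for i in cols], dtype=np.uint8)
        self.tag = np.all(self.mem[:, cols] == key, axis=1)

    def write(self, pattern, mask):
        """Write the selected bits of `pattern` into all tagged rows at once."""
        cols = [i for i, m in enumerate(mask) if m]
        key = np.array([pattern[i] for i in cols], dtype=np.uint8)
        rows = np.nonzero(self.tag)[0]
        self.mem[np.ix_(rows, cols)] = key

# Example: tag all rows whose first two bits are 0,0 and clear their last bit.
a = AssociativeArray(rows=4, bits=3)
a.mem[:] = [[0, 0, 1], [0, 1, 1], [1, 0, 1], [0, 0, 0]]
a.compare(pattern=[0, 0, 0], mask=[1, 1, 0])   # rows 0 and 3 are tagged
a.write(pattern=[0, 0, 0], mask=[0, 0, 1])     # bit 2 cleared in tagged rows
print(a.mem)
```

The point of the sketch is that a compare or masked write touches every row in a single step, so its cost is independent of the number of rows. In a processing-in-storage design the rows are the storage itself, which is why computation can remain inside the storage arrays rather than moving data up the memory hierarchy.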