A novel processing-in-storage (PRinS) architecture based on Resistive CAM (ReCAM) is described and proposed for Smith-Waterman (S-W) sequence alignment. The ReCAM PRinS massively-parallel compare operation finds matching base-pairs in a fixed number of cycles, regardless of sequence length. The ReCAM PRinS S-W algorithm is simulated and compared to FPGA, Xeon Phi and GPU-based implementations, showing at least 4.7× higher throughput and at least 15× lower power dissipation.
OSP/claudin-11-associated protein (OAP-1/Tspan-3), originally isolated by yeast two-hybrid screening using OSP/claudin-11 (oligodendrocyte-specific protein) as bait, is a member of the tetraspanin superfamily. OAP-1/Tspan-3, OSP/claudin-11, and beta1 integrin form a protein complex that seems to be involved in oligodendrocyte proliferation and migration. This study investigated the temporal and regional expression, glycosylation status, and tissue distribution of OAP-1/Tspan-3. OAP-1/Tspan-3 mRNA was expressed as a single transcript throughout brain development, with high levels of expression in the germinal zones. OAP-1/Tspan-3 protein contains N-terminal glycosylation sites in extracellular loop 2, and deglycosylation studies indicated a decrease in apparent molecular weight of OAP-1/Tspan-3, consistent with removal of N-glycans. Similar to OSP/claudin-11, OAP-1/Tspan-3 is expressed in all stages of oligodendrocyte development and in the myelin sheath. Unlike OSP/claudin-11, however, it is expressed in all cell types tested in the central nervous system (CNS), including neurons and astrocytes. The association of OAP-1/Tspan-3 with OSP/claudin-11 and beta1 integrin, its subcellular distribution as a cell surface, membrane-spanning glycoprotein, and its widespread distribution support its potential role in cell migration, proliferation, and interactions between cells and extracellular matrix.
Genome sequences contain hundreds of millions of DNA base pairs. Finding the degree of similarity between two genomes requires executing a compute-intensive dynamic programming algorithm, such as Smith-Waterman. Traditional von Neumann architectures have limited parallelism and cannot provide an efficient solution for large-scale genomic data. Approximate heuristic methods (e.g. BLAST) are commonly used. However, they are suboptimal and still compute-intensive. In this work, we present BioSEAL, a Biological SEquence ALignment accelerator. BioSEAL is a massively parallel non-von Neumann processing-in-memory architecture for large-scale DNA and protein sequence alignment. BioSEAL is based on resistive content addressable memory, capable of energy-efficient and high-performance associative processing. We present an associative processing algorithm for entire database sequence alignment on BioSEAL and compare its performance and power consumption with state-of-art solutions. We show that BioSEAL can achieve up to 57× speedup and 156× better energy efficiency, compared with existing solutions for genome sequence alignment and protein sequence database search.
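For reference, the dynamic programming recurrence behind Smith-Waterman can be sketched in a few lines of Python. This is a minimal, sequential illustration and not the BioSEAL or ReCAM implementation; the function name `smith_waterman`, the scoring values (match 2, mismatch -1) and the linear gap penalty of -1 are assumptions chosen for the example. The nested loops make the cost visible: every cell of a table whose size is the product of the two sequence lengths is filled once.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b.

    Scoring values and the linear gap penalty are illustrative assumptions.
    Only two rows of the score table are kept, since each cell depends on
    its left, upper, and upper-left neighbours only.
    """
    cols = len(b) + 1
    prev = [0] * cols          # row i-1 of the score table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols      # row i of the score table
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                    # start a new local alignment
                prev[j - 1] + sub,    # align a[i-1] with b[j-1]
                prev[j] + gap,        # gap in b
                curr[j - 1] + gap,    # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman("TTACGTTT", "GGACGGG"))
```

Cells on the same anti-diagonal of the table do not depend on one another, which is the property that parallel implementations of this recurrence typically exploit.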
DNA read mapping is a computationally expensive bioinformatics task, required for genome assembly and consensus polishing. It requires finding the best-fitting location for each DNA read on a long reference sequence. A novel resistive approximate similarity search accelerator, RASSA, exploits charge distribution and parallel in-memory processing to reflect a mismatch count between DNA sequences. RASSA implementation of DNA long read pre-alignment outperforms the state-of-art solution, minimap2, by 16-77× with comparable accuracy and provides two orders of magnitude higher throughput than GateKeeper, a short-read pre-alignment hardware architecture implemented in FPGA. Constructing human DNA sequence in real time is paramount to development of precision medicine [1] and on-site pathogen detection of disease outbreaks [2]. Single-molecule, real-time sequencing from Pacific Biosciences [3] (PacBio) and Oxford Nanopore Technologies [4] (ONT) are new technologies that can produce long reads within minutes, potentially enabling real-time genomic analysis. However, long read DNA sequencing poses new challenges. First, long reads contain many thousands of base pairs (bps). Second, long reads tend to exhibit about 15-20% insertion, deletion (indel) and substitution errors [3,4]. To construct a complete host sequence, in case a reference sequence exists (from a previously sequenced organism), long reads are mapped to high-similarity locations of the reference sequence. Determining the edit distance between every mapped read and the reference sequence requires a computationally intensive local alignment procedure (e.g., Smith-Waterman [4]). Its computational time complexity is typically O(mn) for two sequences with lengths m and n. Reference sequences vary from several millions to billions of bps.
It is therefore computationally prohibitive to perform optimal alignment of every long read with the entire reference sequence. Read mappers (e.g., minimap [6], minimap2 [7]) find regions of high similarity (mappings) between reads or between a read and a reference sequence, followed by an alignment step to determine the exact edit distance and verify that the mapping is correct. If a pre-alignment algorithm identifies a specific region in the reference suitable for mapping, the alignment can be performed only on that region, reducing the alignment's duration and resource requirements [8]. Therefore, read mapping can be viewed as a two-step process: (1) pre-alignment filtering and (2) accurate alignment verification. The pre-alignment step reduces the problem size for aligners by narrowing the regions to ones with potentially high-scoring alignment. Existing pre-alignment hardware solutions [9,10] target short reads (up to several hundred bps) which contain a small number of indel and substitution errors (less than 5%) and have a different error profile than that of PacBio or ONT long reads [3,4]. A high edit distance threshold is required for mapping long but error-prone reads. However, c...
Ran Ginosar, Technion, Israel Institute of Technology. IEEE MICRO.
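The two-step view of read mapping described above (pre-alignment filtering, then alignment verification) can be illustrated with a small software sketch. This is not the RASSA design, which evaluates mismatch counts in memory using charge distribution; it is only a sequential analogy. The function names (`mismatch_count`, `prefilter`, `edit_distance`, `map_read`), the sliding-window strategy, and the threshold parameter `max_mismatches` are all hypothetical choices made for the example.

```python
def mismatch_count(read, window):
    """Count positions where two equal-length sequences differ."""
    return sum(1 for x, y in zip(read, window) if x != y)


def prefilter(read, reference, max_mismatches):
    """Step 1: keep reference offsets whose window is within the threshold."""
    n = len(read)
    return [
        pos
        for pos in range(len(reference) - n + 1)
        if mismatch_count(read, reference[pos:pos + n]) <= max_mismatches
    ]


def edit_distance(s, t):
    """Classic dynamic programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (cs != ct),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def map_read(read, reference, max_mismatches):
    """Step 2: verify only the surviving candidates; return (distance, offset)."""
    candidates = prefilter(read, reference, max_mismatches)
    scored = [
        (edit_distance(read, reference[p:p + len(read)]), p)
        for p in candidates
    ]
    return min(scored) if scored else None


if __name__ == "__main__":
    print(map_read("ACGTACGA", "GGGGACGTACGTTTTT", max_mismatches=1))
```

The point of the split is that the cheap per-offset mismatch count discards most of the reference, so the quadratic-cost verification runs on a handful of windows instead of the whole sequence. A plain positional mismatch count is sensitive to indels, so a practical filter for error-prone long reads would need a more tolerant similarity measure than the one assumed here.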