Mary Aksa scite author profile

Mary Aksa

1 Publication

1 Citation Statement Received

33 Citation Statements Given

Publications

BitmapAligner: Bit-Parallelism String Matching with MapReduce and Hadoop

Aksa¹, Rashid², Nisar³ et al. 2021

Advancements in next-generation sequencing (NGS) platforms have increased the production of NGS sequence data and reduced its cost, resulting in a large amount of genome data. In bioinformatics, the downstream analysis of multiple associated sequences has become a bottleneck for this growing genomic data because of storage and space utilization issues. Traditional string-matching algorithms are efficient for small-sized data sequences but cannot process large amounts of data for downstream analysis. This study proposes a novel bit-parallelism algorithm called BitmapAligner to overcome the issues caused by a large number of sequences and to improve the speed and quality of multiple sequence alignment (MSA). The input files (sequences) tested with BitmapAligner can be easily managed and organized using the Hadoop distributed file system. Before sequence alignment begins, the proposed aligner converts the test file (the whole genome sequence), line by line, into binary representations equal in length to the sequence. The Hadoop distributed file system splits larger files into blocks based on a defined block size, which is 128 MB by default. After sorting the data, BitmapAligner can accurately perform sequence alignment on large-scale sequences using the bitmask approach. The experimental results indicate that BitmapAligner operates in real time on a large number of sequences. Moreover, BitmapAligner finds the exact start and end positions of the pattern sequence when the MSA application is tested on the whole genome query sequence. The accuracy of the MSA is verified by the bitmask indexing property of the bit-parallelism extended shifts (BXS) algorithm. The dynamic and exact approach of the BXS algorithm is implemented through the MapReduce function of Apache Hadoop. In contrast, the traditional seed-and-extend approach risks errors when identifying the positions of the pattern sequences.
Moreover, the proposed model resolves the large-scale data challenges that are covered through MapReduce in the Hadoop framework. Hive, Yarn, HBase, Cassandra, and many other pertinent flavors
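
To illustrate the general idea of bitmask-based exact matching mentioned in the abstract, here is a minimal sketch of the classic Shift-And bit-parallel algorithm in Python. It is not the BXS algorithm or the authors' BitmapAligner code, and it involves no Hadoop or MapReduce; the function name `shift_and_search` and the sample sequences are assumptions made for illustration only. It shows how per-symbol bitmasks can report the exact start and end positions of a pattern in a sequence.

```python
def shift_and_search(text: str, pattern: str) -> list[tuple[int, int]]:
    """Return (start, end) pairs (0-based, inclusive) for every exact match.

    Illustrative Shift-And matcher; a stand-in for bit-parallel matching in
    general, not the BXS algorithm described in the abstract.
    """
    m = len(pattern)
    if m == 0:
        return []

    # One bitmask per symbol: bit i is set when pattern[i] equals the symbol.
    masks: dict[str, int] = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    accept = 1 << (m - 1)  # bit that signals a full-length match
    state = 0              # bit i set: pattern[0..i] matches text ending here
    hits: list[tuple[int, int]] = []

    for j, ch in enumerate(text):
        # Extend every partial match by one symbol and start a new one.
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            hits.append((j - m + 1, j))
    return hits


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the paper.
    for start, end in shift_and_search("ACGTACGTAC", "ACGT"):
        print(f"match from {start} to {end}")
```

Each text symbol is processed with one shift, one OR, and one AND, which is what makes the bitmask approach attractive for long sequences. In a MapReduce setting, a matcher of this kind would presumably run inside the map step over each input split, but that arrangement is an assumption here rather than something the abstract specifies.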
