One of the most difficult challenges in lossless data compression is finding the right model for compressing deoxyribonucleic acid (DNA). DNA sequences are composed of four chemical bases: adenine (A), guanine (G), cytosine (C), and thymine (T), and the information in DNA is stored as a code made up of these four bases. These sequences are not random; if they were completely random, storing each base in two bits would be the most efficient and logical encoding. This paper proposes an algorithm called A2 for DNA data compression. The proposed algorithm consists of four stages that build a substitutional model. The first stage applies a modified version of run-length coding. The second and third stages apply a mapping model that formats the data so that it is suitable for the final stage. In the final stage, the data is fed into the Burrows-Wheeler Transform, a permutation technique that groups related symbols together as much as possible, which improves the subsequent dictionary coding with Lempel-Ziv (LZ77). The output file is stored with the .a2 extension. The A2 algorithm was implemented and tested on data from GenBank and shows acceptable results in terms of compressed file size and processing time.
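The building blocks named above can be illustrated with a minimal Python sketch. It is not the A2 implementation: the abstract does not describe the modified run-length coding or the mapping model, so this sketch uses plain run-length coding and skips the mapping stages entirely. It shows the two-bit baseline for a random sequence, a naive Burrows-Wheeler Transform (BWT) with its inverse, and a greedy LZ77 coder with its decoder. The code table, the `$` sentinel, the window size of 255, and all function names are hypothetical choices made for illustration.

```python
from typing import List, Tuple

# Hypothetical 2-bit code table (the abstract does not specify one).
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASE = "ACGT"


def pack_2bit(seq: str) -> bytes:
    """Two-bit baseline: pack four bases into each byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk, byte = seq[i:i + 4], 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out)


def unpack_2bit(data: bytes, n: int) -> str:
    """Recover the first n bases; the length n must be stored separately."""
    bases = [BASE[(b >> s) & 0b11] for b in data for s in (6, 4, 2, 0)]
    return "".join(bases[:n])


def run_length(seq: str) -> List[Tuple[str, int]]:
    """Plain (unmodified) run-length coding: each run becomes (symbol, count)."""
    runs: List[Tuple[str, int]] = []
    for ch in seq:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def bwt(text: str, sentinel: str = "$") -> str:
    """Naive BWT: last column of the sorted rotations (O(n^2 log n))."""
    s = text + sentinel  # sentinel must not occur in text
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last: str, sentinel: str = "$") -> str:
    """Naive inverse: rebuild the sorted rotation table column by column."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


def lz77(data: str, window: int = 255) -> List[Tuple[int, int, str]]:
    """Greedy LZ77: emit (offset, length, next_char) triples."""
    i, out = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            # Matches may run past position i (overlapping copy is allowed),
            # but always leave one character for next_char.
            while (i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out


def lz77_decode(tokens: List[Tuple[int, int, str]]) -> str:
    """Copy each match symbol by symbol, so overlapping matches work."""
    out: List[str] = []
    for off, length, ch in tokens:
        start = len(out) - off
        for k in range(length):
            out.append(out[start + k])
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    seq = "ACGTACGTTTTTGGGA"
    print(len(seq), "bases ->", len(pack_2bit(seq)), "bytes at 2 bits/base")
    print("runs:", run_length(seq))
    transformed = bwt(seq)
    print("BWT:", transformed)
    print("LZ77 tokens:", lz77(transformed))
```

The BWT does not compress by itself; it permutes the input so that symbols sharing the same right-hand context end up adjacent, which tends to create longer repeats for the dictionary coder to exploit. A practical version would build the BWT from a suffix array rather than sorting full rotations.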