With the advent of several accurate and sophisticated statistical algorithms and pipelines for DNA sequence analysis, it is becoming increasingly possible to translate raw sequencing data into biologically meaningful information for further clinical analysis and processing. However, given the large volume of data involved, even modestly complex algorithms can take a prohibitively long time to complete. Hence it is urgent to explore non-conventional implementation platforms to accelerate genomics research. In this thesis, we present a Field-Programmable Gate Array (FPGA) accelerated implementation of the Pair Hidden Markov Model (Pair HMM) forward algorithm, the performance bottleneck in the HaplotypeCaller, a critical function in the popular Genome Analysis Toolkit (GATK) variant calling tool. We introduce the PE ring structure which, thanks to the fine-grained parallelism allowed by the FPGA, can be built into various configurations that strike a trade-off between Instruction-Level Parallelism (ILP) and data parallelism. We investigate the resource utilization and performance of different configurations. Our solution achieves a speed-up of up to 487× over the C++ baseline implementation on CPU and 1.56× over the previous best hardware implementation.
Blockchain technology has evolved from an immutable ledger of cryptocurrency transactions into a programmable, interactive environment for building reliable distributed applications. Although blockchain technology has been used to address various challenges, to our knowledge no previous work has focused on using blockchain to develop a secure and immutable scientific data provenance management framework that automatically verifies provenance records. In this work, we leverage blockchain as a platform to facilitate trustworthy data provenance collection, verification, and management. The developed system uses smart contracts and the Open Provenance Model (OPM) to record immutable data trails. We show that our proposed framework can efficiently and securely capture and validate provenance data, and prevent any malicious modification of the captured data as long as a majority of the participants are honest.