We consider the problem of estimating the parameters of a convolutional encoder from noisy observations, i.e., when the encoded bits are received with errors. Reverse engineering of a channel encoder has applications in cryptanalysis, when attacking communication systems, and in DNA sequence analysis, when looking for possible error-correcting codes in genomes. We present a new iterative, probabilistic algorithm based on the Expectation Maximization (EM) algorithm. We use log-likelihood ratio (LLR) algebra, which greatly simplifies the derivation and interpretation of the final algorithm. We show results indicating the data length required and the channel error rate tolerated for reliable estimation.
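To make the setting concrete, the following is a minimal sketch of the observation model and of the basic LLR-algebra operation, not the EM estimator itself, which the abstract does not spell out. The generator polynomials (7, 5 in octal), the binary symmetric channel, and all function names are illustrative assumptions rather than choices taken from the paper.

```python
import math
import random


def conv_encode(bits, generators=(0b111, 0b101), memory=2):
    """Rate-1/n feedforward convolutional encoder.

    The generators (7, 5 in octal) are an assumed textbook example.
    """
    state = 0  # the last `memory` input bits
    out = []
    for b in bits:
        reg = (b << memory) | state  # current bit followed by stored bits
        for g in generators:
            out.append(bin(reg & g).count("1") % 2)  # parity of tapped bits
        state = reg >> 1  # shift: drop the oldest stored bit
    return out


def bsc(bits, p, rng):
    """Binary symmetric channel: flip each bit independently with probability p."""
    return [b ^ int(rng.random() < p) for b in bits]


def channel_llr(y, p):
    """LLR ln(P(x=0|y) / P(x=1|y)) of a BSC output bit y, uniform prior on x."""
    return (1 - 2 * y) * math.log((1 - p) / p)


def boxplus(l1, l2):
    """LLR of the XOR of two independent bits with LLRs l1 and l2.

    Valid for moderate finite LLRs; very large inputs would need clipping.
    """
    return 2.0 * math.atanh(math.tanh(l1 / 2.0) * math.tanh(l2 / 2.0))


# Impulse response of the assumed (7, 5) encoder.
impulse = conv_encode([1, 0, 0])  # -> [1, 1, 1, 0, 1, 1]

# Noisy observations and their channel LLRs (assumed error rate p = 0.05).
rng = random.Random(0)
received = bsc(conv_encode([1, 0, 1, 1, 0]), 0.05, rng)
llrs = [channel_llr(y, 0.05) for y in received]
```

Because each encoder output is a parity (XOR) of input bits, the reliability of any hypothesised parity relation can be computed by chaining `boxplus` over the LLRs involved, which is why the LLR formulation keeps such derivations compact.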
SUMMARY: This paper gives a brief overview of several applications from the emerging interdisciplinary field of genomic coding theory, which aims to apply concepts and techniques from coding theory to problems in molecular biology. This is motivated by the high precision and robustness found in genomic processes and by the increasing availability of genomic data for a wide range of species. The applications considered include source coding for DNA classification, channel coding for modelling gene expression with emphasis on the process of translation, the existence of error-correcting codes in DNA, and channel coding structure in the genetic code. Example results are presented that demonstrate the relevance of the proposed approaches, and open questions are formulated to suggest future research.
Background: Comparative genomics aims to detect signals of evolutionary conservation as an indicator of functional constraint. Surprisingly, results of the ENCODE project revealed that about half of the experimentally verified functional elements found in non-coding DNA were classified as unconstrained by computational predictions. Following this observation, it has been hypothesized that this may be partly explained by biased estimates of the neutral evolutionary rates used by existing sequence conservation metrics. All methods we are aware of rely on a comparison with the neutral rate, and conservation is estimated by measuring the deviation of a particular genomic region from this rate. Consequently, it is reasonable to assume that inaccurate neutral rate estimates may lead to biased conservation and constraint estimates.