A common approach to estimating the strength and direction of selection acting on protein-coding sequences is to calculate the dN/dS ratio. Since Nei and Gojobori proposed it in 1986, the method has been widely used and its application has been critically reviewed many times. However, it is still evolving to account for non-uniform substitution rates and pretermination codons. In our study of SNPs in 586 genes across 156 Escherichia coli strains, synonymous polymorphism was higher in two-fold degenerate codons than in four-fold degenerate codons, which could be attributed to the difference between transition (Ti) and transversion (Tv) substitution rates: in general, the average rate of a transition is four times that of a transversion. We considered both the Ti/Tv ratio and nonsense mutations in pretermination codons to improve the estimates of synonymous (S) and non-synonymous (NS) sites, and thereby the accuracy of the dN/dS estimate. Applying this modified approach resulted in higher dN/dS values in 29 common genes with equal reading frames between Escherichia coli and Salmonella enterica. This study emphasizes that amino acid composition, with its varying codon degeneracy, and pretermination codons should be taken into account when calculating dN/dS values.
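To make the site-counting idea concrete, the following is a minimal sketch of how a Ti/Tv weight and a separate nonsense category could enter a Nei-Gojobori-style count of sites per codon. It is an illustration under stated assumptions, not the authors' exact procedure: the function name `site_counts`, the parameter `kappa` (the assumed Ti/Tv rate ratio), and the choice to weight each possible change by `kappa` or 1 are all hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

PURINES = {"A", "G"}


def is_transition(a, b):
    """True for purine<->purine or pyrimidine<->pyrimidine changes (a != b assumed)."""
    return (a in PURINES) == (b in PURINES)


def site_counts(codon, kappa=1.0):
    """Return (synonymous, non-synonymous, nonsense) site counts for one sense codon.

    Assumption: at each position the single possible transition is weighted by
    `kappa` and each of the two transversions by 1, so the three counts sum to 3.
    Changes that create a stop codon are tallied separately instead of being
    counted as non-synonymous.
    """
    codon = codon.upper().replace("U", "T")
    aa = CODON_TABLE[codon]
    if aa == "*":
        raise ValueError("site counts are only defined for sense codons")

    syn = nonsyn = nonsense = 0.0
    total = kappa + 2.0  # one transition plus two transversions per position
    for pos in range(3):
        for alt in BASES:
            if alt == codon[pos]:
                continue
            weight = kappa if is_transition(codon[pos], alt) else 1.0
            mutant = codon[:pos] + alt + codon[pos + 1:]
            new_aa = CODON_TABLE[mutant]
            if new_aa == "*":
                nonsense += weight / total
            elif new_aa == aa:
                syn += weight / total
            else:
                nonsyn += weight / total
    return syn, nonsyn, nonsense


if __name__ == "__main__":
    # TTT is two-fold degenerate, GGG four-fold, TAT a pretermination codon.
    for codon in ("TTT", "GGG", "TAT"):
        print(codon, site_counts(codon, kappa=1.0), site_counts(codon, kappa=4.0))
```

Under this weighting, raising `kappa` increases the synonymous site count of a two-fold degenerate codon but leaves that of a four-fold degenerate codon unchanged, which is the asymmetry the abstract attributes to the Ti/Tv difference.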
The temporary exposure of single-stranded regions of the genome during replication and transcription makes these regions vulnerable to cytosine deamination, resulting in a higher rate of C→T transitions. Intra-operon intergenic regions undergo transcription along with the adjacent co-transcribed genes in an operon, whereas inter-operon intergenic regions only undergo replication. Hence, these two types of intergenic regions (IGRs) can be compared to estimate the contributions of replication-associated mutations (RAM) and transcription-associated mutations (TrAM) to genomic variation. In this work, we compared the polymorphism spectra of intra-operon IGRs and inter-operon IGRs in the genomes of two well-known, closely related bacteria, Escherichia coli and Salmonella enterica. In general, intra-operon IGRs were smaller than inter-operon IGRs in these bacteria. Interestingly, the polymorphism frequency at intra-operon IGRs was 2.5-fold lower than that at inter-operon IGRs in the E. coli genome and 2.8-fold lower in the S. enterica genome. Therefore, the intra-operon IGRs were often observed to be more conserved. In inter-operon IGRs, the T→C transition frequency was at least twice the T→A transversion frequency, whereas in intra-operon IGRs the T→C transition frequency was similar to the T→A transversion frequency. The polymorphism was more purine-biased and keto-biased in intra-operon IGRs than in inter-operon IGRs. In E. coli, the Ti/Tv ratio was 1.639 in inter-operon IGRs and 1.338 in intra-operon IGRs. In S. enterica, it was 2.134 in inter-operon IGRs and 2.780 in intra-operon IGRs. These observations indicate that transcribed IGRs might not always have a higher polymorphism frequency than untranscribed IGRs.
The lower polymorphism frequency at intra-operon IGRs might be attributed to factors such as transcription-coupled DNA repair, sequences that facilitate translation initiation, and avoidance of rho-dependent transcription termination.
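The two summary statistics used in the IGR comparison, the Ti/Tv ratio and the polymorphism frequency, can be sketched as follows. The function names and the dictionary layout of the substitution spectrum are assumptions made for illustration, and the counts in the usage example are invented toy numbers, not data from the study.

```python
# Directed substitutions (reference base, alternative base) that are transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ti_tv_ratio(spectrum):
    """Ti/Tv ratio from a spectrum mapping (ref, alt) pairs to observed counts."""
    for ref, alt in spectrum:
        if ref == alt:
            raise ValueError(f"not a substitution: {ref}>{alt}")
    ti = sum(n for pair, n in spectrum.items() if pair in TRANSITIONS)
    tv = sum(n for pair, n in spectrum.items() if pair not in TRANSITIONS)
    if tv == 0:
        raise ValueError("no transversions observed; the ratio is undefined")
    return ti / tv


def polymorphism_frequency(spectrum, n_sites):
    """Polymorphic positions per aligned site in the region set."""
    return sum(spectrum.values()) / n_sites


if __name__ == "__main__":
    # Toy counts for two hypothetical region sets (illustrative only).
    inter = {("C", "T"): 30, ("T", "C"): 20, ("T", "A"): 10, ("G", "C"): 15}
    intra = {("C", "T"): 6, ("T", "C"): 4, ("T", "A"): 4, ("G", "C"): 6}

    print("Ti/Tv inter:", ti_tv_ratio(inter))
    print("Ti/Tv intra:", ti_tv_ratio(intra))
    fold = polymorphism_frequency(inter, 1500) / polymorphism_frequency(intra, 1000)
    print("fold difference in polymorphism frequency:", fold)
```

Normalizing by the number of aligned sites matters here because the two IGR classes differ in size, so raw counts alone would not be comparable.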
Previous findings suggest that replication and transcription are two major causes of the different substitution patterns of mutations in genomic DNA. In the current work, we compared adjacent co-transcribed gene pairs in five different operons of Escherichia coli with respect to synonymous polymorphism. Interestingly, the co-transcribed genes differed from each other in their polymorphism spectra. The transition-to-transversion ratio differed between the genes of a pair owing to differences in their composition of two-fold and four-fold degenerate codons. Further, the difference in polymorphism spectra between the genes of a pair was more prominent in four-fold and six-fold degenerate codons than in two-fold degenerate codons. In the case of rpoB and rpoC, the major differences were found at the UCC, GUA, CCG, GCU, GGC and CGC codons. Similarly, in the other four pairs of co-transcribed genes, the difference was more prominent in the more highly degenerate codons than in the two-fold degenerate codons. It may be that the restriction of synonymous polymorphism in two-fold degenerate codons to transition substitutions makes these codons differ from the more highly degenerate codons in this study.
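The closing point, that synonymous changes in two-fold degenerate codons can only be transitions, can be checked directly against the standard genetic code. The sketch below is a simplification that looks only at the third codon position, so six-fold degenerate amino acids are split into their two-fold and four-fold codon blocks; the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

PURINES = {"A", "G"}


def normalize(codon):
    """Accept RNA or DNA spelling and return the DNA spelling."""
    return codon.upper().replace("U", "T")


def is_transition(a, b):
    """True for purine<->purine or pyrimidine<->pyrimidine changes (a != b assumed)."""
    return (a in PURINES) == (b in PURINES)


def third_position_synonymous(codon):
    """Alternative third-position bases that leave the encoded amino acid unchanged."""
    codon = normalize(codon)
    aa = CODON_TABLE[codon]
    return [b for b in BASES if b != codon[2] and CODON_TABLE[codon[:2] + b] == aa]


def third_position_degeneracy(codon):
    """Number of third-position bases (including the current one) encoding the same amino acid."""
    return 1 + len(third_position_synonymous(codon))


def only_transitions(codon):
    """True if every synonymous third-position change of the codon is a transition."""
    codon = normalize(codon)
    return all(is_transition(codon[2], b) for b in third_position_synonymous(codon))


if __name__ == "__main__":
    # Codons named in the rpoB/rpoC comparison, written in RNA spelling.
    for codon in ("UCC", "GUA", "CCG", "GCU", "GGC", "CGC"):
        print(codon, third_position_degeneracy(codon))

    two_fold = [c for c, aa in CODON_TABLE.items()
                if aa != "*" and third_position_degeneracy(c) == 2]
    print(len(two_fold), "two-fold codons; transitions only:",
          all(only_transitions(c) for c in two_fold))
```

Because a four-fold degenerate codon has one synonymous transition and two synonymous transversions at the third position, its synonymous polymorphism can reflect both substitution classes, whereas a two-fold degenerate codon can only report transitions.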