Cot-based cloning and sequencing (CBCS) is a powerful tool for isolating and characterizing the various repetitive components of any genome, combining the established principles of DNA reassociation kinetics with high-throughput sequencing. CBCS was used to generate sequence libraries representing the high, middle, and low-copy fractions of the chicken genome. Sequencing high-copy DNA of chicken to about 2.7× coverage of its estimated sequence complexity led to the initial identification of several new repeat families, which were then used for a survey of the newly released first draft of the complete chicken genome. The analysis provided insight into the diversity and biology of known repeat structures such as CR1 and CNM, for which only limited sequence data had previously been available. Cot sequence data also resulted in the identification of four novel repeats (Birddawg, Hitchcock, Kronos, and Soprano), two new subfamilies of CR1 repeats, and many elements absent from the chicken genome assembly. Multiple autonomous elements were found for a novel Mariner-like transposon, Galluhop, in addition to nonautonomous deletion derivatives. Phylogenetic analysis of the high-copy repeats CR1, Galluhop, and Birddawg provided insight into two distinct genome dispersion strategies. This study also exemplifies the power of the CBCS method to create representative databases for the repetitive fractions of genomes for which only limited sequence data is available.
Using a replication-deficient retroviral vector based on the avian leukosis virus (ALV), we inserted into the chicken genome a transgene encoding a secreted protein, beta-lactamase, under the control of the ubiquitous cytomegalovirus (CMV) promoter. Biologically active beta-lactamase was secreted into the serum and egg white of four generations of transgenic chickens. The expression levels were similar in successive generations, and expression levels in the magnum of the oviduct were constant over at least 16 months in transgenic hens, indicating that the transgene was stable and not subject to silencing. These results support the potential of the hen as a bioreactor for the production of commercially valuable, biologically active proteins in egg white.
Several statistical methods were tested for accuracy in predicting observed frequencies of di- through hexanucleotides in 74,444 bp of E. coli DNA. A Markov chain was most accurate overall, whereas other methods, including a random model based on mononucleotide frequencies, were very inaccurate. When ranked highest to lowest abundance, the observed frequencies of oligonucleotides up to six bases in length in E. coli DNA were highly asymmetric. All ordered abundance plots had a wide linear range containing the majority of the oligomers which deviated sharply at the high and low ends of the curves. In general, values predicted by a Markov chain closely followed the overall shape of the ordered abundance curves. A simple equation was derived by which the frequency of any nucleotide longer than four bases in the E. coli genome (or any genome) can be relatively accurately estimated from the nested set of component tri- and tetranucleotides by serial application of a 3rd order Markov chain. The equation yielded a mean ratio of 1.03 +/- 0.94 for the observed-to-expected frequencies of the 4,096 hexanucleotides. Hence, the method is a relatively accurate but not perfect predictor of the length in nucleotides between hexanucleotide sites. Higher accuracy can be achieved using a 4th order Markov chain and larger data sets. The high asymmetry in oligonucleotide abundance means that in the E. coli genome of 4.2 X 10(6) bp many relatively short sequences of 7-9 bp are very rare or absent.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.