Accurate prediction of open reading frames (ORFs) is important for studying and using genome sequences. Ribosomes move along mRNA strands with a step of three nucleotides and datasets carrying this information can be used to predict ORFs. The ribosome-protected footprints (RPFs) feature a significant 3-nt periodicity on mRNAs and are powerful in predicting translating ORFs, including small ORFs (sORFs), but the application of RPFs is limited because they are too short to be accurately mapped in complex genomes. In this study, we found a significant 3-nt periodicity in the datasets of populational genomic variants in coding sequences, in which the nucleotide diversity increases every three nucleotides. We suggest that this feature can be used to predict ORFs and develop the Python package ‘OrfPP’, which recovers ~83% of the annotated ORFs in the tested genomes on average, independent of the population sizes and the complexity of the genomes. The novel ORFs, including sORFs, identified from single-nucleotide polymorphisms are supported by protein mass spectrometry evidence comparable to that of the annotated ORFs. The application of OrfPP to tetraploid cotton and hexaploid wheat genomes successfully identified 76.17% and 87.43% of the annotated ORFs in the genomes, respectively, as well as 4704 sORFs, including 1182 upstream and 2110 downstream ORFs in cotton and 5025 sORFs, including 232 upstream and 234 downstream ORFs in wheat. Overall, we propose an alternative and supplementary approach for ORF prediction that can extend the studies of sORFs to more complex genomes.
Haplotype identification, characterization and visualization are important for large-scale analysis and use in population genomics. Many tools have been developed to visualize haplotypes, but it is challenging to display both the pattern of haplotypes and the genotypes for each single SNP in the context of a large amount of genomic data. Here, we describe the tool HAPPE, which uses the agglomerative hierarchical clustering algorithm to characterize and visualize the genotypes and haplotypes in a phylogenetic context. The tool displays the plots by coloring the cells and/or their borders in Excel tables for any given gene and genomic region of interest. HAPPE facilitates informative displays wherein data in plots are easy to read and access. It allows parallel display of several lines of values, such as phylogenetic trees, P values of GWAS, the entry of genes or SNPs, and the sequencing depth at each position. These features are informative for the detection of insertion/deletions or copy number variations. Overall, HAPPE provides editable plots consisting of cells in Excel tables, which are user-friendly to non-programmers. This pipeline is coded in Python and is available at https://github.com/fengcong3/HAPPE.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.