Summary
Cowpea (Vigna unguiculata [L.] Walp.) is a major crop for worldwide food and nutritional security, especially in sub-Saharan Africa, and is resilient to hot and drought-prone environments. An assembly of the single-haplotype inbred genome of cowpea IT97K-499-35 was developed by exploiting the synergies among single-molecule real-time sequencing, optical and genetic mapping, and an assembly reconciliation algorithm. The assembled sequences total 519 Mb. Nearly half of the assembled sequence is composed of repetitive elements, which are enriched within recombination-poor pericentromeric regions. A comparative analysis of these elements suggests that genome size differences between Vigna species are mainly attributable to changes in the amount of Gypsy retrotransposons. Conversely, genes are more abundant in the more distal, high-recombination regions of the chromosomes, and genes within the NBS-LRR and SAUR-like auxin superfamilies appear to be more duplicated than in other warm-season legumes that have been sequenced. A surprising outcome is the identification of a 4.2 Mb inversion among landraces and cultivars; it includes a gene that has been associated in other plants with interactions with the parasitic weed Striga gesnerioides. The genome sequence facilitated the identification of a putative syntelog for multiple organ gigantism in legumes. A revised numbering system has been adopted for cowpea chromosomes based on synteny with common bean (Phaseolus vulgaris). A nuclear genome size estimate of 640.6 Mbp, based on cytometry, is presented.
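As a rough check on completeness, the two sizes quoted in the cowpea summary can be compared directly. The sketch below is plain arithmetic on those two figures; because the 519 Mb comes from the assembly and the 640.6 Mbp from cytometry, the ratio only approximates how much of the genome is represented and says nothing about which sequences are missing. The function name `assembled_fraction` is hypothetical.

```python
# Figures quoted in the cowpea summary above.
ASSEMBLED_MB = 519.0            # total length of the assembled sequences (Mb)
CYTOMETRY_ESTIMATE_MB = 640.6   # nuclear genome size estimated by cytometry (Mbp)


def assembled_fraction(assembled_mb: float, estimated_mb: float) -> float:
    """Return the share of an estimated genome size that is present in an assembly."""
    if estimated_mb <= 0:
        raise ValueError("estimated genome size must be positive")
    return assembled_mb / estimated_mb


fraction = assembled_fraction(ASSEMBLED_MB, CYTOMETRY_ESTIMATE_MB)
unassembled_mb = CYTOMETRY_ESTIMATE_MB - ASSEMBLED_MB
# Prints: 81.0% of the cytometry estimate is assembled; about 121.6 Mb is not represented
print(f"{fraction:.1%} of the cytometry estimate is assembled; "
      f"about {unassembled_mb:.1f} Mb is not represented")
```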
Jute (Corchorus sp.) is one of the most important sources of natural fibre, accounting for ∼80% of global bast fibre production [1]. Although there are more than 100 Corchorus species [2] in the Malvaceae family, only Corchorus olitorius and Corchorus capsularis are commercially cultivated. Here we describe high-quality draft genomes of these two species and compare them at the functional genomics level to support tailored breeding. The assemblies cover 91.6% and 82.2% of the estimated genome sizes of C. olitorius and C. capsularis, respectively. In total, 37,031 C. olitorius and 30,096 C. capsularis genes are identified, and most of them are validated by cDNA and RNA-seq data. Analyses of clustered gene families and gene collinearity show that jute underwent a shared whole-genome duplication ∼18.66 million years (Myr) ago, prior to speciation. RNA expression analysis of isolated fibre cells reveals the key regulatory and structural genes involved in fibre formation. This work expands our understanding of the molecular basis of fibre formation, laying the foundation for the genetic improvement of jute.
Background
Essential genes are those genes that are critical for the survival of an organism. The prediction of essential genes in bacteria can provide targets for the design of novel antibiotic compounds or antimicrobial strategies.
Results
We propose a deep neural network for predicting essential genes in microbes. Our architecture, called DeeplyEssential, makes minimal assumptions about the input data (i.e., it only uses the gene primary sequence and the corresponding protein sequence) to carry out the prediction, thus maximizing its practical application compared to existing predictors that require structural or topological features, which might not be readily available. We also expose and study a hidden performance bias that affected previous classifiers. Extensive results show that DeeplyEssential outperforms existing classifiers that either employ down-sampling to balance the training set or use clustering to exclude multiple copies of orthologous genes.
Conclusion
Deep neural network architectures can efficiently predict whether a microbial gene is essential (or not) using only its sequence information.
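The DeeplyEssential abstract does not spell out the nature of the hidden performance bias. One common way such a bias can arise, assumed here for illustration rather than taken from the paper, is that near-identical copies of an orthologous gene from different organisms end up in both the training and the test set, so a classifier can score well by recognizing a gene it has effectively already seen. A minimal sketch of a cluster-aware split, which assigns each whole cluster (for example, an orthologous group) to one side, follows; the function name `cluster_aware_split` and all gene and cluster IDs are hypothetical.

```python
import random
from collections import defaultdict


def cluster_aware_split(gene_to_cluster, test_fraction=0.2, seed=0):
    """Split genes into train/test lists so that no cluster appears on both sides.

    gene_to_cluster maps a gene ID to a cluster ID (e.g., an orthologous group).
    Whole clusters are assigned to one side, so copies of the same gene cannot
    leak from training into evaluation.
    """
    clusters = defaultdict(list)
    for gene, cluster in gene_to_cluster.items():
        clusters[cluster].append(gene)

    # Shuffle cluster IDs reproducibly before assigning them.
    cluster_ids = sorted(clusters)
    random.Random(seed).shuffle(cluster_ids)

    target_test_size = test_fraction * len(gene_to_cluster)
    train, test = [], []
    for cid in cluster_ids:
        # Fill the test set cluster by cluster; it may overshoot the target
        # slightly because clusters are never broken up.
        if len(test) < target_test_size:
            test.extend(clusters[cid])
        else:
            train.extend(clusters[cid])
    return sorted(train), sorted(test)


# Hypothetical toy data: genes from several organisms grouped into orthologous groups.
genes = {
    "geneA_org1": "OG1", "geneA_org2": "OG1", "geneA_org3": "OG1",
    "geneB_org1": "OG2", "geneB_org2": "OG2",
    "geneC_org1": "OG3",
    "geneD_org1": "OG4", "geneD_org2": "OG4",
    "geneE_org1": "OG5", "geneE_org2": "OG5",
}
train, test = cluster_aware_split(genes, test_fraction=0.3)
```

A random per-gene split would usually put some members of OG1 on each side; the grouped split trades a slightly uneven test size for the guarantee that every orthologous group is evaluated only on genes the model never saw during training.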
Essential genes are genes that are critical for the survival of an organism. The prediction of essential genes in bacteria can provide targets for the design of novel antibiotic compounds or antimicrobial strategies. Here we propose a deep neural network (DNN) for predicting essential genes in microbes. Our DNN-based architecture, called DeeplyEssential, makes minimal assumptions about the input data (i.e., it only uses the gene primary sequence and the corresponding protein sequence) to carry out the prediction, thus maximizing its practical application compared to existing predictors that require structural or topological features, which might not be readily available. Our extensive experimental results show that DeeplyEssential outperforms existing classifiers that either employ down-sampling to balance the training set or use clustering to exclude multiple copies of orthologous genes. We also expose and study a hidden performance bias that affected previous classifiers. The code of DeeplyEssential is freely available at https://github.com/ucrbioinfo/DeeplyEssential
1 Introduction
Essential genes are those genes that are critical for the survival and reproduction of an organism [17]. Since the disruption of essential genes induces the death of an organism, the identification of essential genes can provide targets for new antimicrobial/antibiotic drugs [7, 13]. The set of essential genes is also critical for the creation of artificial self-sustainable living cells with a minimal genome [16]. Essential genes have also been a cornerstone in understanding the origin and evolution of organisms [18].
The identification of essential genes via wet-lab experiments is labor-intensive, expensive, and time-consuming. Such experimental procedures include single-gene knock-out [3, 12], RNA interference, and transposon mutagenesis [8, 32]. Moreover, these experimental approaches can produce contradictory results [23].
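DeeplyEssential is described as using only the gene's primary DNA sequence and the corresponding protein sequence. The text does not say how these sequences are encoded for the network, so the sketch below assumes a common choice, one-hot encoding with truncation and zero-padding to a fixed length. The names `one_hot` and `encode_gene` and the default lengths are hypothetical and are not taken from the DeeplyEssential code.

```python
DNA_ALPHABET = "ACGT"
PROTEIN_ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def one_hot(sequence, alphabet, max_len):
    """Encode a sequence as a max_len x len(alphabet) matrix of 0/1 values.

    Sequences longer than max_len are truncated; shorter ones are zero-padded.
    Symbols outside the alphabet (e.g., N in DNA or X in protein) become all-zero rows.
    """
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    matrix = [[0] * len(alphabet) for _ in range(max_len)]
    for pos, symbol in enumerate(sequence.upper()[:max_len]):
        col = index.get(symbol)
        if col is not None:
            matrix[pos][col] = 1
    return matrix


def encode_gene(dna, protein, dna_len=3000, protein_len=1000):
    """Return the two encoded inputs for one gene (the lengths are arbitrary assumptions)."""
    return (one_hot(dna, DNA_ALPHABET, dna_len),
            one_hot(protein, PROTEIN_ALPHABET, protein_len))


# Toy example with short fixed lengths so the matrices are easy to inspect.
dna_x, prot_x = encode_gene("ATGGCN", "MA", dna_len=8, protein_len=4)
```

Fixed-length matrices let DNA and protein inputs be batched as two separate network branches; the cost is that long genes are truncated, so in practice the lengths would be tuned to the gene-length distribution of the training data.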
With the recent advances in high-throughput sequencing technology, computational methods for predicting essential genes have become a reality. Some of the early prediction methods used comparative approaches based on homology mapping; see, e.g., [27, 43]. With the introduction of large gene databases such as DEG, CEG, and OGEE [4, 25, 40], researchers designed more complex prediction models using a wider set of features. These features can be broadly categorized into (i) sequence features, e.g., codon frequency, GC content, and gene length [29, 35, 42]; (ii) topological features, e.g., degree centrality and clustering coefficient [1, 6, 24, 31]; and (iii) functional features, e.g., homology, gene expression, cellular localization, functional domains, and molecular properties [5, 9, 23, 30, 39].
Sequence-based features can be obtained directly from the primary DNA sequence of a gene and its corresponding protein sequence. Features such as network topology require knowledge of a protein-protein interaction network, e.g., STRING and HumanNET [15, 37]. Gene expression and functional dom...
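To make the contrast between feature categories concrete, the sketch below computes three of the sequence features listed above (gene length, GC content, codon frequency) directly from a coding sequence, and two topological features (degree centrality and the local clustering coefficient) from an undirected interaction network given as an edge list. These are textbook definitions chosen for illustration, not the exact feature sets used in the cited works; the function names and the toy network are hypothetical.

```python
from collections import Counter
from itertools import combinations, product

CODONS = ["".join(c) for c in product("ACGT", repeat=3)]  # all 64 codons


def sequence_features(cds):
    """Gene length, GC content, and codon frequencies for a coding sequence."""
    cds = cds.upper()
    gc = (cds.count("G") + cds.count("C")) / len(cds) if cds else 0.0
    # Read codons in frame 0; a trailing partial codon is ignored.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    counts = Counter(codons)
    # Codons with ambiguous bases (e.g., "ANG") are left out of the total.
    total = sum(counts[c] for c in CODONS)
    freqs = {c: (counts[c] / total if total else 0.0) for c in CODONS}
    return {"length": len(cds), "gc_content": gc, "codon_freq": freqs}


def topological_features(edges):
    """Degree centrality and local clustering coefficient for each node.

    Only nodes that appear in at least one non-self-loop edge are included.
    """
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    n = len(neighbors)
    features = {}
    for node, nbrs in neighbors.items():
        k = len(nbrs)
        degree_centrality = k / (n - 1) if n > 1 else 0.0
        # Count edges among the node's neighbors (closed triangles through the node).
        links = sum(1 for a, b in combinations(nbrs, 2) if b in neighbors[a])
        clustering = 2 * links / (k * (k - 1)) if k > 1 else 0.0
        features[node] = {"degree_centrality": degree_centrality,
                          "clustering": clustering}
    return features


seq = sequence_features("ATGGCCTAA")
net = topological_features([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
```

The first function needs nothing beyond the gene sequence, while the second cannot run at all without an interaction network for the organism, which is the practical limitation the introduction points to.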