Here we analyse genetic variation, population structure and diversity among 3,010 diverse Asian cultivated rice (Oryza sativa L.) genomes from the 3,000 Rice Genomes Project. Our results are consistent with the five major groups previously recognized, but also suggest several previously unreported subpopulations that correlate with geographic location. We identified 29 million single nucleotide polymorphisms, 2.4 million small indels and over 90,000 structural variations that contribute to within- and between-population variation. Using pan-genome analyses, we identified more than 10,000 novel full-length protein-coding genes and a large number of presence-absence variations. The complex patterns of introgression observed in domestication genes are consistent with multiple independent rice domestication events. The public availability of data from the 3,000 Rice Genomes Project provides a resource for rice genomics research and breeding.

Asian cultivated rice is grown worldwide and is the staple food for half of the global population. It is envisaged that by the year 2035 [1], feeding this growing population will require an additional 112 million metric tons of rice to be produced on a smaller area of land, using less water and under more fluctuating climatic conditions. Meeting this demand will require future rice cultivars to be higher yielding and resilient to multiple abiotic and biotic stresses. The foundation for the continued improvement of rice cultivars is the rich genetic diversity within domesticated populations and their wild relatives [2][3][4]. For over 2,000 years, two major types of O. sativa have been recognized [5][6][7]: the O. sativa Xian Group (here referred to as Xian/Indica (XI), also known as Hsien or Indica) and the O. sativa Geng Group (here referred to as Geng/Japonica (GJ), also known as Keng or Japonica).
Varied degrees of post-reproductive barriers exist between XI and GJ rice accessions [8], and the differentiation between XI and GJ rice types, as well as the presence of different varietal groups, is well documented at the isozyme and DNA levels [6,9]. Two other distinct groups have also been recognized using molecular markers [10]: one encompasses the Aus, Boro and Rayada ecotypes from Bangladesh and India (which we term the circum-Aus group (cA)), and the other comprises the famous Basmati and Sadri aromatic varieties (which we term the circum-Basmati group (cB)).

Approximately 780,000 rice accessions are available in gene banks worldwide [11]. To enable more efficient use of these accessions in future rice improvement, the Chinese Academy of Agricultural Sciences, BGI-Shenzhen and the International Rice Research Institute sequenced over 3,000 rice genomes (3K-RG) as part of the 3,000 Rice Genomes Project [12]. Here we present analyses of genetic variation in the 3K-RG that focus on important aspects of O. sativa diversity, single nucleotide polymorphisms (SNPs) and structural variation (deletions, duplications, inversions and translocations). We also construct a species pangenome consisting of 'core...
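The pan-genome analyses mentioned in the abstract rest on a presence/absence variation (PAV) matrix that records, for every gene, which accessions carry it; genes are then binned by how widely they are shared, and 'core' genes (the term the excerpt breaks off on) are conventionally those present in every accession. The Python sketch below assumes a simple dict-of-lists PAV matrix. The hypothetical function `classify_pan_genome`, the category names other than 'core', and the `near_core` threshold are illustrative assumptions, not the definitions used for the 3K-RG.

```python
from collections import Counter

def classify_pan_genome(pav, near_core=0.99):
    """Bin genes from a presence/absence (PAV) matrix by how many accessions carry them.

    pav maps gene id -> list of 0/1 flags, one per accession (1 = gene present).
    Category names other than "core" and the near_core cut-off are illustrative
    assumptions, not the thresholds used in the 3K-RG pan-genome analysis.
    """
    widths = {len(flags) for flags in pav.values()}
    if len(widths) > 1:
        raise ValueError("every gene needs one flag per accession")
    categories = {}
    for gene, flags in pav.items():
        n, k = len(flags), sum(flags)
        if k == 0:
            categories[gene] = "absent"       # not observed in any accession
        elif k == n:
            categories[gene] = "core"         # present in every accession
        elif k >= near_core * n:
            categories[gene] = "near-core"    # missing from a small fraction only
        elif k == 1:
            categories[gene] = "private"      # unique to a single accession
        else:
            categories[gene] = "dispensable"  # shared by some accessions only
    return categories

# Toy matrix with four accessions (hypothetical gene ids and calls).
pav = {
    "gene1": [1, 1, 1, 1],
    "gene2": [1, 1, 1, 0],
    "gene3": [1, 0, 1, 0],
    "gene4": [0, 0, 1, 0],
}
categories = classify_pan_genome(pav, near_core=0.75)
summary = Counter(categories.values())
```

With real data the matrix would have one column per accession (3,010 in the study above), and the thresholds would have to match whatever definitions the analysis being reproduced actually used.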
We have identified about 20 million rice SNPs by aligning reads from the 3,000 Rice Genomes Project to the Nipponbare genome. The SNPs and allele information are organized into the SNP-Seek system (http://www.oryzasnp.org/iric-portal/), which consists of an Oracle database holding close to 60 billion rows of SNP genotypes (20 M SNPs × 3 K rice lines) and a web interface for convenient querying. The database allows quick retrieval of SNP alleles for all varieties in a given genome region, identification of alleles that differ among predefined varieties, and querying of basic passport and morphological phenotypic information about the sequenced rice lines. SNPs can be visualized together with gene structures in the JBrowse genome browser. Evolutionary relationships between rice varieties can be explored using phylogenetic trees or multidimensional scaling plots.
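The region and allele-difference queries described above can be illustrated with a small in-memory model: SNP positions are kept sorted per chromosome so that a region lookup is two binary searches, and each SNP row stores one genotype call per variety. This Python sketch is hypothetical; the `GenotypeMatrix` class, the one-character call encoding, and the example varieties, positions and calls are assumptions, not SNP-Seek's Oracle schema or API. At the scale quoted (about 20 million SNPs × 3,000 lines), a real system needs disk-backed storage rather than Python lists.

```python
from bisect import bisect_left, bisect_right

class GenotypeMatrix:
    """Hypothetical in-memory SNP x variety genotype table (not SNP-Seek's schema)."""

    def __init__(self, varieties):
        self.varieties = list(varieties)
        self._col = {name: i for i, name in enumerate(self.varieties)}
        self._positions = {}  # chromosome -> sorted SNP positions
        self._rows = {}       # chromosome -> genotype strings, one char per variety

    def add_snp(self, chrom, pos, calls):
        if len(calls) != len(self.varieties):
            raise ValueError("expected one call per variety")
        positions = self._positions.setdefault(chrom, [])
        rows = self._rows.setdefault(chrom, [])
        i = bisect_left(positions, pos)  # keep positions sorted on insert
        positions.insert(i, pos)
        rows.insert(i, calls)

    def region(self, chrom, start, end, varieties=None):
        """Calls for SNPs with start <= pos <= end, optionally for selected varieties."""
        positions = self._positions.get(chrom, [])
        rows = self._rows.get(chrom, [])
        lo, hi = bisect_left(positions, start), bisect_right(positions, end)
        names = self.varieties if varieties is None else list(varieties)
        cols = [self._col[n] for n in names]
        return [(positions[i], {n: rows[i][c] for n, c in zip(names, cols)})
                for i in range(lo, hi)]

    def differing(self, chrom, start, end, a, b):
        """Positions in the region where varieties a and b carry different calls."""
        return [pos for pos, calls in self.region(chrom, start, end, [a, b])
                if calls[a] != calls[b]]

# Usage with made-up positions and calls.
m = GenotypeMatrix(["Nipponbare", "IR 64", "Kasalath"])
m.add_snp("chr01", 1200, "AGA")
m.add_snp("chr01", 1050, "CCT")
m.add_snp("chr02", 500, "GGG")
in_region = m.region("chr01", 1000, 1300, ["IR 64"])
diffs = m.differing("chr01", 1000, 1300, "Nipponbare", "Kasalath")
```

Here `in_region` holds the IR 64 calls at positions 1050 and 1200, and `diffs` is `[1050]`, the only position in the window where Nipponbare and Kasalath disagree.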
We describe updates to the Rice SNP-Seek Database since its first release. We ran a new SNP-calling pipeline, followed by filtering that produced complete, base, filtered and core SNP datasets. Besides the Nipponbare reference genome, the pipeline was run on the genome assemblies of IR 64, 93-11, DJ 123 and Kasalath. New genotype query and display features have been added for reference assemblies, SNP datasets and indels. JBrowse now displays BAM, VCF and other annotation tracks, the additional genome assemblies and an embedded VISTA genome comparison viewer. The middleware was redesigned for improved performance, using a hybrid of HDF5 and an RDBMS for genotype storage. Query modules for genotypes, varieties and genes have been improved to handle various constraints. An integrated list manager allows the user to pass query parameters on for further analysis. The SNP Annotator adds traits, ontology terms, effects and interactions to markers in a list. Web-service calls were implemented to access most data. These features enable seamless querying of SNP-Seek across various biological entities, a step toward semi-automated gene-trait association discovery. URL: http://snp-seek.irri.org.
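The hybrid HDF5/RDBMS design is described only at a high level. One common way to realize such a split, sketched here under stated assumptions, is to keep small, frequently filtered metadata (variety names, SNP coordinates) in relational tables and the bulk genotype calls in a dense SNP-major matrix addressed by row and column index. In this Python sketch, SQLite (`sqlite3`) stands in for the RDBMS and a `bytearray` stands in for the HDF5 dataset; the `HybridGenotypeStore` class, its table names and the 'N'-as-missing encoding are hypothetical, not SNP-Seek's implementation.

```python
import sqlite3

class HybridGenotypeStore:
    """Hypothetical hybrid store: metadata in SQLite, calls in a dense byte matrix."""

    def __init__(self, varieties, snps):
        varieties = list(varieties)
        snps = list(snps)  # (chrom, pos) pairs; list order defines matrix rows
        self.n_var = len(varieties)
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE variety (var_idx INTEGER PRIMARY KEY, name TEXT UNIQUE)")
        self.db.execute("CREATE TABLE snp (snp_idx INTEGER PRIMARY KEY, chrom TEXT, pos INTEGER)")
        self.db.execute("CREATE INDEX snp_loc ON snp (chrom, pos)")
        self.db.executemany("INSERT INTO variety VALUES (?, ?)", enumerate(varieties))
        self.db.executemany("INSERT INTO snp VALUES (?, ?, ?)",
                            [(i, chrom, pos) for i, (chrom, pos) in enumerate(snps)])
        # Stand-in for an HDF5 dataset: SNP-major, one byte per call, b"N" = missing.
        self.matrix = bytearray(b"N" * (len(snps) * self.n_var))

    def set_call(self, snp_idx, var_idx, base):
        self.matrix[snp_idx * self.n_var + var_idx] = ord(base)

    def query(self, names, chrom, start, end):
        """Resolve names and coordinates via SQL, then read calls from the matrix."""
        cols = []
        for name in names:
            hit = self.db.execute("SELECT var_idx FROM variety WHERE name = ?", (name,)).fetchone()
            if hit is None:
                raise KeyError(name)
            cols.append(hit[0])
        rows = self.db.execute(
            "SELECT snp_idx, pos FROM snp WHERE chrom = ? AND pos BETWEEN ? AND ? ORDER BY pos",
            (chrom, start, end)).fetchall()
        return [(pos, "".join(chr(self.matrix[r * self.n_var + c]) for c in cols))
                for r, pos in rows]

# Usage with made-up coordinates and calls.
store = HybridGenotypeStore(["Nipponbare", "IR 64", "93-11"],
                            [("chr01", 100), ("chr01", 250), ("chr02", 80)])
store.set_call(0, 1, "T")
store.set_call(1, 1, "G")
store.set_call(1, 2, "G")
result = store.query(["IR 64", "93-11"], "chr01", 50, 300)
```

Keeping the matrix SNP-major makes a region query a read of adjacent rows, whereas a variety-major layout would favour whole-genome queries for a single line; the source does not say which layout SNP-Seek uses.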
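The data-descriptor excerpt that follows summarizes assembly contiguity with contig N50 and states sequencing depth as fold coverage. Under the usual definitions, N50 is the largest length L such that contigs of length at least L together contain at least half of the assembled bases, and fold coverage is total sequenced bases divided by genome size. The Python sketch below implements these two definitions; the function names `contig_n50` and `fold_coverage` and all input numbers are hypothetical and are not values from the assemblies described.

```python
def contig_n50(contig_lengths):
    """Largest L such that contigs of length >= L hold at least half of all bases."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs given")
    half = sum(lengths) / 2
    running = 0
    for length in lengths:  # walk from the longest contig down
        running += length
        if running >= half:
            return length

def fold_coverage(total_read_bases, genome_size):
    """Mean sequencing depth: sequenced bases divided by genome size."""
    return total_read_bases / genome_size

# Hypothetical inputs (contig lengths in Mb, bases as integers).
print(contig_n50([10, 8, 5, 3, 2]))                # 8
print(fold_coverage(40_000_000_000, 400_000_000))  # 100.0
```

Some tools report N50 over scaffolds or chromosomes rather than contigs, so the input list should match the statistic being quoted.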
The 3,000 accessions can be subdivided into nine subpopulations, and most accessions from closely related subgroups could be associated with their geographic origin [12]. One critical piece of information is missing from these analyses: single nucleotide polymorphisms (SNPs) and structural variations (SVs) present in subpopulation-specific genomic regions have yet to be detected, because the 3K-RG dataset was aligned to only a single reference genome. Therefore, the next logical step in capturing and understanding genetic variation across all subpopulations is to map the 3K-RG dataset to high-quality reference genomes that represent each of the subpopulations of cultivated Asian rice. At present, only a handful of high-quality genomes for cultivated rice are publicly available [5,6,13,14]; thus, there is an immediate need for such a comprehensive resource, which is the subject of this Data Descriptor. Here we present a reanalysis of the population structure discussed above [12] and show that the 3K-RG dataset can be further subdivided into a total of 15 subpopulations. We then present 12 new, near-gap-free, high-quality PacBio long-read reference genomes from representative accessions of the 12 subpopulations of cultivated Asian rice for which no high-quality reference genome exists. All 12 genomes were assembled from more than 100× genome coverage of PacBio long-read sequence data and then validated with Bionano optical maps [15]. The number of contigs in each of the twelve assemblies, excluding unplaced contigs, ranged from 15 (GOBOL SAIL (BALAM)::IRGC 26624-2) to 104 (IR 64). Contig N50 values across the 12-genome dataset ranged from 7.35 Mb to 31.91 Mb. When combined with 4 previously published genomes (i.e. Minghui 63 (MH 63) and Zhenshan 97 (ZS 97) [13,14], N 22 [5] and the IRGSP RefSeq [6]), this 16-genome dataset can be used to represent the K = 15 population/admixture structure of cultivated Asian rice.

Methods

Ethics statement.
This work was approved by the University of Arizona (UA),