The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
The DNA sequencing technologies in use today produce either highly accurate short reads or less-accurate long reads. We report the optimization of circular consensus sequencing (CCS) to improve the accuracy of single-molecule real-time (SMRT) sequencing (PacBio) and generate highly accurate (99.8%) long high-fidelity (HiFi) reads with an average length of 13.5 kilobases (kb). We applied our approach to sequence the well-characterized human HG002/NA24385 genome and obtained precision and recall rates of at least 99.91% for single-nucleotide variants (SNVs), 95.98% for insertions and deletions <50 bp (indels) and 95.99% for structural variants. Our CCS method matches or exceeds the ability of short-read sequencing to detect small variants and structural variants. We estimate that 2,434 discordances are correctable mistakes in the 'genome in a bottle' (GIAB) benchmark set. Nearly all (99.64%) variants can be phased into haplotypes, further improving variant detection. De novo genome assembly using CCS reads alone produced a contiguous and accurate genome with a contig N50 of >15 megabases (Mb) and concordance of 99.997%, substantially outperforming assembly with less-accurate long reads.
In higher plants, cellulose is synthesized at the plasma membrane by the cellulose synthase (CESA) complex. The catalytic core of the complex is believed to be composed of three types of CESA subunits. Indirect evidence suggests that the complex associated with primary wall cellulose deposition consists of CESA1, -3, and -6 in Arabidopsis thaliana. However, phenotypes associated with mutations in two of these genes, CESA1 and -6, suggest unequal contribution by the different CESAs to overall enzymatic activity of the complex. We present evidence that the primary complex requires three unique types of components, CESA1-, CESA3-, and CESA6-related, for activity. Removal of any of these components results in gametophytic lethality due to pollen defects, demonstrating that primary-wall cellulose synthesis is necessary for pollen development. We also show that the CESA6-related CESAs are partially functionally redundant.
gametophytic lethal | isoforms | pollen | cellulose synthesis | mutant
The development of sustainable, low-carbon, liquid fuels from cellulosic biomass will require advances in many areas of science and engineering. This review describes the major topics of enquiry concerning cellulosic biofuels with an emphasis on those areas of research and development that include research problems of interest to plant biologists.