Zebrafish have become a popular organism for the study of vertebrate gene function1,2. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and, increasingly, in the study of human genetic disease3–5. However, for effective modelling of human genetic disease, it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution, high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes6, the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination.
Since the publication of the human reference genome, the identities of specific genes associated with human diseases have been discovered at an enormous rate. A central problem is that the biological activity of these genes is often unclear. Detailed investigations in vertebrate model organisms, typically mice, have been essential for understanding the activities of many orthologues of these disease-associated genes. Although gene-targeting approaches1–3 and phenotype analysis have led to a detailed understanding of nearly 6,000 protein-coding genes3,4, this number falls significantly short of the >22,000 mouse protein-coding genes5. Similarly, in zebrafish genetics, one-by-one gene studies using positional cloning6, insertional mutagenesis7–9, antisense morpholino oligonucleotides10, targeted re-sequencing11–13 and zinc finger and TAL endonucleases14–17 have made significant contributions to our understanding of the biological activity of vertebrate genes, but the number of genes studied again falls well short of the >26,000 zebrafish protein-coding genes18. Importantly, for both mice and zebrafish, none of these strategies is particularly suited to the rapid generation of knockouts in thousands of genes and the assessment of their biological activity. Enabled by a well-annotated zebrafish reference genome sequence18,19, high-throughput sequencing and efficient chemical mutagenesis, we describe an active project that aims to identify and phenotype disruptive mutations in every zebrafish protein-coding gene. Thus far, we have identified potentially disruptive mutations in more than 38% of all known protein-coding genes. We have developed a multi-allelic phenotyping scheme to efficiently assess the effects of each allele during embryogenesis and have analysed the phenotypic consequences of over 1,000 alleles. All mutant alleles and data are available to the community, and our phenotyping scheme is adaptable to phenotypic analysis beyond embryogenesis.