Pigs (Sus scrofa) exhibit diverse phenotypes across breeds, shaped by the combined effects of local adaptation and artificial selection. To comprehensively characterize the genetic diversity of pigs, we constructed a pig pan-genome by comparing genome assemblies of 11 representative pig breeds with the reference genome (Sscrofa11.1). Approximately 72.5 Mb of non-redundant sequences absent from Sscrofa11.1 were identified as pan-sequences. On average, 41.7 kb of spurious heterozygous SNPs per individual are removed and 12.9 kb of novel SNPs per individual are recovered when the pan-genome is used as the reference for SNP calling, thereby providing enhanced resolution of genetic diversity in pigs. Homolog annotation and analysis using RNA-seq and Hi-C data indicate that these pan-sequences contain protein-coding regions and regulatory elements. These pan-sequences can further improve the interpretation of local 3D structure. The pan-genome, together with the accompanying web-based database, will serve as a primary resource for exploring genetic diversity and will promote pig breeding and biomedical research.
The gene numbers and evolutionary rates of birds were assumed to be much lower than those of mammals, which is in sharp contrast to the huge species number and morphological diversity of birds. It is therefore necessary to construct a complete avian genome and analyze its evolution. We constructed a chicken pan-genome from 20 de novo assembled genomes with high sequencing depth, and identified 1,335 protein-coding genes and 3,011 long noncoding RNAs not found in GRCg6a. The majority of these novel genes were detected across most individuals in the examined transcriptomes but were seldom captured in any single DNA sequencing dataset, whether Illumina or PacBio. Furthermore, unlike in previous pan-genome models, most of these novel genes were overrepresented in chromosomal sub-telomeric regions and on micro-chromosomes, surrounded by extremely high proportions of tandem repeats, which strongly hinder DNA sequencing. These hidden genes were shown to be shared by all chicken genomes, included many housekeeping genes, and were enriched in immune pathways. Comparative genomics revealed that the novel genes had three-fold higher substitution rates than known genes, updating the knowledge about evolutionary rates in birds. Our study provides a framework for constructing a better chicken genome, which will contribute to the understanding of avian evolution and the improvement of poultry breeding.
It is broadly expected that next-generation sequencing will ultimately generate a complete genome, as with the latest goat reference genome (ARS1), which is considered to be one of the most continuous assemblies in livestock. However, the rich diversity of goat breeds worldwide indicates that a genome from one individual is insufficient to represent the whole genomic content of goats. By comparing nine de novo assemblies from seven sibling species of domestic goat with ARS1, and using resequencing and transcriptome data from goats for verification, we identified a total of 38.3 Mb of sequences that were absent from ARS1. The pan-sequences contain genic fractions with considerable expression. Using the pan-genome (ARS1 together with the pan-sequences) as a reference genome appreciably improves variant calling. On average, 56,657 spurious SNPs per individual were removed and 24,414 novel SNPs per individual were recovered as a result of better read mapping quality. The transcriptomic mapping rate also increased by ∼1.15%. Our study demonstrated that comparing de novo assemblies from closely related species is an efficient and reliable strategy for finding sequences missing from the reference genome and could be applicable to other species. A pan-genome can serve as an improved reference genome in animals for better exploration of the underlying genomic variation, and could increase the probability of finding genotype-phenotype associations by supporting a more comprehensive variation database that captures many more differences between individuals. We have constructed a goat pan-genome web interface for data visualization ().
... level. Here we carried out an in-depth comparison between 11 de novo assemblies and the reference genome by analysis of the assembly-versus-assembly alignment. The final pan-genome comprises 39,744 newly added sequences (total length: 72.5 Mb), of which 607 demonstrate coding potential. Furthermore, the three-dimensional (3D) spatial structure of the pan-genome was depicted by revealing the characteristics of the pan-genome in A/B compartments (generally euchromatic and heterochromatic regions) and topologically associating domains (TADs). We also built a pig pan-genome database (PIGPAN, http://animal.nwsuaf.edu.cn/code/index.php/panPig), which can serve as a fundamental resource for unlocking variations within diverse pig breeds.

Results

Initial characterization of pan-sequences in the pig genome

To construct the pig pan-genome, we first aligned 11 assemblies from 11 genetically distinct breeds (five from Europe and six from China) against Sscrofa11.1 using BLASTN to generate the unaligned sequences (Fig. 1a and Supplementary Fig. 2). The unaligned sequences in the Chinese pigs were significantly longer than those in the European pigs (P < 0.01), since the reference genome is from a European pig (Fig. 1a). As expected, the Wuzhishan assembly had the largest number of sequences because this sample is the only male individual among the 11 assemblies and can provide many male-specific sequences (Fig. 1a and Supplementary Table 2). After removing redundant sequences, we obtained 39,744 sequences with a total length of 72.5 Mb (Fig. 1b), which were absent from Sscrofa11.1 and thus were defined as pan-sequences. The repetitive element content (45.91%) and GC content (44.61%) of these sequences were slightly higher than those of Sscrofa11.1 (45.19% and 41.5%, respectively) (Fig. 1...
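The step of deriving unaligned sequences from assembly-versus-reference alignments, described in the Results excerpt above, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the 1-based inclusive interval convention, and the `min_length` filter are assumptions made for illustration, and the excerpt does not state which thresholds were actually applied. The sketch takes the aligned intervals on one assembly contig (for example, query start and end coordinates from a tabular alignment report), merges them, and returns the uncovered segments. A small GC-content helper is included because the excerpt compares GC content between pan-sequences and the reference.

```python
from typing import List, Tuple

# Hypothetical convention: 1-based, inclusive (start, end) on the assembly contig.
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or directly adjacent aligned intervals."""
    merged: List[Interval] = []
    # Normalize each pair so that start <= end, then sort by start.
    for start, end in sorted((min(s, e), max(s, e)) for s, e in intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def unaligned_segments(
    contig_length: int, aligned: List[Interval], min_length: int = 1
) -> List[Interval]:
    """Return contig segments not covered by any aligned interval.

    min_length is an illustrative filter, not a value taken from the paper.
    """
    segments: List[Interval] = []
    cursor = 1  # first position not yet known to be aligned
    for start, end in merge_intervals(aligned):
        if start - cursor >= min_length:
            segments.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
    if contig_length - cursor + 1 >= min_length:
        segments.append((cursor, contig_length))
    return segments


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0
```

For a 100 bp contig with aligned intervals (10, 20), (15, 30) and (50, 60), `unaligned_segments` would report the uncovered stretches before, between, and after the merged alignments. Redundancy removal across the 11 assemblies, which the excerpt mentions as a later step, is outside the scope of this sketch.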
Background: The non-reference sequences (NRS) represent structural variations in the human genome with potential functional significance. However, besides the known insertions, it is currently unknown whether other types of structural variation involving NRS exist. Results: Here, we compared 31 human de novo assemblies with the current reference genome to identify the NRS and their locations. We resolved the precise locations of 6113 NRS adding up to 12.8 Mb. Besides 1571 insertions, we detected 3041 alternate alleles, which were defined as having less than 90% (or no) identity with the reference alleles. These alternate alleles overlapped with 1143 protein-coding genes, including a putative novel MHC haplotype. Further, we demonstrated that the alternate alleles and their flanking regions had a high content of tandem repeats, indicating that their origin was associated with tandem repeats. Conclusions: Our study detected a large number of NRS, including many alternate alleles that were previously uncharacterized. We suggest that the origin of alternate alleles was associated with tandem repeats. Our results enrich the spectrum of genetic variation in the human genome.
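The 90% identity rule stated in the abstract above can be written as a small decision function. This is a hedged sketch only: the abstract gives the identity threshold for alternate alleles but does not spell out how insertions were distinguished, so the `ref_span_length == 0` test, the `reference_like` label, and the function name are assumptions introduced for illustration.

```python
from typing import Optional


def classify_nrs(
    ref_span_length: int,
    identity: Optional[float],
    threshold: float = 0.90,
) -> str:
    """Classify a placed non-reference sequence (NRS).

    ref_span_length: length of the reference segment between the two
        anchoring flanks (assumed to be 0 for a pure insertion).
    identity: fractional identity between the NRS and that reference
        segment, or None when no alignment exists.
    threshold: 0.90 follows the abstract's definition of alternate alleles.
    """
    if ref_span_length == 0:
        # Assumption: no reference sequence is replaced, so treat as insertion.
        return "insertion"
    if identity is None or identity < threshold:
        # Less than 90% (or no) identity with the reference allele.
        return "alternate_allele"
    # Hypothetical label for sequences that closely match the reference.
    return "reference_like"
```

Under these assumptions, an NRS that replaces a 500 bp reference segment at 85% identity would be labeled an alternate allele, as would one with no detectable alignment to that segment.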