Many genomic analyses start by aligning sequencing reads to a linear reference genome. However, linear reference genomes are imperfect: they lack millions of bases of unknown relevance and cannot reflect the genetic diversity of populations. This makes reference-guided methods susceptible to reference-allele bias. To overcome such limitations, we build a pangenome from six reference-quality assemblies from taurine and indicine cattle as well as yak. The pangenome contains an additional 70,329,827 bases compared to the Bos taurus reference genome. Our multiassembly approach reveals 30 and 10.1 million bases private to yak and indicine cattle, respectively, and between 3.3 and 4.4 million bases unique to each taurine assembly. Utilizing transcriptomes from 56 cattle, we show that these nonreference sequences encode transcripts that hitherto remained undetected from the B. taurus reference genome. We uncover genes, primarily encoding proteins contributing to immune response and pathogen-mediated immunomodulation, differentially expressed between Mycobacterium bovis–infected and noninfected cattle that are also undetectable in the B. taurus reference genome. Using whole-genome sequencing data of cattle from five breeds, we show that reads that were previously misaligned against the B. taurus reference genome now align accurately to the pangenome sequences. This enables us to discover 83,250 polymorphic sites that segregate within and between breeds of cattle and capture genetic differentiation across breeds. Our work makes a so-far unused source of variation amenable to genetic investigations and provides methods and a framework for establishing and exploiting a more diverse reference genome.
Cattle are ideally suited to investigate the genetics of male reproduction, because semen quality and fertility are recorded for all ejaculates of artificial insemination bulls. We analysed 26,090 ejaculates of 794 Brown Swiss bulls to assess ejaculate volume, sperm concentration, sperm motility, sperm head and tail anomalies and insemination success. The heritability of the six semen traits was between 0 and 0.26. Genome-wide association testing on 607,511 SNPs revealed a QTL on bovine chromosome 6 that was associated with sperm motility (P = 2.5 × 10⁻²⁷), head (P = 2.0 × 10⁻⁴⁴) and tail anomalies (P = 7.2 × 10⁻⁴⁹) and insemination success (P = 9.9 × 10⁻¹³). The QTL harbors a recessive allele that compromises semen quality and male fertility. We replicated the effect of the QTL on fertility (P = 7.1 × 10⁻³²) in an independent cohort of 2481 Brown Swiss bulls. The analysis of whole-genome sequencing data revealed that a synonymous variant (BTA6:58373887C>T, rs474302732) in WDR19 encoding WD repeat-containing protein 19 was in linkage disequilibrium with the fertility-associated haplotype. WD repeat-containing protein 19 is a constituent of the intraflagellar transport complex that is essential for the physiological function of motile cilia and flagella. Bioinformatic and transcription analyses revealed that the BTA6:58373887 T-allele activates a cryptic exonic splice site that eliminates three evolutionarily conserved amino acids from WDR19. Western blot analysis demonstrated that the BTA6:58373887 T-allele decreases protein expression. We make the remarkable observation that, in spite of negative effects on semen quality and bull fertility, the BTA6:58373887 T-allele has a frequency of 24% in the Brown Swiss population. Our findings are the first to uncover a variant that is associated with quantitative variation in semen quality and male fertility in cattle.
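To put the reported 24% frequency of the recessive T-allele in perspective, the short sketch below computes the genotype frequencies one would expect at a biallelic locus. It assumes Hardy-Weinberg equilibrium, which the abstract does not state, and the function name `hardy_weinberg` is illustrative only; the resulting proportions are a back-of-the-envelope expectation, not figures reported by the study.

```python
def hardy_weinberg(q: float) -> dict[str, float]:
    """Expected genotype frequencies at a biallelic C/T locus.

    q is the frequency of the T-allele. Hardy-Weinberg equilibrium is an
    assumption made for illustration; it is not claimed in the abstract.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    p = 1.0 - q  # frequency of the alternative (C) allele
    return {"CC": p * p, "CT": 2.0 * p * q, "TT": q * q}


# T-allele frequency of 24%, as reported for the Brown Swiss population.
freqs = hardy_weinberg(0.24)
for genotype, frequency in freqs.items():
    print(f"{genotype}: {frequency:.4f}")
```

Under this assumption, roughly 5.8% of animals would be homozygous for the T-allele and about 36% would be heterozygous carriers.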
Advantages of pangenomes over linear reference assemblies for genome research have recently been established. However, potential effects of sequence platform and assembly approach, or of combining assemblies created by different approaches, on pangenome construction have not been investigated. Here we generate haplotype-resolved assemblies from the offspring of three bovine trios representing increasing levels of heterozygosity that each demonstrate a substantial improvement in contiguity, completeness, and accuracy over the current Bos taurus reference genome. Diploid coverage as low as 20x for HiFi or 60x for ONT is sufficient to produce two haplotype-resolved assemblies meeting standards set by the Vertebrate Genomes Project. Structural variant-based pangenomes created from the haplotype-resolved assemblies demonstrate significant consensus regardless of sequence platform, assembler algorithm, or coverage. Inspecting pangenome topologies identifies 90 thousand structural variants including 931 overlapping with coding sequences; this approach reveals variants affecting QRICH2, PRDM9, HSPA1A, TAS2R46, and GC that have potential to affect phenotype.
Casein (CN) phosphorylation is an important posttranslational modification and is one of the key factors responsible for constructing and stabilizing casein micelles. Variation in phosphorylation degree of αS-CN is of great interest because it is suggested to affect milk technological properties. This study aimed to investigate the variation in phosphorylation degree of αS-CN among milk of individual cows and to explore relationships among different phosphorylation isoforms of αS-CN. For this purpose, we analyzed morning milk samples from 529 French Montbéliarde cows using liquid chromatography coupled with electrospray ionization mass spectrometry. We detected 3 new phosphorylation isoforms: αS2-CN-9P, αS2-CN-14P, and αS2-CN-15P in bovine milk, in addition to the known isoforms αS1-CN-8P, αS1-CN-9P, αS2-CN-10P, αS2-CN-11P, αS2-CN-12P, and αS2-CN-13P. The relative concentrations of each αS-CN phosphorylation isoform varied considerably among individual cows. Furthermore, the phenotypic correlations and hierarchical clustering suggest at least 2 regulatory systems for phosphorylation of αS-CN: one responsible for isoforms with lower levels of phosphorylation (αS1-CN-8P, αS2-CN-10P, and αS2-CN-11P), and another responsible for isoforms with higher levels of phosphorylation (αS1-CN-9P, αS2-CN-12P, αS2-CN-13P, and αS2-CN-14P). Identifying all phosphorylation sites of αS2-CN and investigating the genetic background of different αS2-CN phosphorylation isoforms may provide further insight into the phosphorylation mechanism of caseins.