Background
Reference genomes are essential in the analysis of genomic data. As the cost of sequencing decreases, multiple reference genomes are being produced within species to alleviate problems such as low mapping accuracy and reference allele bias in variant calling that can be associated with the alignment of divergent samples to a single reference individual. The latest reference sequence adopted by the scientific community for the analysis of cattle data is ARS_UCD1.2, built from the DNA of a Hereford cow (Bos taurus taurus—B. taurus). A complementary genome assembly, UOA_Brahman_1, was recently built to represent the other cattle subspecies (Bos taurus indicus—B. indicus) from a Brahman cow haplotype to further support analysis of B. indicus data. In this study, we aligned the sequence data of 15 B. taurus and B. indicus breeds to each of these references.
Results
The alignment of B. taurus individuals against UOA_Brahman_1 detected up to five million more single-nucleotide variants (SNVs) compared to that against ARS_UCD1.2. Similarly, the alignment of B. indicus individuals against ARS_UCD1.2 resulted in one and a half million more SNVs than that against UOA_Brahman_1. The number of SNVs with nearly fixed alternative alleles also increased in the alignments with cross-subspecies. Interestingly, the alignment of B. taurus cattle against UOA_Brahman_1 revealed regions with a smaller than expected number of counts of SNVs with nearly fixed alternative alleles. Since B. taurus introgression represents on average 10% of the genome of Brahman cattle, we suggest that these regions comprise taurine DNA as opposed to indicine DNA in the UOA_Brahman_1 reference genome. Principal component and admixture analyses using genotypes inferred from this region support these taurine-introgressed loci. Overall, the flagged taurine segments represent 13.7% of the UOA_Brahman_1 assembly. The genes located within these segments were previously reported to be under positive selection in Brahman cattle, and include functional candidate genes implicated in feed efficiency, development and immunity.
Conclusions
We report a list of taurine segments that are in the UOA_Brahman_1 assembly, which will be useful for the interpretation of interesting genomic features (e.g., signatures of selection, runs of homozygosity, increased mutation rate, etc.) that could appear in future re-sequencing analysis of indicine cattle.