2020
DOI: 10.1101/2020.02.27.967539
Preprint

CandiHap: a toolkit for haplotype analysis for sequence of samples and fast identification of candidate causal gene(s) in genome-wide association study

Abstract: Genome-wide association studies (GWAS) are widely used to identify genes involved in complex traits of plants, animals and humans. Generally, the identified SNP is not necessarily the causal variant but is rather in linkage disequilibrium (LD) with it. One key challenge in interpreting GWAS results is to rapidly identify causal genes and provide strong evidence of how they affect the trait. Researchers want to identify candidate causal variants from the most significant SNPs of GWAS in any species and on their local …

Citations: cited by 8 publications (6 citation statements)
References: 17 publications
“…Given the false-positive and false-negative issues associated with GWAS (Atwell et al, 2010; Chan et al, 2010a, 2010b), we combined biological and bioinformatic approaches to mine candidate genes. The filtering procedure was set as follows: (1) haplotype analysis for all genes in the LD regions of a significant SNP using CandiHap (Li et al, 2020) (Supplemental Figure 4); (2) search for proteins that are biochemically related to the associated metabolic traits encoded at the candidate loci; (3) cluster analysis of candidate genes and homologous genes with known function; (4) cross-reference with statistical significance results of causal genes from gene-based GWAS using CandiHap; and (5) prediction of functional/molecular phenotypes from GWAS results by performing transcriptome-wide association studies (TWASs) (Supplemental Table 12). After the filtering steps above, we identified 511 candidate genes associated with 690 lead SNPs that might be responsible for the variation in metabolic traits (Table 1 and Supplemental Table 13).…”
Section: Genetic Basis Of Natural Variations Of the Metabolites
Citation type: mentioning
confidence: 99%
“…To the best of our knowledge, haplotype analysis has usually been conducted manually, which is laborious, time-consuming, and prone to errors and omissions. To solve this problem, we developed a user-friendly software, CandiHap (https://github.com/xukaili/CandiHap) (Li et al, 2020). With CandiHap, users can complete the following analysis within a minute, which usually costs hours or even days manually: (1) local GWAS for a gene; (2) haplotype analysis for a gene (CandiHap); (3) haplotype analysis for all genes in the LD regions of a significant SNP position one by one (GWAS_LD2haplotypes); (4) haplotype analysis of Sanger sequencing population variation data run on Unix-like systems (sanger_CandiHap.sh).…”
Section: Candihap: An R Platform For Haplotype Analysis Of Variation ...
Citation type: mentioning
confidence: 99%
“…A total of 52 SNPs on the gene sequence of TaELD1-1A among the 352 wheat accessions were retrieved from the 1000 wheat exomes project of He et al (2019) (Supplementary Table 5), among which 14 SNPs were filtered with heterozygosity <0.03 and used for haplotype analysis by the “CandiHap” package (Li et al, 2020) of R 4.0.1 (R Core Team, 2013) (see text footnote 4) (Supplementary Table 6). Haplotype analysis showed that four main haplotypes (Hap1–4, containing accessions >10) of TaELD1-1A were detected among 352 wheat accessions (Figure 4 and Supplementary Table 7).…”
Section: Results
Citation type: mentioning
confidence: 99%
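The haplotype step described in the statement above can be illustrated with a minimal Python sketch. It is not CandiHap's implementation or API: the function names `heterozygosity` and `main_haplotypes`, the `"A/G"`-style genotype encoding, and the `"./."` missing-call marker are all assumptions made for illustration. Only the two thresholds (heterozygosity <0.03, main haplotypes containing >10 accessions) come from the quoted text.

```python
from collections import Counter

def heterozygosity(calls):
    """Fraction of non-missing calls that are heterozygous (assumed 'A/G' encoding)."""
    called = [c for c in calls if c != "./."]  # "./." = missing call (assumption)
    if not called:
        return 0.0
    het = sum(1 for c in called if c.split("/")[0] != c.split("/")[1])
    return het / len(called)

def main_haplotypes(genotypes, max_het=0.03, min_accessions=10):
    """Group accessions by their genotype pattern over low-heterozygosity SNPs.

    genotypes: dict mapping accession name -> list of calls, one per SNP,
    with every accession listing the SNPs in the same order.
    Returns {haplotype tuple: accession count} for haplotypes carried by
    more than `min_accessions` accessions.
    """
    accessions = list(genotypes)
    n_snps = len(next(iter(genotypes.values())))
    # Keep only SNPs whose heterozygosity is below the threshold.
    keep = [
        i for i in range(n_snps)
        if heterozygosity([genotypes[a][i] for a in accessions]) < max_het
    ]
    # A haplotype here is simply the tuple of calls at the retained SNPs.
    counts = Counter(tuple(genotypes[a][i] for i in keep) for a in accessions)
    return {hap: n for hap, n in counts.most_common() if n > min_accessions}
```

Calling `main_haplotypes` on a dictionary of accession genotypes returns the frequent haplotypes and their accession counts; comparing phenotypes between those groups would be a separate step not shown here.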
“…To assess the allelic variation of the TaELD1-1A gene across various wheat cultivars, the haplotype analysis of TaELD1-1A was performed using the SNP data (heterozygosity <0.03) on TaELD1-1A gene sequences among the 352 wheat accessions retrieved from the 1000 wheat exomes project of He et al (2019) (see text footnote 3) using the “CandiHap” package (Li et al, 2020) of R 4.0.1 (R Core Team, 2013) (see text footnote 4), and the differences of the phenotypes for Gp corresponding to different haplotypes were tested. Moreover, the homologous gene sequences of TaELD1-1A in pan-genomes including 10+ hexaploid wheat (Walkowiak et al, 2020), emmer wheat (Zavitan) (Avni et al, 2017), and durum wheat (Svevo) (Maccaferri et al, 2019) genomes were downloaded from Ensembl Plants 8 according to the best-match gene IDs to TraesCSU02G143200 through BLAST.…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…The genes located within the high LD blocks were selected for checking the gene annotation (https://rapdb.dna.affrc.go.jp/). Then, a haplotype-based association analysis for the genes within the LD block was conducted by candihap software (Li et al, 2020).…”
Section: Linkage Disequilibrium Block Analysis and Candidate Gene Ass...
Citation type: mentioning
confidence: 99%