Background & Aims
Previous genome‐wide association studies (GWAS) have identified multiple susceptibility variants associated with persistent hepatitis B virus (HBV) infection. However, most of these variants are located in noncoding regions, which makes it difficult to determine the genes underlying these associations.
Methods
We performed a two‐stage study. In the first stage, we integrated RNA sequencing data of liver tissues and high‐density genotyping data from the Genotype‐Tissue Expression (GTEx) project with our previous GWAS data to conduct a transcriptome‐wide association study (TWAS) of HBV infection. First, cis‐heritable genes were screened in the GTEx data using the genetic relatedness matrix from genome‐wide complex trait analysis (GCTA). Then, the expression of 2587 cis‐heritable genes was predicted by restricted maximum likelihood (REML) in genome‐wide efficient mixed‐model association (GEMMA) in our GWAS data, which comprised 951 HBV carriers (cases) and 937 individuals who had cleared HBV (controls). Next, we investigated the associations between predicted expression levels and the risk of persistent HBV infection. Gene set enrichment analysis (GSEA) was applied to infer the function of the identified genes. To identify causal single nucleotide polymorphisms (SNPs) for HBV infection risk, we conducted expression quantitative trait loci (eQTL)‐based stepwise logistic regression analysis in the 1‐Mb regions around these genes and validated the associations by genotyping 994 healthy controls and 994 cases with persistent HBV infection. In the second stage, 1538 HBV‐related hepatocellular carcinoma (HCC) cases and 1465 controls with persistent HBV infection were collected to determine whether these variants also affect the risk of HBV‐related HCC; associations were examined under an additive model in logistic regression analysis.
Results
We identified seven genes associated with HBV infection.
In the classic human leukocyte antigen (HLA) region, we identified three novel genes, BAK1, HLA‐DOB and C4A (Z ranging from −3.95 to −3.64, P ranging from 7.84 × 10⁻⁵ to 2.00 × 10⁻⁴), as well as two genes (HLA‐DPA1 and HLA‐DPB1) reported by previous GWAS. In the non‐HLA region, we identified the immune‐related gene PARP9 (Z = 3.69, P = 2.20 × 10⁻⁴) at a newly identified locus, 3q21.1. At 22q11.21, we identified TMEM191A (Z = 3.55, P = 3.80 × 10⁻⁴) as a target gene in addition to the reported non‐cis‐heritable gene UBE2L3. After further stepwise logistic regression analysis and validation, we identified eight variants independently associated with persistent HBV infection. Among those variants, two SNPs were associated with HBV‐related HCC risk under the additive model (rs9272714 and rs9394194, OR ranging from 1.20 to 1.25, P ranging from 1.19 × 10⁻⁴ to 3.97 × 10⁻⁴).
Conclusions
By integrating transcriptome data, our study not only identified new susceptibility loci for persistent HBV infection but also determined the potential target genes at reported loci, which provides insight into the genetic aetiology of persistent HBV infection and related HCC.
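
The gene‐level results above are reported as pairs of Z statistics and P values. As a reading aid, the sketch below shows how a Z statistic is conventionally converted to a P value. It assumes a two‐sided test against a standard normal reference, which the abstract does not state explicitly, and the function name `two_sided_p` is a hypothetical helper introduced here, not part of the study's pipeline.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided P value for a Z statistic, assuming Z ~ N(0, 1) under the null.

    Assumption: a two-sided normal test; the abstract does not say how
    its P values were derived.
    """
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


# Z statistics quoted in the abstract for the two non-HLA genes.
reported_z = {"PARP9": 3.69, "TMEM191A": 3.55}

for gene, z in reported_z.items():
    print(f"{gene}: Z = {z:+.2f}, two-sided P = {two_sided_p(z):.2e}")
```

Under these assumptions, the computed values should land close to the reported P values. Small differences are to be expected because the Z statistics are rounded to two decimal places.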