The excitement over findings from Genome-Wide Association Studies (GWASs) has been tempered by the difficulty of locating the true causal disease susceptibility loci (DSLs), as opposed to markers that are merely correlated with the causal variants. In addition, many recent GWASs have studied multiple phenotypes, often highly correlated ones. This makes it difficult to determine which associations are causal and which only appear causal because they are induced by phenotypic correlations. To identify DSLs, which are required to understand the genetic etiology of the observed associations, statistical methodology has been proposed that distinguishes between a direct effect of a genetic locus on the primary phenotype and an indirect effect induced by the locus's association with an intermediate phenotype that is itself correlated with the primary phenotype. So far, however, applying this important methodology has been challenging because no user-friendly software implementation exists, and this has prevented its large-scale use in the genetics community. We have now implemented this statistical approach in a user-friendly and robust R package that has been thoroughly tested. The R package 'CGene' is available for download at http://cran.r-project.org/. The R code is also available at http://people.hsph.harvard.edu/ plipman.
European Journal of Human Genetics (2011) 19, 1292-1294; doi:10.1038/ejhg.2011; published online 6 July 2011
Keywords: causal modeling; statistical genetics; software
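The distinction between a direct effect of a locus on the primary phenotype and an indirect effect acting through an intermediate phenotype can be made concrete with a small simulation. The sketch below is written in Python with only the standard library. It assumes a hypothetical linear model: a SNP X (0/1/2 copies, allele frequency 0.3), an intermediate phenotype K, a primary phenotype Y, and a covariate L that affects both K and Y but is unrelated to X. The effect sizes, the confounder L, and the names `ols`, `simulate` and `estimate_effects` are illustrative assumptions. The code does not use or reproduce the CGene package or its functions. It contrasts the total effect (Y on X), naive adjustment (Y on X and K), and a generic two-stage regression adjustment. In that adjustment, the effect of K is first estimated with L in the model, then subtracted from Y before Y is regressed on X. This is one common way to separate the two effects and may differ in detail from the published procedure.

```python
import random


def ols(columns, y):
    """Least-squares coefficients [intercept, b_1, ..., b_p] for y on the given columns.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination with
    partial pivoting; adequate for the handful of predictors used here.
    """
    n = len(y)
    design = [[1.0] * n] + [[float(v) for v in col] for col in columns]
    p = len(design)
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(design[i][t] * design[j][t] for t in range(n)) for j in range(p)]
        + [sum(design[i][t] * y[t] for t in range(n))]
        for i in range(p)
    ]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                for c in range(col, p + 1):
                    a[r][c] -= factor * a[col][c]
    return [a[i][p] / a[i][i] for i in range(p)]


def simulate(n=20000, seed=1):
    """Hypothetical data: X -> K -> Y plus a direct X -> Y path; L confounds K and Y."""
    rng = random.Random(seed)
    x, conf, k, y = [], [], [], []
    for _ in range(n):
        g = sum(rng.random() < 0.3 for _ in range(2))                # genotype 0/1/2
        lv = rng.gauss(0.0, 1.0)                                     # L, independent of X
        kv = 0.5 * g + 1.0 * lv + rng.gauss(0.0, 1.0)                # intermediate phenotype K
        yv = 0.2 * g + 0.8 * kv + 1.0 * lv + rng.gauss(0.0, 1.0)     # primary phenotype Y
        x.append(g)
        conf.append(lv)
        k.append(kv)
        y.append(yv)
    return x, conf, k, y


def estimate_effects(n=20000, seed=1):
    x, conf, k, y = simulate(n, seed)
    total = ols([x], y)[1]             # Y ~ X: total effect (true value 0.2 + 0.5*0.8 = 0.6)
    naive = ols([x, k], y)[1]          # Y ~ X + K: conditioning on K links X with L (biased)
    beta_k = ols([x, k, conf], y)[2]   # stage 1: effect of K on Y, adjusting for X and L
    y_tilde = [yi - beta_k * ki for yi, ki in zip(y, k)]
    direct = ols([x], y_tilde)[1]      # stage 2: (Y - beta_k*K) ~ X (true direct effect 0.2)
    return {"total": total, "naive": naive, "direct": direct}


if __name__ == "__main__":
    for name, value in estimate_effects().items():
        print(f"{name:>6}: {value:+.3f}")
```

Under these assumed settings the total effect of X on Y is 0.2 + 0.5 × 0.8 = 0.6 and the direct effect is 0.2. Simply adding K as a covariate has a population limit of about -0.05, because conditioning on K induces an association between X and the confounder L. This illustrates how adjustment for an intermediate phenotype can produce a seemingly causal (or seemingly null) association.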
INTRODUCTION
The excitement over positive findings from recently published Genome-Wide Association Studies (GWASs) has been tempered by the difficulty of locating the true causal disease susceptibility loci (DSLs), as opposed to markers that are correlated with the DSL.[1] Complicating the picture, many recent GWASs have studied multiple phenotypes, which are often highly correlated.[2-4] This has made it very difficult to determine which of the associations discovered by GWASs are causal and which are only seemingly causal, induced by phenotypic correlations. The ability to distinguish causal genetic associations from seemingly causal associations induced by intermediate phenotypes can provide important clues to the underlying genetic architecture of the disease. In addition, there is much interest in identifying endophenotypes or expression profiles that may lie on the 'genetic path' between the marker locus and the phenotype of interest, in order to better understand how genetic mechanisms influence the complex trait. This interest in causal genetic pathways has led to the development of new statistical techniques that aim to distinguish between direct and indirect causal genetic mechanisms. For quantitative and binary traits, Vansteelandt et al[5] proposed a regression adjustment procedure that is applied to the quantitative phenotype of interest, adjusting for the potential presence of an association between...