The rough draft of the human genome map has been used to identify most of the functional genes in the human genome, as well as nucleotide variations in these genes, known as "single-nucleotide polymorphisms" (SNPs). Using advanced biotechnologies, researchers are beginning to genotype thousands of SNPs from biological samples. One of the many possible applications is the study of SNP associations with complex human diseases, such as cancers or coronary heart disease, by means of a case-control study design. When environmental risk factors and other lifestyle factors are also gathered, such a study can be used to investigate interactions between genes and environmental factors in their associations with disease phenotype. Earlier, we developed a method to statistically construct individuals' haplotypes and to estimate the distribution of haplotypes of multiple SNPs in a defined population, by use of estimating-equation techniques. Extending this idea, we describe here an analytic method for assessing the association between the constructed haplotypes, together with environmental factors, and the disease phenotype. This method is robust to the model assumptions and is scalable to a large number of SNPs. Asymptotic properties of the method's estimates are proved theoretically and are tested for finite sample sizes by use of simulations. To demonstrate the use of the method, we applied it to a case-control data set to assess the possible association between apolipoprotein CIII (six coding SNPs) and restenosis. Our analysis revealed two haplotypes that may reduce the risk of restenosis.
Estimating haplotype frequencies is becoming increasingly important in the mapping of complex disease genes, as millions of single-nucleotide polymorphisms (SNPs) are being identified and genotyped. When genotypes at multiple SNP loci are gathered from unrelated individuals, haplotype frequencies can be accurately estimated using expectation-maximization (EM) algorithms (Excoffier and Slatkin, 1995; Hawley and Kidd, 1995; Long et al., 1995), with standard errors estimated by bootstrapping. However, because the number of possible haplotypes increases exponentially with the number of SNPs, handling data with a large number of SNPs poses a computational challenge for the EM methods and for other haplotype inference methods. To solve this problem, Niu and colleagues, in their Bayesian haplotype inference paper (Niu et al., 2002), introduced a computational algorithm called progressive ligation (PL). Their Bayesian method, however, limits the number of subjects (no more than 100 subjects in the current implementation of the method). In this paper, we propose a new method in which we use the same likelihood formulation as in Excoffier and Slatkin's EM algorithm and apply the estimating equation idea and the PL computational algorithm with some modifications. Our proposed method can handle data sets with large numbers of SNPs as well as large numbers of subjects. At the same time, our method estimates standard errors efficiently, using the sandwich estimate from the estimating equation rather than the bootstrap method. Additionally, our method accommodates missing data and produces valid estimates of parameters and their standard errors under the assumption that the missing genotypes are missing at random in the sense defined by Rubin (1976).
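To make the EM baseline mentioned above concrete, the following is a minimal sketch of a textbook EM iteration for haplotype frequencies from unphased genotypes. It is not the estimating-equation or progressive-ligation method proposed in the paper. The function names (`compatible_pairs`, `em_haplotype_frequencies`), the 0/1/2 genotype coding, the uniform starting values, and the stopping rule are all assumptions made for illustration, and the likelihood assumes Hardy-Weinberg proportions.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """List the unordered haplotype pairs consistent with an unphased genotype.

    genotype: tuple with one entry per SNP, coded 0/1/2 as the number of
    copies of one designated allele (an assumed coding, for illustration).
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [g // 2 for g in genotype]  # homozygous sites are fixed: 0 -> 0, 2 -> 1
    pairs = set()
    # Each heterozygous site can be phased two ways; sorting each pair
    # collapses the mirror-image phasings into one unordered pair.
    for phase in product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for site, allele in zip(het, phase):
            h1[site], h2[site] = allele, 1 - allele
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def em_haplotype_frequencies(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies by EM under Hardy-Weinberg proportions."""
    pair_lists = [compatible_pairs(g) for g in genotypes]
    haplotypes = sorted({h for pairs in pair_lists for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}  # uniform start
    for _ in range(n_iter):
        counts = defaultdict(float)
        # E-step: posterior probability of each compatible pair per subject.
        for pairs in pair_lists:
            weights = [freq[a] * freq[b] * (1 if a == b else 2) for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: expected haplotype counts divided by 2n chromosomes.
        new_freq = {h: counts[h] / (2 * len(genotypes)) for h in haplotypes}
        change = max(abs(new_freq[h] - freq[h]) for h in haplotypes)
        freq = new_freq
        if change < tol:
            break
    return freq
```

A subject heterozygous at k SNPs contributes 2^(k-1) candidate pairs, so the enumeration in `compatible_pairs` grows exponentially with the number of SNPs. This is the computational burden that the text attributes to EM methods when many SNPs are involved.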