In the analysis of data it is often assumed that observations y₁, y₂, ..., yₙ are independently normally distributed with constant variance and with expectations specified by a model linear in a set of parameters θ. In this paper we make the less restrictive assumption that such a normal, homoscedastic, linear model is appropriate after some suitable transformation has been applied to the y's. Inferences about the transformation and about the parameters of the linear model are made by computing the likelihood function and the relevant posterior distribution. The contributions of normality, homoscedasticity and additivity to the transformation are separated. The relation of the present methods to earlier procedures for finding transformations is discussed. The methods are illustrated with examples.
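The abstract does not name the family of transformations. The one usually associated with this likelihood approach is the power family y^(λ) = (y^λ − 1)/λ, with log y at λ = 0, which is assumed here. The sketch below picks λ by maximizing the profile log-likelihood −(n/2) log σ̂²(λ) + (λ − 1) Σ log yᵢ over a grid. To keep it short, it also assumes an intercept-only linear model instead of a general design matrix. The function names are hypothetical.

```python
import math

def box_cox(y: float, lam: float) -> float:
    """Power transform (y**lam - 1) / lam, taking log(y) in the limit lam == 0."""
    if y <= 0:
        raise ValueError("the power family assumed here requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam

def profile_log_likelihood(ys: list[float], lam: float) -> float:
    """Maximized normal log-likelihood (up to an additive constant) for an
    intercept-only model fitted to the transformed data."""
    n = len(ys)
    z = [box_cox(y, lam) for y in ys]
    mean = sum(z) / n
    # Maximum-likelihood variance estimate (divide by n, not n - 1).
    sigma2 = sum((zi - mean) ** 2 for zi in z) / n
    # Jacobian of the transformation, so likelihoods are comparable across lam.
    jacobian = (lam - 1.0) * sum(math.log(y) for y in ys)
    return -0.5 * n * math.log(sigma2) + jacobian

def best_lambda(ys: list[float], grid: list[float]) -> float:
    """Grid value of lam with the largest profile log-likelihood."""
    return max(grid, key=lambda lam: profile_log_likelihood(ys, lam))
```

For data that grow geometrically, such as `[1, 2, 4, 8, 16, 32]`, searching the grid `[-1, -0.5, 0, 0.5, 1]` selects λ = 0, the log transform. The Jacobian term is what makes this comparison fair. Without it, rescaling the data by different powers would change σ̂² in ways unrelated to model fit.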
The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.
Nonalcoholic fatty liver disease (NAFLD) is a burgeoning health problem of unknown etiology that varies in prevalence among ethnic groups. To identify genetic variants contributing to differences in hepatic fat content, we performed a genome-wide association scan of nonsynonymous sequence variations (n = 9,229) in a multiethnic population. An allele in PNPLA3 (rs738409; I148M) was strongly associated with increased hepatic fat levels (P = 5.9 × 10⁻¹⁰) and with hepatic inflammation (P = 3.7 × 10⁻⁴). The allele was most common in Hispanics, the group most susceptible to NAFLD; hepatic fat content was >2-fold higher in PNPLA3-148M homozygotes than in noncarriers. Resequencing revealed another allele associated with lower hepatic fat content in African-Americans, the group at lowest risk of NAFLD. Thus, variation in PNPLA3 contributes to ethnic and inter-individual differences in hepatic fat content and susceptibility to NAFLD.
We describe the Phase II HapMap, which characterizes over 3.1 million human single nucleotide polymorphisms (SNPs) genotyped in 270 individuals from four geographically diverse populations and includes 25-35% of common SNP variation in the populations surveyed. The map is estimated to capture untyped common variation with an average maximum r² of between 0.9 and 0.96 depending on population. We demonstrate that the current generation of commercial genome-wide genotyping products captures common Phase II SNPs with an average maximum r² of up to 0.8 in African and up to 0.95 in non-African populations, and that potential gains in power in association studies can be obtained through imputation. These data also reveal novel aspects of the structure of linkage disequilibrium. We show that 10-30% of pairs of individuals within a population share at least one region of extended genetic identity arising from recent ancestry and that up to 1% of all common variants are untaggable, primarily because they lie within recombination hotspots. We show that recombination rates vary systematically around genes and between genes of different function. Finally, we demonstrate increased differentiation at non-synonymous, compared to synonymous, SNPs, resulting from systematic differences in the strength or efficacy of natural selection between populations.
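The "average maximum r²" coverage measure can be illustrated as follows. For each untyped SNP, take its highest pairwise r² with any typed (tag) SNP, then average over the untyped SNPs. The sketch below makes several assumptions that the abstract does not state: phased haplotypes with each SNP coded as a 0/1 allele vector, and r² computed as D²/(p_A(1 − p_A)p_B(1 − p_B)), the squared correlation between allele indicators. It is not the HapMap pipeline, and the function names are hypothetical.

```python
from typing import Sequence

def r_squared(a: Sequence[int], b: Sequence[int]) -> float:
    """r^2 between two SNPs given 0/1 allele codes on the same haplotypes."""
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    pab = sum(x * y for x, y in zip(a, b)) / n  # frequency of the 1-1 haplotype
    d = pab - pa * pb                           # linkage disequilibrium D
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return 0.0  # a monomorphic SNP cannot tag or be tagged
    return d * d / denom

def average_max_r2(untyped: Sequence[Sequence[int]],
                   tags: Sequence[Sequence[int]]) -> float:
    """Mean over untyped SNPs of the best r^2 achieved by any tag SNP."""
    best = [max(r_squared(u, t) for t in tags) for u in untyped]
    return sum(best) / len(best)
```

In this framing, an untaggable variant is one whose best r² stays low for every available tag. Under the assumptions above, this can happen when recombination has broken down its correlation with all nearby SNPs.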