The precise molecular mechanisms that lead to coronary artery disease (CAD) and myocardial infarction (MI) are not understood, despite a wealth of knowledge on predisposing risk factors and pathomechanisms. CAD and MI are complex genetic diseases: neither the environment alone nor a single gene causes disease; rather, a mix of environmental and genetic factors leads to atherosclerosis of the coronary arteries and subsequent manifestation of clinical disease. The biological complexity of atherosclerotic disease results from unknown or unpredictable interactions of many genetic and environmental factors, which themselves have been only partially identified. According to current knowledge, genetic variations in causative or susceptibility genes form the basis of molecular mechanisms that, together with environmental impact, lead to CAD/MI and determine its clinical course. Linkage analysis, which follows 'disease' alleles in families, and genetic association studies in populations of unrelated individuals are tools used in the search for chromosomal loci and candidate genes involved in these complex diseases. Progress in sequencing and mapping the human genome, and efforts to identify all of the one million single nucleotide polymorphisms (SNPs) expected to be present in man, will allow new approaches such as genome-wide association studies. So far, however, the contribution of current knowledge on genetic variation in man to the dissection of CAD/MI as complex traits has been sobering. Inflated expectations of the power of molecular genetic studies compared with traditional pathophysiological experimental approaches, a lack of precise clinical phenotyping, a lack of functional characterisation of gene variants, and the vast number of as yet undetected genes may provide some explanation.
Except for certain polymorphisms in lipid genes (e.g., apolipoprotein E [apo E]) or rare genetic variations (e.g., in the LDL receptor), which have a causal effect on both the intermediate phenotype (LDL-cholesterol level in plasma) and the clinical phenotype (CAD/MI), the role of most gene polymorphisms is controversial or unknown. Despite the enormous progress in sequencing the human genome and in molecular genetic and bioinformatic techniques during the past decade, progress in mapping and identifying genes responsible for complex traits such as CAD/MI has been modest, and the task remains a formidable challenge to medical research in the 21st century.
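As an illustration of the genetic association approach mentioned above, the following minimal sketch compares allele counts between unrelated cases and controls with a 2x2 chi-square test. The function name `allelic_association` and the counts are hypothetical and chosen only for illustration; they are not data from the abstract, and real studies would also need to handle covariates, population structure, and multiple testing.

```python
import math

def allelic_association(case_risk, case_other, control_risk, control_other):
    """Allelic 2x2 chi-square test (1 degree of freedom, no continuity correction).

    Arguments are allele counts, not genotype counts. Returns the chi-square
    statistic, the p-value, and the odds ratio for the risk allele.
    """
    a, b, c, d = case_risk, case_other, control_risk, control_other
    n = a + b + c + d
    # Standard shortcut formula for a 2x2 contingency table.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, odds_ratio

# Hypothetical allele counts: 60/40 in cases versus 40/60 in controls.
chi2, p_value, odds_ratio = allelic_association(60, 40, 40, 60)
print(f"chi2={chi2:.2f}, p={p_value:.4f}, OR={odds_ratio:.2f}")
# prints "chi2=8.00, p=0.0047, OR=2.25"
```

A single-variant test like this is the building block that a genome-wide association study repeats across many SNPs, which is why a stringent significance threshold is needed in that setting.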
We tackle modelling and inference for variable selection in regression problems with many predictors and many responses. We focus on detecting hotspots, that is, predictors associated with several responses. Such a task is critical in statistical genetics, as hotspot genetic variants shape the architecture of the genome by controlling the expression of many genes and may initiate decisive functional mechanisms underlying disease endpoints. Existing hierarchical regression approaches designed to model hotspots suffer from two limitations: their discrimination of hotspots is sensitive to the choice of top-level scale parameters for the propensity of predictors to be hotspots, and they do not scale to large predictor and response vectors, for example, of dimensions 10^3-10^5 in genetic applications. We address these shortcomings by introducing a flexible hierarchical regression framework that is tailored to the detection of hotspots and scalable to the above dimensions. Our proposal implements a fully Bayesian model for hotspots based on the horseshoe shrinkage prior. Its global-local formulation shrinks noise globally and, hence, accommodates the highly sparse nature of genetic analyses while being robust to individual signals, thus leaving the effects of hotspots unshrunk. Inference is carried out using a fast variational algorithm coupled with a novel simulated annealing procedure that allows efficient exploration of multimodal distributions.
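The global-local behaviour described above can be illustrated with the textbook horseshoe prior in a simple normal-means setting. This is a minimal sketch under stated assumptions, not the authors' hierarchical hotspot model or their variational algorithm: local scales are drawn from a half-Cauchy distribution, the global scale `tau` is fixed, the noise variance is taken as 1, and the function name `horseshoe_shrinkage` is hypothetical. The shrinkage weight kappa = 1 / (1 + tau^2 * lambda^2) is close to 1 when an effect is shrunk to zero and close to 0 when it is left essentially unshrunk.

```python
import numpy as np

def horseshoe_shrinkage(n_draws, tau=1.0, seed=0):
    """Draw shrinkage weights kappa implied by the horseshoe prior.

    Assumes unit noise variance and a fixed global scale tau (illustrative
    choices). kappa near 1 means strong shrinkage; near 0 means almost none.
    """
    rng = np.random.default_rng(seed)
    # Local scales: half-Cauchy(0, 1), i.e. absolute value of a standard Cauchy.
    local_scale = np.abs(rng.standard_cauchy(n_draws))
    return 1.0 / (1.0 + (tau * local_scale) ** 2)

kappa = horseshoe_shrinkage(100_000, tau=1.0)

# Prior mass piles up at both ends of [0, 1] (the "horseshoe" shape):
# most effects are either shrunk heavily or left nearly untouched.
frac_unshrunk = np.mean(kappa < 0.1)
frac_shrunk = np.mean(kappa > 0.9)
frac_middle = np.mean((kappa > 0.45) & (kappa < 0.55))
print(f"kappa<0.1: {frac_unshrunk:.2f}, kappa>0.9: {frac_shrunk:.2f}, "
      f"0.45<kappa<0.55: {frac_middle:.2f}")

# A smaller global scale moves mass towards kappa = 1 (more global shrinkage),
# while the heavy Cauchy tail still lets a few large signals escape.
kappa_sparse = horseshoe_shrinkage(100_000, tau=0.01)
print(f"mean kappa, tau=1: {kappa.mean():.2f}; tau=0.01: {kappa_sparse.mean():.2f}")
```

With `tau = 1` the weights follow a Beta(1/2, 1/2) distribution, so roughly a fifth of the prior mass lies in each of the two extreme bins and only a few percent in the middle bin. Lowering `tau` plays the role of the global component that suppresses noise across all predictors.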