We introduce a new method for linkage disequilibrium mapping: haplotype pattern mining (HPM). The method, inspired by data mining methods, is based on discovery of recurrent patterns. We define a class of useful haplotype patterns in genetic case-control data and use the algorithm for finding disease-associated haplotypes. The haplotypes are ordered by their strength of association with the phenotype, and all haplotypes exceeding a given threshold level are used for prediction of disease susceptibility-gene location. The method is model-free, in the sense that it does not require (and is unable to utilize) any assumptions about the inheritance model of the disease. The statistical model is nonparametric. The haplotypes are allowed to contain gaps, which improves the method's robustness to mutations and to missing and erroneous data. Experimental studies with simulated microsatellite and SNP data show that the method has good localization power in data sets with large degrees of phenocopies and with lots of missing and erroneous data. The power of HPM is roughly identical for marker maps at a density of 3 single-nucleotide polymorphisms/cM or 1 microsatellite/cM. The capacity to handle high proportions of phenocopies makes the method promising for complex disease mapping. An example of correct disease susceptibility-gene localization with HPM is given with real marker data from families from the United Kingdom affected by type 1 diabetes. The method is extendable to include environmental covariates or phenotype measurements or to find several genes simultaneously.
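The pattern matching and association scoring described in this abstract can be illustrated with a minimal sketch. Several details here are assumptions not stated in the abstract: a pattern is represented as a tuple with `None` for unconstrained markers (including gaps), association strength is measured with a 2x2 chi-square statistic restricted to positive association, and location is predicted by counting, for each marker, the qualifying patterns that span it. The names `matches`, `chi_square`, and `marker_scores` are hypothetical.

```python
from typing import List, Optional, Sequence

Haplotype = Sequence[int]
Pattern = Sequence[Optional[int]]  # None = marker left unconstrained (gap)


def matches(pattern: Pattern, haplotype: Haplotype) -> bool:
    """True if the haplotype agrees with the pattern at every constrained marker."""
    return all(p is None or p == h for p, h in zip(pattern, haplotype))


def chi_square(pattern: Pattern, cases: List[Haplotype],
               controls: List[Haplotype]) -> float:
    """2x2 chi-square statistic for pattern presence vs. case/control status.

    Assumption: only positive association (pattern enriched in cases) counts;
    other patterns score 0.
    """
    a = sum(matches(pattern, h) for h in cases)     # cases carrying the pattern
    b = sum(matches(pattern, h) for h in controls)  # controls carrying the pattern
    c = len(cases) - a
    d = len(controls) - b
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0 or a * d <= b * c:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def marker_scores(patterns: List[Pattern], cases: List[Haplotype],
                  controls: List[Haplotype], threshold: float) -> List[int]:
    """For each marker, count the patterns at or above the threshold that span it."""
    scores = [0] * len(cases[0])
    for pattern in patterns:
        if chi_square(pattern, cases, controls) < threshold:
            continue
        constrained = [i for i, allele in enumerate(pattern) if allele is not None]
        # A pattern spans every marker from its first to its last constrained
        # position, gaps included.
        for i in range(constrained[0], constrained[-1] + 1):
            scores[i] += 1
    return scores


# Toy data (made up for illustration): four markers, alleles coded 1 or 2.
cases = [(1, 2, 1, 1), (1, 2, 2, 1), (1, 2, 1, 2), (2, 1, 1, 1)]
controls = [(2, 1, 2, 2), (1, 1, 2, 1), (2, 2, 1, 2), (2, 1, 1, 1)]
patterns = [(1, 2, None, None), (1, None, 1, None), (1, None, None, 1)]
print(marker_scores(patterns, cases, controls, threshold=2.5))
```

Under these assumptions, the marker with the highest count would be the point estimate for the gene location. The gapped pattern `(1, None, 1, None)` still matches a haplotype whose second marker is mutated or missing, which is the robustness property the abstract attributes to gaps.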
Abstract. Public biological databases contain vast amounts of rich data that can also be used to create and evaluate new biological hypotheses. We propose a method for link discovery in biological databases, i.e., for prediction and evaluation of implicit or previously unknown connections between biological entities and concepts. In our framework, information extracted from available databases is represented as a graph, where vertices correspond to entities and concepts, and edges represent known, annotated relationships between vertices. A link, an (implicit and possibly unknown) relation between two entities, is manifested as a path or a subgraph connecting the corresponding vertices. We propose measures for link goodness that are based on three factors: edge reliability, relevance, and rarity. We handle these factors with a proper probabilistic interpretation. We give practical methods for finding and evaluating links in large graphs and report experimental results with Alzheimer genes and protein interactions.
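One way to picture a probabilistic link measure of this kind is a best-path search. The sketch below rests on assumptions the abstract does not spell out: each edge carries a single probability in (0, 1] that already combines reliability, relevance, and rarity, the goodness of a path is the product of its edge probabilities, and the best link is the path that maximizes this product. The function name `best_path` and the toy graph are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Tuple

# Adjacency list: vertex -> list of (neighbor, edge probability in (0, 1]).
Graph = Dict[str, List[Tuple[str, float]]]


def best_path(graph: Graph, source: str, target: str) -> Tuple[float, List[str]]:
    """Return (probability, path) for the most probable path from source to target.

    Maximizing a product of probabilities is the same as minimizing the sum of
    -log(p), so Dijkstra's algorithm applies (weights are non-negative for p <= 1).
    """
    dist = {source: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, p in graph.get(u, []):
            if p <= 0.0:
                continue  # an edge with zero probability cannot be part of a link
            nd = d - math.log(p)
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return 0.0, []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return math.exp(-dist[target]), path


# Toy graph with made-up vertices and probabilities.
graph: Graph = {
    "A": [("B", 0.9), ("C", 0.5)],
    "B": [("D", 0.8)],
    "C": [("D", 0.9)],
}
prob, path = best_path(graph, "A", "D")
print(path, round(prob, 2))
```

A single best path is only one possible summary. The abstract also mentions subgraphs as links, which would need a measure over several paths at once and is not covered by this sketch.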
We describe TreeDT, a novel association-based gene mapping method. Given a set of disease-associated haplotypes and a set of control haplotypes, TreeDT predicts likely locations of a disease susceptibility gene. TreeDT extracts, essentially in the form of haplotype trees, information about historical recombinations in the population: A haplotype tree constructed at a given chromosomal location is an estimate of the genealogy of the haplotypes. TreeDT constructs these trees for all locations on the given haplotypes and performs a novel disequilibrium test on each tree: Is there a small set of subtrees with relatively high proportions of disease-associated chromosomes, suggesting shared genetic history for those and a likely disease gene location? We give a detailed description of TreeDT and the tree disequilibrium tests, we analyze the algorithm formally, and we evaluate its performance experimentally on both simulated and real data sets. Experimental results demonstrate that TreeDT has high accuracy on difficult mapping tasks, and comparisons to other methods (EATDT, HPM, TDT) show that TreeDT is very competitive.
We have analyzed a dense set of single-nucleotide polymorphisms (SNPs) and microsatellites spanning the T-helper cytokine gene cluster (interleukins 3, 4, 5, 9, and 13, interferon regulatory factor-1, colony-stimulating factor-2, and T-cell transcription factor-7) on 5q31 and the gene encoding the interleukin-4 receptor (IL4R) on 16p12 among Finnish families with asthma. As shown by haplotype pattern mining analysis, the number of disease-associated haplotype patterns differed from that expected for the 129Q allele polymorphism in IL13 for high serum total immunoglobulin (Ig) E levels, but not for asthma. The same SNP also yielded the best haplotype associations. For IL4R, asthma-associated haplotype patterns, most spanning the S411L polymorphism, showed suggestive association. However, these haplotypes consisted of the major alleles for the intracellular part of the receptor and were very common among both patients and controls. The minor alleles 503P and 576R have been reported to be associated with decreased serum IgE levels and changes in the biological activity of the protein, especially when inherited together. In the Finnish population, these two polymorphisms segregated in strong linkage disequilibrium. Our data support previous findings regarding IL4R, indicating that 503P and 576R may act as minor protecting alleles for IgE-mediated disorders.
Key Words: asthma, atopy, haplotype pattern mining, polymorphism, association
families). The study showed suggestive linkage for asthma (LOD = 2.6) but no evidence for atopy [8]. Although it is not known whether human genes in the 5q31 cytokine cluster are genetic regulators for atopic disorders, experimental models of asthma have shown the importance of both IL4 and IL13 signaling. Administration of either exogenous IL4 or IL13 induced the asthma phenotype in mice [9]. However, neither of these cytokines was able to induce the asthma phenotype in mice deficient in the IL4 receptor (IL4R) [9].
The IL4 receptor is a heterodimer consisting of a common γ-chain that is shared by several other interleukin receptors (IL2, IL7, IL9, and IL13) and a ligand-specific α-chain encoded by IL4R. An IL4Rα-IL13Rα heterodimer is also able to transduce IL13 signaling [10] and is an important signaling pathway in the asthma model. All this indicates the importance of IL4Rα in IL4/IL13-mediated signaling.
Summary. We describe a method for querying vertex- and edge-labeled graphs using context-free grammars to specify the class of interesting paths. We introduce a novel problem: finding the connection subgraph induced by the set of matching paths between two given vertices or two sets of vertices. Such a subgraph provides a concise summary of the relationship between the vertices. We also present novel algorithms for parsing subgraphs directly without enumerating all the individual paths. We experimentally evaluate the presented parsing algorithms on a set of real graphs derived from publicly available biomedical databases and on randomly generated graphs. The results indicate that parsing the connection subgraph directly is much more effective than parsing individual paths separately. Furthermore, we show that a bidirectional parsing algorithm, in most cases, allows searching for paths twice as long as those reachable with a unidirectional search strategy.