Common genetic polymorphisms may explain a portion of the heritable risk for common diseases. Within candidate genes, the number of common polymorphisms is finite, but direct assay of all existing common polymorphisms is inefficient, because genotypes at many of these sites are strongly correlated. Thus, it is not necessary to assay all common variants if the patterns of allelic association between common variants can be described. We have developed an algorithm to select the maximally informative set of common single-nucleotide polymorphisms (tagSNPs) to assay in candidate-gene association studies, such that all known common polymorphisms either are directly assayed or exceed a threshold level of association with a tagSNP. The algorithm is based on the r² linkage disequilibrium (LD) statistic, because r² is directly related to statistical power to detect disease associations with unassayed sites. We show that, at a relatively stringent r² threshold (r² > 0.8), the LD-selected tagSNPs resolve >80% of all haplotypes across a set of 100 candidate genes, regardless of recombination, and tag specific haplotypes and clades of related haplotypes in nonrecombinant regions. Thus, if the patterns of common variation are described for a candidate gene, analysis of the tagSNP set can comprehensively interrogate for main effects from common functional variation. We demonstrate that, although common variation tends to be shared between populations, tagSNPs should be selected separately for populations with different ancestries.
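The selection criterion described above (every common site is either assayed directly or exceeds an r² threshold with some tagSNP) can be illustrated with a small greedy sketch. This is not the authors' published implementation: the function names, the phased 0/1 haplotype encoding, the 10% minor-allele-frequency cutoff, and the toy data are all assumptions made for illustration. Only the r² > 0.8 threshold comes from the abstract.

```python
from itertools import combinations


def r_squared(a, b):
    """r^2 between two biallelic sites, each a list of 0/1 alleles per chromosome."""
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    pab = sum(1 for x, y in zip(a, b) if x == 1 and y == 1) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return 0.0  # a monomorphic site carries no LD information
    d = pab - pa * pb  # classical disequilibrium coefficient D
    return d * d / denom


def select_tag_snps(haplotypes, min_maf=0.10, r2_threshold=0.8):
    """Greedy binning: repeatedly pick the site that covers the most untagged sites.

    haplotypes: dict mapping site name -> list of 0/1 alleles (phased data assumed).
    min_maf is a hypothetical cutoff for "common"; the abstract does not state one.
    Returns a list of (tag_snp, sorted_sites_in_bin) tuples.
    """
    # Keep only common sites.
    common = {}
    for name, alleles in haplotypes.items():
        freq = sum(alleles) / len(alleles)
        if min(freq, 1 - freq) >= min_maf:
            common[name] = alleles

    # For each site, record the sites it would tag (itself included).
    tagged_by = {s: {s} for s in common}
    for s, t in combinations(common, 2):
        if r_squared(common[s], common[t]) > r2_threshold:
            tagged_by[s].add(t)
            tagged_by[t].add(s)

    remaining = set(common)
    bins = []
    while remaining:
        # Ties are broken alphabetically so the result is deterministic.
        best = max(sorted(remaining), key=lambda s: len(tagged_by[s] & remaining))
        covered = tagged_by[best] & remaining
        bins.append((best, sorted(covered)))
        remaining -= covered
    return bins


# Toy data: 12 chromosomes, 5 sites (made up for illustration).
toy = {
    "snpA": [1] * 6 + [0] * 6,
    "snpB": [1] * 6 + [0] * 6,   # perfectly correlated with snpA
    "snpC": [1] * 5 + [0] * 7,   # r^2 with snpA is 5/7, below the threshold
    "snpD": [1, 0] * 6,          # uncorrelated with snpA
    "snpE": [1] + [0] * 11,      # rare (MAF < 0.10), so excluded
}
bins = select_tag_snps(toy)
print(bins)
```

In this toy example snpA and snpB fall into one bin, so assaying either one captures both, while snpC and snpD each need their own tagSNP. A greedy pass of this kind does not guarantee the smallest possible tag set; it is only one simple way to satisfy the coverage condition.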
Importance: Germline mutations in BRCA1 and BRCA2 are relatively common in women with ovarian, fallopian tube, and peritoneal carcinoma (OC) and cause a greatly increased lifetime risk of these cancers, but the frequency and relevance of inherited mutations in other genes are less well characterized.
Objective: To determine the frequency and importance of germline mutations in cancer-associated genes in OC.
Design: Subjects were ascertained from two phase III clinical trials in newly diagnosed advanced-stage OC (GOG 218 and GOG 262) and a university-based gynecologic oncology tissue bank. Germline DNA was sequenced from women with OC using the targeted capture and multiplex sequencing assay BROCA.
Setting: Referral centers participating in NRG Oncology studies and a university-based gynecologic oncology practice (UW).
Participants: The study population was 1915 women with OC with available germline DNA, unselected for age or family history, enrolled at the time of OC diagnosis (GOG 218, N=788; GOG 262, N=557; UW, N=570).
Main Outcomes and Measures: Mutation frequencies in OC were compared to the NHLBI GO Exome Sequencing Project (ESP) and the Exome Aggregation Consortium (ExAC). Clinical characteristics and survival were assessed by mutation status.
Results: Of 1915 subjects, 280 (15%) had mutations in BRCA1 (182) or BRCA2 (98), and 8 (0.4%) had mutations in DNA mismatch repair (MMR) genes. Mutations in BRIP1 (26), RAD51C (11), RAD51D (11), PALB2 (12), and BARD1 (4) were significantly more common in OC patients than in the ESP or ExAC, and in total were present in 3.3% of patients. Race, histologic subtype, and disease site were not predictive of mutation frequency. Mutation status affected survival, in particular for BRCA2 mutation carriers, with HR 0.60 (95% CI 0.45–0.79, p<0.001) for progression-free survival and HR 0.39 (95% CI 0.25–0.60, p<0.001) for overall survival in the GOG patients.
Conclusions and Relevance: In total, 347/1915 (18%) OC patients carried pathogenic germline mutations in genes associated with OC risk. PALB2 and BARD1 are suspected OC genes and, together with established OC genes (BRCA1, BRCA2, BRIP1, RAD51C, RAD51D, MSH2, MLH1, PMS2, and MSH6), bring the total number of genes suspected to cause hereditary OC to 11.
Early protein synthesis is thought to have involved a reduced amino acid alphabet. What is the minimum number of amino acids that would have been needed to encode complex protein folds similar to those found in nature today? Here we show that a small beta-sheet protein, the SH3 domain, can be largely encoded by a five-letter amino acid alphabet but not by a three-letter alphabet. Furthermore, despite the dramatic changes in sequence, the folding rates of the reduced-alphabet proteins are very close to that of the naturally occurring SH3 domain. This finding suggests that, despite the vast size of the search space, the rapid folding of biological sequences to their native states is not the result of extensive evolutionary optimization. Instead, the results support the idea that the interactions that stabilize the native state induce a funnel shape in the free energy landscape sufficient to guide the folding polypeptide chain to the proper structure.