The reanalysis of existing GWAS data represents a powerful and cost-effective opportunity to gain insights into the genetics of complex diseases. By reanalyzing publicly available type 2 diabetes (T2D) genome-wide association studies (GWAS) data for 70,127 subjects, we identify seven novel associated regions, five driven by common variants (LYPLAL1, NEUROG3, CAMKK2, ABO, and GIP genes), one by a low-frequency (EHMT2), and one driven by a rare variant in chromosome Xq23, rs146662075, associated with a twofold increased risk for T2D in males. rs146662075 is located within an active enhancer associated with the expression of Angiotensin II Receptor type 2 gene (AGTR2), a modulator of insulin sensitivity, and exhibits allelic specific activity in muscle cells. Beyond providing insights into the genetics and pathophysiology of T2D, these results also underscore the value of reanalyzing publicly available data using novel genetic resources and analytical approaches.
The development of high-throughput sequencing technologies has advanced our understanding of cancer. However, characterizing somatic structural variants in tumor genomes is still challenging because current strategies depend on the initial alignment of reads to a reference genome. Here, we describe SMUFIN (somatic mutation finder), a single program that directly compares sequence reads from normal and tumor genomes to accurately identify and characterize a range of somatic sequence variation, from single-nucleotide variants (SNV) to large structural variants at base pair resolution. Performance tests on modeled tumor genomes showed average sensitivity of 92% and 74% for SNVs and structural variants, with specificities of 95% and 91%, respectively. Analyses of aggressive forms of solid and hematological tumors revealed that SMUFIN identifies breakpoints associated with chromothripsis and chromoplexy with high specificity. SMUFIN provides an integrated solution for the accurate, fast and comprehensive characterization of somatic sequence variation in cancer.The recent development of high-throughput sequencing technologies has made possible the sequencing of genomes at an unprecedented speed, allowing the identification of the genetic basis of numerous diseases. These advances have been particularly important in the study of cancer, providing information on thousands of tumor genomes and a large catalog of genomic alteration associated with oncogenesis 1 .The characterization of somatic variation in tumor samples is, therefore, rapidly becoming a standard practice in biomedicine 2 . In a large fraction of biomedical studies that rely on high-throughput sequencing, the production of genome sequence data exceeds available computer resources and the capabilities of analytic protocols. This is particularly pertinent in the field of cancer genomics, where the increasing sequencing of tumor genomes calls for faster and more accurate analyses.The identification of somatic variants associated with cancer typically requires sequencing tumor and normal genome samples from the same patient, followed by multiple sequence comparisons. Normal and pathological reads are aligned to a reference genome, and the alignment is used to identify sequence changes to isolate the somatic fraction of variants (i.e., those detected only in the tumor). In principle, this simple strategy can be used to detect single-nucleotide variants (SNVs) and structural variants. Existing methods for the detection of somatic SNVs show high sensitivity and specificity 3,4 , but identifying structural variants is still challenging and remains largely unsolved. The need for a reference sequence is particularly limiting. Reads carrying variations, such as those covering somatic changes in the tumor, are more difficult to align to the reference genome 5 , and corresponding variants might become undetectable. Moreover, reference-based methods also must discriminate germline changes from somatic variants. In addition to these limitations at detection level, this a...
Motivation: The prediction and annotation of the genomic regions involved in gene expression has been largely explored. Most of the energy has been devoted to the development of approaches that detect transcription start sites, leaving the identification of regulatory regions and their functional transcription factor binding sites (TFBSs) largely unexplored and with important quantitative and qualitative methodological gaps.Results: We have developed ReLA (for REgulatory region Local Alignment tool), a unique tool optimized with the Smith–Waterman algorithm that allows local searches of conserved TFBS clusters and the detection of regulatory regions proximal to genes and enhancer regions. ReLA's performance shows specificities of 81 and 50% when tested on experimentally validated proximal regulatory regions and enhancers, respectively.Availability: The source code of ReLA's is freely available and can be remotely used through our web server under http://www.bsc.es/cg/rela.Contact: david.torrents@bsc.esSupplementary information: Supplementary data are available at Bioinformatics online.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.