Summary: Polygenic risk scores (PRS) aim to quantify the contribution of multiple genetic loci to an individual’s likelihood of a complex trait or disease. However, existing PRS estimate genetic liability using common genetic variants, excluding the impact of rare variants. We identified rare, large-effect variants in individuals with outlier gene expression from the GTEx project and then assessed their impact on PRS predictions in the UK Biobank (UKB). We observed large deviations from the PRS-predicted phenotypes for carriers of multiple outlier rare variants; for example, individuals classified as “low-risk” but in the top 1% of outlier rare variant burden had a 6-fold higher rate of severe obesity. We replicated these findings using data from the NHLBI Trans-Omics for Precision Medicine (TOPMed) biobank and the Million Veteran Program, and demonstrated that PRS across multiple traits will significantly benefit from the inclusion of rare genetic variants.
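As a rough illustration of the quantities this abstract refers to, the sketch below computes a standard additive polygenic score (a weighted sum of allele dosages) and a simple count of rare outlier variants carried by one individual. It is not the authors' pipeline: the function names, the example weights, and the use of a plain carrier count as "burden" are all assumptions made for illustration.

```python
from typing import Sequence


def polygenic_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Additive score: sum over variants of effect weight times allele dosage (0, 1, or 2)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(w * d for w, d in zip(weights, dosages))


def rare_outlier_burden(carrier_flags: Sequence[bool]) -> int:
    """Hypothetical burden measure: number of rare outlier variants an individual carries."""
    return sum(1 for flag in carrier_flags if flag)


# Made-up example values for a single individual (assumptions, not data from the study).
example_score = polygenic_score(dosages=[2, 1, 0], weights=[0.5, 0.25, 0.125])
example_burden = rare_outlier_burden([True, False, True, True])
```

A score built only from common variants, as in `polygenic_score`, carries no information about the burden term, which is the gap the abstract describes.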
Mobile genetic elements contribute to bacterial adaptation and evolution; however, detecting these elements in a high-throughput and unbiased manner remains challenging. Here, we demonstrate a de novo approach to identify mobile elements from short-read sequencing data. The method identifies the precise site of mobile element insertion and infers the identity of the inserted sequence. This is an improvement over previous methods that either rely on curated databases of known mobile elements or rely on 'split-read' alignments that assume the inserted element exists within the reference genome. We apply our approach to 12,419 sequenced isolates of nine prevalent bacterial pathogens, and we identify hundreds of known and novel mobile genetic elements, including many candidate insertion sequences. We find that the mobile element repertoire and insertion rate vary considerably across species, and that many of the identified mobile elements are biased toward certain target sequences, several of them being highly specific. Mobile element insertion hotspots often cluster near genes involved in mechanisms of antibiotic resistance, and such insertions are associated with antibiotic resistance in laboratory experiments and clinical isolates. Finally, we demonstrate that mutagenesis caused by these mobile elements contributes to antibiotic resistance in a genome-wide association study of mobile element insertions in pathogenic Escherichia coli. In summary, by applying a de novo approach to precisely identify ... elements, such as insertion sequences, MGEs can contain additional "passenger" genes (Figure 1e). These passenger genes can code for a variety of proteins, including virulence factors, antibiotic resistance genes, detoxifying agents, and enzymes for secondary metabolism (Rankin, Rocha, and Brown 2011).
... on shared homology with known MGEs or genes, or by requiring well-annotated reference genomes using sequenced isolates that closely resemble the reference. Here, we sought to comprehensively identify complete MGEs, the genes they contained, and their site of insertion with respect to a reference genome from short-read sequencing data. Our approach is flexible, as it can use a database of MGEs when available, but it does not depend entirely on a database of known MGEs and mobile genes. It is sensitive and precise enough to be used on both laboratory samples and environmental isolates. We focus on MGE insertion sites, generate consensus sequences for inserted elements from the clipped ends of locally-aligned reads, infer complete MGEs from sequence assemblies, and build a de novo database of elements across all analyzed samples. We combine several sequence inference approaches to identify large insertions, resulting in a highly sensitive and precise overall approach. By focusing on MGE insertion sites, we answer several questions about these elements that have not been thoroughly addressed in the past. For example, we determine the target...
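The idea of locating insertion sites from the clipped ends of locally-aligned reads can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' software: it takes alignments as plain `(POS, CIGAR)` pairs in SAM conventions (1-based leftmost position), and the function names and the `min_clip` and `min_reads` thresholds are hypothetical. It does not build consensus sequences or infer the inserted element.

```python
import re
from collections import Counter

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def clip_sites(pos, cigar, min_clip=10):
    """Return (side, reference_position) for each soft-clipped end of one alignment."""
    # Hard clips (H) only occur at the ends and carry no sequence, so drop them.
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar) if op != "H"]
    sites = []
    if ops and ops[0][1] == "S" and ops[0][0] >= min_clip:
        # Left clip: the aligned part starts at pos, so the breakpoint sits there.
        sites.append(("left", pos))
    if len(ops) > 1 and ops[-1][1] == "S" and ops[-1][0] >= min_clip:
        # Right clip: the breakpoint follows the last aligned reference base.
        ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
        sites.append(("right", pos + ref_len - 1))
    return sites


def candidate_sites(alignments, min_reads=2, min_clip=10):
    """Count clipped reads per (side, position) and keep sites with enough support."""
    counts = Counter()
    for pos, cigar in alignments:
        counts.update(clip_sites(pos, cigar, min_clip))
    return sorted((side, p, n) for (side, p), n in counts.items() if n >= min_reads)


# Made-up alignments: two reads clipped on the left at 1000, one clipped on the right at 999.
example = [(1000, "30S70M"), (1000, "25S75M"), (930, "70M30S"), (500, "100M")]
supported = candidate_sites(example, min_reads=2)
```

In this toy input, left-clipped and right-clipped reads meet at adjacent reference positions, which is the kind of signature a real insertion would be expected to leave. In practice the alignments would come from a BAM file read with a dedicated parser.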
Background: Identification of causal genes for polygenic human diseases has been extremely challenging, and our understanding of how physiological and pharmacological stimuli modulate genetic risk at disease-associated loci is limited. Specifically, insulin resistance (IR), a common feature of cardiometabolic disease, including type 2 diabetes, obesity, and dyslipidemia, lacks well-powered GWAS, and therefore few associated loci and causal genes have been identified. Results: Here, we perform and integrate LD-adjusted colocalization analyses across nine cardiometabolic traits combined with eQTLs and sQTLs from five metabolically relevant human tissues (subcutaneous and visceral adipose, skeletal muscle, liver, and pancreas). We identify 470 colocalized loci and prioritize 207 loci with a single colocalized gene. To elucidate upstream regulators and functional mechanisms for these genes, we integrate their transcriptional responses to 21 physiological and pharmacological cardiometabolic regulators in human adipocytes, hepatocytes, and skeletal muscle cells, and map their protein-protein interactions. Conclusions: Our use of transcriptional responses under metabolic perturbations to contextualize genetic associations from our state-of-the-art colocalization approach provides a list of likely causal genes and their upstream regulators in the context of IR-associated cardiometabolic risk.
Butyrate is a four-carbon fatty acid produced in large quantities by bacteria found in the human gut. It is the major energy source for colonic epithelial cells, binds to and agonizes short-chain fatty acid G-protein coupled receptors, and functions as a histone deacetylase (HDAC) inhibitor. Anti-cancer effects of butyrate are attributed to a global increase in histone acetylation in colon cancer cells; however, the role that corresponding chromatin remodeling plays in this effect is not fully understood. We used longitudinal paired ATAC-seq and RNA-seq on HCT-116 colon cancer cells to determine how butyrate-related chromatin changes functionally associate with cancer. We detected distinct temporal changes in chromatin accessibility in response to butyrate, with less accessible regions enriched in transcription factor binding motifs and distal enhancers. These regions significantly overlapped with regions maintained by the SWI/SNF chromatin remodeler, and were further enriched amongst chromatin regions that are associated with ARID1A/B synthetic lethality. Finally, we found that butyrate-induced chromatin regions were enriched for both colorectal cancer GWAS loci and somatic mutations in cancer. These results demonstrate the convergence of both somatic mutations and GWAS risk variants for colon cancer within butyrate-responsive chromatin regions, providing a molecular map of the mechanisms by which this microbial metabolite might confer anti-cancer properties.
Recent work performed by Sberro et al. (2019) revealed a vast unexplored space of small proteins existing within the human microbiome. At present, these small open reading frames (smORFs) are unannotated in existing reference genomes, and standard genome annotation tools are not able to accurately predict them. In this study, we introduce an annotation tool named SmORFinder that predicts small proteins based on those identified by Sberro et al. This tool combines profile Hidden Markov models (pHMMs) of each small protein family and deep learning models that may better generalize to smORF families not seen in the training set. We find that combining predictions of both pHMM and deep learning models leads to more precise smORF predictions and that these predicted smORFs are enriched for Ribo-Seq or MetaRibo-Seq translation signals. Feature importance analysis reveals that the deep learning models learned to identify Shine-Dalgarno sequences, deprioritize the wobble position in each codon, and group codons in a way that strongly corresponds to the codon synonyms found in the codon table. We perform a core genome analysis of 26 bacterial species and identify many core smORFs of unknown function. We pre-compute small protein annotations for thousands of RefSeq isolate genomes and HMP metagenomes, and we make these data available through a web portal along with other useful tools for small protein annotation and analysis. The systematic identification and annotation of these important small proteins will help researchers to expand our understanding of this exciting field of biology.
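For readers unfamiliar with what a candidate smORF looks like at the sequence level, the sketch below enumerates short open reading frames on the forward strand of a DNA string. It is a naive scan, not SmORFinder: it uses no pHMMs or deep learning models, considers only ATG start codons, and the `min_aa` and `max_aa` length bounds are assumed values chosen for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def small_orfs(seq, min_aa=5, max_aa=50):
    """Return (start, end, n_codons) for short forward-strand ORFs; end is exclusive
    and includes the stop codon, n_codons counts the start codon but not the stop."""
    seq = seq.upper()
    found = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                # Walk codon by codon until a stop codon or the end of the sequence.
                j = i + 3
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 <= len(seq):  # an in-frame stop codon was found
                    n_codons = (j - i) // 3
                    if min_aa <= n_codons <= max_aa:
                        found.append((i, j + 3, n_codons))
                    i = j + 3  # resume scanning after this ORF
                    continue
            i += 3
    return found


# Made-up sequence: ATG, six alanine codons, then a TAA stop, with short flanks.
example_seq = "CC" + "ATG" + "GCT" * 6 + "TAA" + "GG"
example_orfs = small_orfs(example_seq)
```

A scan like this returns many spurious candidates on real genomes, which is why the abstract describes trained models as the way to decide which candidates are likely to be genuine small proteins.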