Transthyretin amyloid cardiomyopathy, an often unrecognized cause of heart failure, is now treatable with a transthyretin stabilizer. It is therefore important to identify at-risk patients who can undergo targeted testing for earlier diagnosis and treatment, before irreversible heart failure develops. Here we show that a random forest machine learning model can identify potential wild-type transthyretin amyloid cardiomyopathy using medical claims data. We derive a machine learning model in 1071 cases and 1071 non-amyloid heart failure controls and validate the model in three nationally representative cohorts (9412 cases, 9412 matched controls) and a large, single-center electronic health record-based cohort (261 cases, 39393 controls). We show that the machine learning model performs well in identifying patients with cardiac amyloidosis in the derivation cohort and all four validation cohorts, thereby providing a systematic framework to increase the suspicion of transthyretin cardiac amyloidosis in patients with heart failure.
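The abstract does not describe the model's features or hyperparameters. As a minimal sketch of the random forest idea only (bootstrap resampling, a random feature subset per tree, majority vote), the code below simplifies each tree to a single split (a "stump") and assumes each patient is a row of 0/1 indicators derived from claims codes. The function names and the feature encoding are hypothetical; this is not the published model.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def fit_stump(X, y, features):
    """Depth-1 tree: the (feature, threshold) split with the lowest weighted Gini."""
    majority = Counter(y).most_common(1)[0][0]
    # Fallback when no split helps: threshold inf sends every row to `majority`.
    best = (gini(y), 0, float("inf"), majority, majority)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # a split must put rows on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    return best[1:]  # (feature, threshold, left_label, right_label)


def fit_forest(X, y, n_trees=50, max_features=None, seed=0):
    """Bag stumps, each trained on a bootstrap resample and a random feature subset."""
    rng = random.Random(seed)
    p = len(X[0])
    m = max_features or max(1, round(p ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap resample
        feats = rng.sample(range(p), m)                        # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_proba(forest, row):
    """Fraction of trees voting for class 1 (case)."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return sum(votes) / len(votes)
```

In use, `fit_forest(X, y)` would take a balanced case/control design like the 1071/1071 derivation set described above, and `predict_proba(forest, row)` returns a score between 0 and 1 that could be thresholded to flag patients for targeted testing. A production model would grow deeper trees and tune `n_trees` and `max_features`.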
Background: Transposition is disruptive in nature and, thus, it is imperative for host genomes to evolve mechanisms that suppress the activity of transposable elements (TEs). At the same time, transposition also provides diverse sequences that can be exapted by host genomes as functional elements. These notions form the basis of two competing hypotheses pertaining to the role of epigenetic modifications of TEs in eukaryotic genomes: the genome defense hypothesis and the exaptation hypothesis. To date, all available evidence points to the genome defense hypothesis as the best explanation for the biological role of TE epigenetic modifications. Results: We evaluated several predictions generated by the genome defense hypothesis versus the exaptation hypothesis using recently characterized epigenetic histone modification data for the human genome. To this end, we mapped chromatin immunoprecipitation sequence tags from 38 histone modifications, characterized in CD4+ T cells, to the human genome and calculated their enrichment and depletion in all families of human TEs. We found that several of these families are significantly enriched or depleted for various histone modifications, both active and repressive. The enrichment of human TE families with active histone modifications is consistent with the exaptation hypothesis and stands in contrast to previous analyses that have found mammalian TEs to be exclusively repressively modified. Comparisons between TE families revealed that older families carry more histone modifications than younger ones, another observation consistent with the exaptation hypothesis. However, data from within-family analyses of the relative ages of epigenetically modified elements are consistent with both the genome defense and exaptation hypotheses.
Finally, TEs located proximal to genes carry more histone modifications than those distal to genes, as may be expected if epigenetically modified TEs help to regulate the expression of nearby host genes. Conclusions: With a few exceptions, most of our findings support the exaptation hypothesis for the role of TE epigenetic modifications when vetted against the genome defense hypothesis. The recruitment of epigenetic modifications may represent an additional mechanism by which TEs can contribute to the regulatory functions of their host genomes.
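The abstract does not state which enrichment statistic was used. One common approach, assumed here for illustration, compares the number of ChIP-seq tags that fall in a TE family with the number expected if tags were spread uniformly over the genome in proportion to the family's length, reporting a log2 observed/expected ratio together with a normal-approximation binomial z-score. The function name and inputs are hypothetical.

```python
import math


def te_family_enrichment(tags_in_family, total_tags, family_bp, genome_bp):
    """Return (log2 observed/expected, z-score) for one TE family and one histone mark.

    Null model (an assumption): each mapped tag lands uniformly at random in
    genome_bp bases, so a family covering family_bp bases expects
    total_tags * family_bp / genome_bp tags.
    """
    p = family_bp / genome_bp
    expected = total_tags * p
    # Positive ratio = enrichment, negative = depletion.
    log2_ratio = math.log2(tags_in_family / expected) if tags_in_family else float("-inf")
    # Normal approximation to the Binomial(total_tags, p) null distribution.
    z = (tags_in_family - expected) / math.sqrt(total_tags * p * (1 - p))
    return log2_ratio, z
```

With toy numbers, a family covering 10% of a 1,000,000 bp genome that receives 200 of 1,000 tags has an expected count of 100, so `te_family_enrichment(200, 1000, 100_000, 1_000_000)` gives a log2 ratio of 1.0 (twofold enrichment) and a z-score of about 10.5; 50 tags would instead give a log2 ratio of -1.0 (twofold depletion). A real analysis would use the mappable genome size and correct for testing many families and marks.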
Independent lines of investigation have documented effects of both transposable elements (TEs) and gene length (GL) on gene expression. However, TE gene fractions are highly correlated with GL, suggesting that they cannot be considered independently. We evaluated the TE environment of human genes and GL jointly in an attempt to tease apart their relative effects. TE gene fractions and GL were compared with the overall level of gene expression and the breadth of expression across tissues. GL is strongly correlated with overall expression level but weakly correlated with the breadth of expression, confirming the selection hypothesis that attributes the compactness of highly expressed genes to selection for economy of transcription. However, TE gene fractions overall, and for the L1 family in particular, show stronger anticorrelations with expression level than GL, indicating that GL may not be the most important target of selection for transcriptional economy. These results suggest a specific mechanism, removal of TEs, by which highly expressed genes are selectively tuned for efficiency. MIR elements are the only family of TEs with gene fractions that show a positive correlation with tissue-specific expression, suggesting that they may provide regulatory sequences that help to control human gene expression. Consistent with this notion, MIR fractions are relatively enriched close to transcription start sites and associated with coexpression in specific sets of related tissues. Our results confirm the overall relevance of the TE environment to gene expression and point to distinct mechanisms by which different TE families may contribute to gene regulation.
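The abstract does not say how TE gene fraction and GL were evaluated jointly. A first-order partial correlation is one standard way to ask whether TE gene fraction tracks expression once gene length is held fixed; the sketch below assumes Pearson correlation on paired per-gene values (for skewed expression data, the same formula can be applied to ranks to obtain a Spearman version). Function names are illustrative.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z from both.

    Here x could be TE gene fraction, y expression level and z gene length.
    """
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
```

As a toy illustration (made-up numbers, not data from the study), with `z = [1, 2, 3, 4]`, `x = [2, 1, 2, 5]` and `y = [0, 3, 4, 3]`, the raw correlation `pearson(x, y)` is 1/9 (about 0.11) while `partial_corr(x, y, z)` is -1.0: a shared dependence on gene length can mask or reverse the direct relationship, which is why the two variables cannot be considered independently.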
We analyzed the chicken (Gallus gallus) genome sequence to search for previously uncharacterized endogenous retrovirus (ERV) sequences using ab initio and combined-evidence approaches. We discovered 11 novel families of ERVs that occupy more than 21 million base pairs, approximately 2% of the chicken genome. These novel families include a number of recently active full-length elements possessing identical long terminal repeats (LTRs) as well as intact gag and pol open reading frames. The abundance and diversity of the chicken ERVs we discovered underscore the utility of combining multiple methods to identify interspersed repeats in vertebrate genomes.
Reviewers: This article was reviewed by Igor Zhulin and Itai Yanai.
Findings: Chicken, a modern descendant of the dinosaurs, is the first bird to have its genome sequenced [1]. Phylogenetically, its position between fish and mammals provides valuable insight into the evolution of vertebrates. The chicken genome is 1.2 billion bases in size, approximately one third the size of the human genome.
The overall interspersed repeat, i.e. transposable element (TE), content of the chicken genome was determined to be less than 9% [1]. This fraction is considerably lower than that of mammalian genomes, where TEs account for 40-50% of genomic sequence [2][3][4]. While chicken has long been a model system for the study of retroviruses [5], a mere 1.3% of the chicken genome can be classified as endogenous retroviruses (ERVs), compared with about 5% in humans [3]. Nevertheless, protein-coding sequences still make up only a minor fraction of the chicken genome, leaving a substantial fraction that has yet to be accounted for. The authors of the initial analysis of the chicken genome posited that much of the uncharacterized sequence was likely to be derived from unrecognized TEs [1].
Indeed, novel or previously uncharacterized TE sequences may be missed by homology-based repeat detection methods, such as the widely used RepeatMasker program [6], which rely on comparing genomic sequences to libraries of known repeat consensus sequences. Ab initio methods, on the other hand, identify repeats by their structural characteristics, without regard to any sequence similarity to known elements. We used a combination of ab initio detection, sequence similarity searches, motif identification and evaluation of element structural (repeat) features to search for novel ERVs that may have been missed in the initial analysis of the chicken genome.
LTR_STRUC was the first ab initio program designed to detect long terminal repeat (LTR)-containing elements, such as ERVs, in genomic sequence [7]. Briefly, LTR_STRUC works by sliding a window along genomic
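The description of LTR_STRUC above is incomplete, so the sketch below does not reproduce its algorithm. It only illustrates the general ab initio idea of searching for a structural signature rather than similarity to a repeat library: pairs of near-identical direct repeats separated by an internal region of plausible length. It is a brute-force toy; the function name, window length, spacing bounds and identity cutoff are hypothetical, and real LTRs and internal regions are far longer than these defaults. An identity of 1.0 corresponds to the identical LTR copies that mark recently active elements.

```python
import random


def find_ltr_pairs(seq, ltr_len=20, min_gap=50, max_gap=200, min_identity=0.9):
    """Brute-force scan for direct repeats that could flank an LTR element.

    Returns (start_5prime, start_3prime, identity) for every pair of windows of
    length ltr_len whose internal spacing lies in [min_gap, max_gap] and whose
    ungapped identity is at least min_identity. Shifted, overlapping hits are
    reported too; a real tool would merge and extend them.
    """
    hits = []
    for i in range(len(seq) - ltr_len + 1):
        left = seq[i:i + ltr_len]
        first_j = i + ltr_len + min_gap                       # shortest internal region
        last_j = min(i + ltr_len + max_gap, len(seq) - ltr_len)  # longest, within bounds
        for j in range(first_j, last_j + 1):
            right = seq[j:j + ltr_len]
            identity = sum(a == b for a, b in zip(left, right)) / ltr_len
            if identity >= min_identity:
                hits.append((i, j, identity))
    return hits


# Toy usage: plant an identical 20-bp "LTR" pair in random background sequence.
rng = random.Random(1)


def random_dna(n):
    return "".join(rng.choice("ACGT") for _ in range(n))


ltr = random_dna(20)
toy = random_dna(30) + ltr + random_dna(100) + ltr + random_dna(30)
hits = find_ltr_pairs(toy)
# (30, 150, 1.0) is among the hits: identical LTR copies at positions 30 and 150.
```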
Motivation: Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is widely used in biological research. ChIP-seq experiments yield many ambiguous tags that can be mapped with equal probability to multiple genomic sites. Such ambiguous tags are typically eliminated from consideration, resulting in a potential loss of important biological information. Results: We have developed a Gibbs sampling-based algorithm for the genomic mapping of ambiguous sequence tags. Our algorithm relies on the local genomic tag context to guide the mapping of ambiguous tags. The Gibbs sampling procedure we use simultaneously maps ambiguous tags and updates the probabilities used to infer correct tag map positions. We show that our algorithm is able to correctly map more ambiguous tags than existing mapping methods. Our approach is also able to uncover mapped genomic sites in highly repetitive sequences that cannot be detected based on unique tags alone, including transposable elements, segmental duplications and pericentromeric regions. This mapping approach should prove useful for increasing biological knowledge of often-neglected repetitive genomic regions.
Availability: http://esbg.gatech.edu/jordan/software/map
Contact: king.jordan@biology.gatech.edu
Supplementary Information: Supplementary data are available at Bioinformatics online.
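The abstract does not give the exact sampling probabilities. The sketch below assumes that an ambiguous tag's chance of belonging to each candidate site is proportional to the number of tags currently at that site (unique tags plus the current placements of other ambiguous tags) plus a pseudocount, where a "site" stands for a genomic window. It shows the Gibbs idea of resampling each tag conditional on all others, so placements and the local tag context are updated together. The function name, site labels and parameters are hypothetical; this is not the published mapping software.

```python
import random
from collections import Counter


def gibbs_map_ambiguous(unique_counts, ambiguous, n_iter=200, burn_in=50,
                        pseudo=1.0, seed=0):
    """Assign each multi-mapping tag to one of its candidate sites by Gibbs sampling.

    unique_counts: {site: number of uniquely mapped tags at that site}
    ambiguous:     list of candidate-site lists, one per ambiguous tag
    Returns the most frequently sampled site per tag (after burn-in) and the
    per-tag sample tallies.
    """
    if n_iter <= burn_in:
        raise ValueError("n_iter must exceed burn_in")
    rng = random.Random(seed)
    counts = Counter(unique_counts)  # tags per site: unique + current ambiguous placements
    assign = [rng.choice(sites) for sites in ambiguous]  # random initial placement
    for site in assign:
        counts[site] += 1
    tallies = [Counter() for _ in ambiguous]
    for it in range(n_iter):
        for k, sites in enumerate(ambiguous):
            counts[assign[k]] -= 1  # remove this tag: condition on all other tags
            weights = [counts[s] + pseudo for s in sites]
            assign[k] = rng.choices(sites, weights=weights)[0]
            counts[assign[k]] += 1  # new placement updates the context for later tags
            if it >= burn_in:
                tallies[k][assign[k]] += 1
    best = [t.most_common(1)[0][0] for t in tallies]
    return best, tallies
```

For example, with 20 unique tags at a hypothetical site `"chr1:1000"` and none at `"chr7:52000"`, five tags that map equally well to both are pulled toward `"chr1:1000"` by their local context. The per-tag tallies could also be reported as posterior placement frequencies instead of taking only the most frequent site.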