The repair of chromosomal double-strand breaks (DSBs) is crucial in the maintenance of genomic integrity. However, the repair of DSBs can also destabilize the genome by causing mutations and chromosomal rearrangements, the driving forces for carcinogenesis and hereditary diseases. Break-induced replication (BIR) is one of the DSB repair pathways that is highly prone to genetic instability [1–3]. BIR proceeds by invasion of one broken end into a homologous DNA sequence, followed by replication that can copy hundreds of kilobase pairs of DNA from a donor molecule all the way through its telomere [4,5]. The resulting repaired chromosome comes at a great cost to the cell, as BIR promotes mutagenesis, loss of heterozygosity, translocations, and copy number variations, all hallmarks of carcinogenesis [4–9]. BIR employs the majority of known replication proteins to copy large portions of DNA, similar to S-phase replication [10,11]. It has thus been suggested that BIR proceeds by semiconservative replication; however, the model of a bona fide, stable replication fork contradicts the known instabilities associated with BIR, such as a 1000-fold increase in mutation rate compared to normal replication [9]. Here we demonstrate that the mechanism of replication during BIR is significantly different from S-phase replication, as it proceeds via an unusual bubble-like replication fork that results in conservative inheritance of the new genetic material. We provide evidence that this atypical mode of DNA replication, dependent on Pif1 helicase, is responsible for the dramatic increase in BIR-associated mutations. We propose that the BIR mode of synthesis presents a powerful mechanism that can initiate bursts of genetic instability in eukaryotes, including humans.
SUMMARY Complex genomic rearrangements (CGRs) are a hallmark of many human diseases. Recently, CGRs were suggested to result from microhomology-mediated break-induced replication (MMBIR), a replicative mechanism involving template switching at positions of microhomology. Currently, the cause of MMBIR and the proteins mediating this process remain unknown. Here, we demonstrate in yeast that a collapse of homology-driven break-induced replication (BIR), caused by defective repair DNA synthesis in the absence of Pif1 helicase, leads to template switches involving 0–6 nucleotides of homology, followed by resolution of recombination intermediates into chromosomal rearrangements. Importantly, we show that these microhomology-mediated template switches, indicative of MMBIR, are driven by the translesion synthesis (TLS) polymerases Polζ and Rev1. Thus, an interruption of BIR involving fully homologous chromosomes in yeast triggers a switch to MMBIR catalyzed by TLS polymerases. Overall, our study provides important mechanistic insights into the initiation of MMBIR associated with genomic rearrangements, similar to those promoting diseases in humans.
Genetic instabilities, including mutations and chromosomal rearrangements, lead to cancer and other diseases in humans and play an important role in evolution. A frequent cause of genetic instabilities is double-strand DNA breaks (DSBs), which may arise from a wide range of exogenous and endogenous cellular factors. Although the repair of DSBs is required, some repair pathways are dangerous because they may destabilize the genome. One such pathway, break-induced replication (BIR), is the mechanism for repairing DSBs that possess only one repairable end. This situation commonly arises as a result of eroded telomeres or collapsed replication forks. Although BIR plays a positive role in repairing DSBs, it can also be a dangerous source of several types of genetic instabilities, including loss of heterozygosity, telomere maintenance in the absence of telomerase, and non-reciprocal translocations. In addition, mutation rates in BIR are about 1000 times higher than in normal DNA replication. Furthermore, microhomology-mediated BIR (MMBIR), a mechanism related to BIR, can generate copy-number variations (CNVs) as well as various complex chromosomal rearrangements. Overall, activation of BIR may contribute to genomic destabilization, resulting in substantial biological consequences, including those affecting human health.
Unstructured clinical narratives are continuously being recorded as part of the delivery of care in electronic health records, and dedicated tagging staff spend considerable effort manually assigning clinical codes for billing purposes. Despite these efforts, however, label availability and accuracy are both suboptimal. In this retrospective study, we aimed to automate the assignment of top-level International Classification of Diseases version 9 (ICD-9) codes to clinical records from human and veterinary data stores using minimal manual labor and feature curation. Automating top-level annotations could in turn enable rapid cohort identification, especially in a veterinary setting. To this end, we trained long short-term memory (LSTM) recurrent neural networks (RNNs) on 52,722 human and 89,591 veterinary records. We investigated the accuracy of both separate-domain and combined-domain models and probed model portability. We established relevant baseline classification performances by training Decision Trees (DT) and Random Forests (RF). We also investigated whether transforming the data using MetaMap Lite, a clinical natural language processing tool, affected classification performance. We showed that the LSTM-RNNs accurately classify veterinary and human text narratives into top-level categories, with an average weighted macro F1 score of 0.74 and 0.68, respectively. In the "neoplasia" category, the model trained on veterinary data had a high validation accuracy on veterinary data and moderate accuracy on human data, with F1 scores of 0.91 and 0.70, respectively. Our LSTM method scored slightly higher than the DT and RF models. The use of LSTM-RNN models represents a scalable structure that could prove useful in cohort identification for comparative oncology studies. Digitization of human and veterinary health information will continue to be a reality, particularly in the form of unstructured narratives. Our approach is a step forward for these two domains to learn from and inform one another.
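The F1 figures reported in this abstract can be read more easily with a small worked example. The sketch below is not the authors' evaluation code; it assumes a single-label setting, and the toy labels and function names (`per_class_f1`, `macro_f1`, `weighted_f1`) are hypothetical. Because the abstract's phrase "average weighted macro F1" does not say which averaging scheme was used, both the unweighted (macro) and the support-weighted averages are shown.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, computed one-vs-rest for every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gains a false positive
            fn[truth] += 1  # true label gains a false negative
    scores = {}
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(scores):
    """Unweighted mean of the per-class F1 scores."""
    return sum(scores.values()) / len(scores)


def weighted_f1(scores, y_true):
    """Mean of per-class F1 scores weighted by each class's true count."""
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)


# Toy data (hypothetical, for illustration only).
y_true = ["neoplasia", "neoplasia", "neoplasia", "infectious", "infectious", "other"]
y_pred = ["neoplasia", "neoplasia", "infectious", "infectious", "other", "other"]

scores = per_class_f1(y_true, y_pred)
print(scores)
print("macro:", round(macro_f1(scores), 4))
print("weighted:", round(weighted_f1(scores, y_true), 4))
```

On this toy data the two averages differ (about 0.656 macro versus 0.678 weighted) because the best-classified class is also the most frequent one, which is why the choice of averaging scheme matters when comparing models across class-imbalanced datasets.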
It is essential for the advancement of science that researchers share, reuse and reproduce each other’s workflows and protocols. The FAIR principles are a set of guidelines that aim to maximize the value and usefulness of research data, and emphasize the importance of making digital objects findable and reusable by others. The question of how to apply these principles not just to data but also to the workflows and protocols that consume and produce them is still under debate and poses a number of challenges. In this paper we describe a two-fold approach of simultaneously applying the FAIR principles to scientific workflows as well as the involved data. We apply and evaluate our approach on the case of the PREDICT workflow, a highly cited drug repurposing workflow. This includes FAIRification of the involved datasets, as well as applying semantic technologies to represent and store data about the detailed versions of the general protocol, of the concrete workflow instructions, and of their execution traces. We propose a semantic model to address these specific requirements and evaluate it by answering competency questions. This semantic model consists of classes and relations from a number of existing ontologies, including Workflow4ever, PROV, EDAM, and BPMN. This in turn allowed us to formulate and answer new kinds of competency questions. Our evaluation shows the high degree to which our FAIRified OpenPREDICT workflow now adheres to the FAIR principles, as well as the practicality and usefulness of being able to answer our new competency questions.