The development of tools that provide early triage of COVID-19 patients with minimal use of diagnostic tests, based on easily accessible data, can be of vital importance in reducing COVID-19 mortality rates during high-incidence scenarios. This work proposes a machine learning model to predict mortality and risk of hospitalization using 2 simple demographic features and 19 comorbidities obtained from 86,867 electronic medical records of COVID-19 patients, together with a new method (LR-IPIP) designed to deal with data imbalance problems. The model was able to predict with high accuracy (90–93%, ROC-AUC = 0.94) the patient's final status (deceased or discharged), while its accuracy was medium (71–73%, ROC-AUC = 0.75) with respect to the risk of hospitalization. The most relevant characteristics for these models were age, sex, number of comorbidities, osteoarthritis, obesity, depression, and renal failure. Finally, to facilitate its use by clinicians, a user-friendly website has been developed (https://alejandrocisterna.shinyapps.io/PROVIA).
Improvements in functional genomic annotation have led to a critical mass of neurogenetic discoveries. This is exemplified in hereditary ataxia, a heterogeneous group of disorders characterised by incoordination from cerebellar dysfunction. Associated pathogenic variants in more than 300 genes have been described, leading to a detailed genetic classification partitioned by age-of-onset. Despite these advances, up to 75% of patients with ataxia remain molecularly undiagnosed even following whole genome sequencing, as exemplified in the 100,000 Genomes Project. This study aimed to understand whether we can improve our knowledge of the genetic architecture of hereditary ataxia by leveraging functional genomic annotations, and as a result, generate insights and strategies that raise the diagnostic yield. To achieve these aims, we used publicly-available multi-omics data to generate 294 genic features, capturing information relating to a gene’s structure, genetic variation, tissue-specific, cell-type-specific and temporal expression, as well as protein products of a gene. We studied these features across genes typically causing childhood-onset, adult-onset or both types of disease first individually, then collectively. This led to the generation of testable hypotheses which we investigated using whole genome sequencing data from up to 2,182 individuals presenting with ataxia and 6,658 non-neurological probands recruited in the 100,000 Genomes Project. Using this approach, we demonstrated a high short tandem repeat (STR) density within childhood-onset genes suggesting that we may be missing pathogenic repeat expansions within this cohort. This was verified in both childhood- and adult-onset ataxia patients from the 100,000 Genomes Project who were unexpectedly found to have a trend for higher repeat sizes even at naturally-occurring STRs within known ataxia genes, implying a role for STRs in pathogenesis. 
Using unsupervised analysis, we found significant similarities in genomic annotation across the gene panels, which suggested adult- and childhood-onset patients should be screened using a common diagnostic gene set. We tested this within the 100,000 Genomes Project by assessing the burden of pathogenic variants among childhood-onset genes in adult-onset patients and vice versa. This demonstrated a significantly higher burden of rare, potentially pathogenic variants in conventional childhood-onset genes among individuals with adult-onset ataxia. Our analysis has implications for the current clinical practice in genetic testing for hereditary ataxia. We suggest that the diagnostic rate for hereditary ataxia could be increased by removing the age-of-onset partition, and through a modified screening for repeat expansions in naturally-occurring STRs within known ataxia-associated genes, in effect treating these regions as candidate pathogenic loci.
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which causes coronavirus disease 2019 (COVID-19), is highly transmissible and has been responsible for a pandemic associated with a high number of deaths. The clinical management of patients and the optimal use of resources are two important factors in reducing this mortality, especially in scenarios of high incidence. To this end, it is necessary to develop tools that allow early triage of patients with minimal use of diagnostic tests and based on readily accessible data, such as electronic medical records. This work proposes the use of a machine learning model that allows the prediction of mortality and risk of hospitalization using simple demographic characteristics and comorbidities, using a COVID-19 dataset of 86,867 patients. In addition, we developed a new method designed to deal with data imbalance problems. The model was able to predict with high accuracy (89-93%, ROC-AUC = 0.94) the patient's final status (expired/discharged) and with medium accuracy the risk of hospitalization (71-73%, ROC-AUC = 0.75). These models were obtained by assembling and using easily obtainable clinical characteristics (2 demographic characteristics and 19 predictors of comorbidities). The most relevant features of these models were the following patient characteristics: age, sex, number of comorbidities, osteoarthritis, obesity, depression, and renal failure.
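As a minimal illustration of why the abstract reports ROC-AUC alongside accuracy for an imbalanced outcome, the sketch below computes both on toy labels. The data are invented for illustration, the function names are hypothetical, and this is not the authors' imbalance method; it only shows that a classifier which always predicts the majority class can score high accuracy while its ROC-AUC stays at chance level.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a random positive outranks a random
    negative, counting ties as one half (pairwise definition)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy imbalanced cohort (assumption): 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5

# A degenerate model that always predicts the majority class.
always_negative = [0] * 100
constant_scores = [0.0] * 100

print(accuracy(y_true, always_negative))  # 0.95
print(roc_auc(y_true, constant_scores))   # 0.5
```

The pairwise definition used here is quadratic in the number of samples, which is fine for a toy example but would be replaced by a rank-based computation on a dataset of realistic size.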
Gene set based phenotype enrichment analysis (detecting phenotypic terms that emerge as significant in a set of genes) can improve the rate of genetic diagnoses amongst other research purposes. To facilitate diverse phenotype analysis, we developed PhenoExam, a freely available R package for tool developers and a web interface for users, which: (1) performs phenotype and disease enrichment analysis on a gene set; (2) measures statistically significant phenotype similarities between gene sets; and (3) detects significant differential phenotypes or disease terms across different databases. PhenoExam achieves these tasks by integrating databases or resources such as the HPO, MGD, CRISPRbrain, CTD, ClinGen, CGI, OrphaNET, UniProt, PsyGeNET, and Genomics England Panel App. PhenoExam accepts both human and mouse genes as input. We developed PhenoExam to assist a variety of users, including clinicians, computational biologists and geneticists. It can be used to support the validation of new gene-to-disease discoveries, and in the detection of differential phenotypes between two gene sets (a phenotype linked to one of the gene sets but not to the other) that are useful for differential diagnosis and to improve genetic panels. We validated PhenoExam performance through simulations and its application to real cases. We demonstrate that PhenoExam is effective in distinguishing gene sets or Mendelian diseases with very similar phenotypes through projecting the disease-causing genes into their annotation-based phenotypic spaces. We also tested the tool with early onset Parkinson's disease and dystonia genes, to show phenotype-level similarities but also potentially interesting differences. More specifically, we used PhenoExam to validate computationally predicted new genes potentially associated with epilepsy. Therefore, PhenoExam effectively discovers links between phenotypic terms across annotation databases through effective integration.
The R package is available at https://github.com/alexcis95/PhenoExam and the Web tool is accessible at https://snca.atica.um.es/PhenoExamWeb/.
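To make the idea of phenotype enrichment on a gene set concrete, here is a minimal sketch of a one-sided hypergeometric test, a common choice for this kind of analysis. The abstract does not state which test PhenoExam uses, so the test itself, the function name, and the example numbers are all assumptions made for illustration; the package is written in R, and Python is used here only for a self-contained sketch.

```python
from math import comb


def hypergeom_enrichment_p(universe, annotated, selected, overlap):
    """One-sided enrichment p-value P(X >= overlap).

    universe:  number of genes in the background set
    annotated: background genes annotated with the phenotype term
    selected:  size of the query gene set
    overlap:   query genes annotated with the term
    """
    upper = min(annotated, selected)
    total = comb(universe, selected)
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 background genes, 200 annotated with a term,
# a query set of 50 genes of which 5 carry the annotation.
p_value = hypergeom_enrichment_p(20_000, 200, 50, 5)
print(f"enrichment p-value: {p_value:.3g}")
```

In practice one p-value is computed per phenotype term, so the results would normally be corrected for multiple testing before terms are reported as significant.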
Genome-wide association studies (GWAS) have increased our understanding of Parkinson's disease (PD) genetics through the identification of common disease-associated variants. However, much of the heritability remains unaccounted for and we hypothesized that this could be partly explained by epistasis. Here, we developed a genome-wide non-exhaustive epistasis screening pipeline called Variant-variant interaction through variable thresholds (VARI3) and applied it to diverse PD GWAS cohorts. First, as a discovery cohort, we used 14 cohorts of European ancestry (14,671 cases and 17,667 controls) to identify candidate variant-variant interactions. Next, we replicated significant results in a cohort with a predominantly Latino genetic ancestry (807 cases and 690 controls). We identified 14 significant epistatic signals in the discovery stage, with genes showing enrichment in PD-relevant ontologies and pathways. Next, we successfully replicated two of the 14 interactions, where the signals were located near SNCA and within MAPT and WNT3. Finally, we determined that the epistatic effect on PD of those variants was similar between populations. In brief, we identified several epistatic signals associated with PD and replicated associations despite differences in the genetic ancestry between cohorts. We also observed their biological relevance and effect on the phenotype using in silico analysis.
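As a rough illustration of what a variant-variant interaction means on the odds scale, the sketch below compares the joint odds ratio of two variants with the product of their individual odds ratios. This is not the VARI3 pipeline, whose details are not given in the abstract; the function names and the case/control counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def odds(cases, controls):
    """Odds of being a case within one genotype stratum."""
    return cases / controls


def interaction_ratio(counts):
    """Ratio of the joint odds ratio to the product of the single-variant
    odds ratios. A value near 1 means no interaction on the multiplicative
    scale; values far from 1 suggest epistasis.

    counts maps (carries_a, carries_b) -> (cases, controls).
    """
    base = odds(*counts[(0, 0)])
    or_a = odds(*counts[(1, 0)]) / base
    or_b = odds(*counts[(0, 1)]) / base
    or_ab = odds(*counts[(1, 1)]) / base
    return or_ab / (or_a * or_b)


# Hypothetical counts for carriers and non-carriers of two variants.
toy_counts = {
    (0, 0): (100, 100),  # neither variant: odds 1.0
    (1, 0): (120, 100),  # variant A only: OR 1.2
    (0, 1): (150, 100),  # variant B only: OR 1.5
    (1, 1): (360, 100),  # both variants: OR 3.6
}

print(round(interaction_ratio(toy_counts), 3))  # 2.0
```

A real screen would estimate this quantity with a regression model that includes an interaction term and covariates, and would assess its statistical significance rather than rely on raw counts.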