Building a population-specific catalogue of single nucleotide variants (SNVs), indels and structural variants (SVs) with frequencies, termed a national pan-genome, is critical for further advancing clinical and public health genetics in large cohorts. Here we report a Danish pan-genome obtained from sequencing 10 trios to high depth (50×). We report 536k novel SNVs and 283k novel short indels from mapping approaches and develop a population-wide de novo assembly approach to identify 132k novel indels larger than 10 nucleotides with low false discovery rates. We identify a higher proportion of indels and SVs than previous efforts, showing the merits of high coverage and de novo assembly approaches. In addition, we use trio information to identify de novo mutations and use a probabilistic method to provide direct estimates of 1.27e−8 and 1.5e−9 per nucleotide per generation for SNVs and indels, respectively.
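To make the unit "per nucleotide per generation" concrete, the following is a naive count-based sketch of a direct mutation-rate estimate from trios. It is not the probabilistic method the abstract refers to, which is not described here. The function name `naive_mutation_rate` and every input number in the usage lines are hypothetical and are not taken from the study.

```python
def naive_mutation_rate(n_de_novo, callable_sites, n_trios):
    """Naive direct estimate of the de novo mutation rate.

    ASSUMPTION: a simple count-based estimator, not the probabilistic
    method used in the paper. Each child carries two haploid genomes
    (one from each parent), hence the factor 2 in the denominator.
    """
    if callable_sites <= 0 or n_trios <= 0:
        raise ValueError("callable_sites and n_trios must be positive")
    return n_de_novo / (2 * callable_sites * n_trios)


if __name__ == "__main__":
    # Hypothetical inputs for illustration only (not data from the study).
    rate = naive_mutation_rate(n_de_novo=60, callable_sites=2_500_000_000, n_trios=1)
    print(f"{rate:.2e} mutations per nucleotide per generation")
```

A probabilistic approach would additionally account for sites where a de novo mutation could not have been called and for false-positive calls, neither of which this sketch attempts.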
Background: Many mortality prediction models have been developed for patients in intensive care units (ICUs); most are based on data available at ICU admission. We investigated whether machine learning methods using analyses of time-series data improved mortality prognostication for patients in the ICU by providing real-time predictions of 90-day mortality. In addition, we examined to what extent such a dynamic model could be made interpretable by quantifying and visualising the features that drive the predictions at different timepoints. Methods: Based on the Simplified Acute Physiology Score (SAPS) III variables, we trained a machine learning model on longitudinal data from patients admitted to four ICUs in the Capital Region, Denmark, between 2011 and 2016. We included all patients older than 16 years of age, with an ICU stay lasting more than 1 h, and who had a Danish civil registration number to enable 90-day follow-up. We leveraged static data and physiological time-series data from electronic health records and the Danish National Patient Registry. A recurrent neural network was trained with a temporal resolution of 1 h. The model was internally validated using the holdout method with 20% of the training dataset and externally validated using previously unseen data from a fifth hospital in Denmark. Its performance was assessed with the Matthews correlation coefficient (MCC) and area under the receiver operating characteristic curve (AUROC) as metrics, using bootstrapping with 1000 samples with replacement to construct 95% CIs. A Shapley additive explanations algorithm was applied to the prediction model to obtain explanations of the features that drive patient-specific predictions, and the contributions of each of the 44 features in the model were analysed and compared with the variables in the original SAPS III model. Findings: From a dataset containing 15 615 ICU admissions of 12 616 patients, we included 14 190 admissions of 11 492 patients in our analysis. Overall, 90-day mortality was 33.1% (3802 patients). The deep learning model showed a predictive performance on the holdout testing dataset that improved over the timecourse of an ICU stay: MCC 0.29 (95% CI 0.25-0.33) and AUROC 0.73 (0.71-0.74) at admission, 0.43 (0.40-0.47) and 0.82 (0.80-0.84) after 24 h, 0.50 (0.46-0.53) and 0.85 (0.84-0.87) after 72 h, and 0.57 (0.54-0.60) and 0.88 (0.87-0.89) at the time of discharge. The model exhibited good calibration properties. These results were validated in an external validation cohort of 5827 patients with 6748 admissions: MCC 0.29 (95% CI 0.27-0.32) and AUROC 0.75 (0.73-0.76) at admission, 0.41 (0.39-0.44) and 0.80 (0.79-0.81) after 24 h, 0.46 (0.43-0.48) and 0.82 (0.81-0.83) after 72 h, and 0.47 (0.44-0.49) and 0.83 (0.82-0.84) at the time of discharge. Interpretation: The prediction of 90-day mortality improved with 1-h sampling intervals during the ICU stay. The dynamic risk prediction can also be explained for an individual patient, visualising the features contributing to the prediction at any point in ...
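The bootstrap step described in the Methods (1000 samples drawn with replacement to construct 95% CIs) can be sketched in plain Python for the MCC. This is an illustration under stated assumptions, not the authors' code: the function names `mcc` and `bootstrap_ci`, the nearest-rank percentile interval, the convention of returning 0 when the MCC denominator is zero, and the toy labels are all hypothetical choices made here.

```python
import math
import random


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # ASSUMPTION: return 0.0 when the denominator vanishes (a common convention).
    return (tp * tn - fp * fn) / denom if denom else 0.0


def bootstrap_ci(y_true, y_pred, metric=mcc, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample cases with replacement, recompute the metric."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    # Nearest-rank percentiles of the sorted bootstrap distribution.
    lower = stats[round((alpha / 2) * (n_resamples - 1))]
    upper = stats[round((1 - alpha / 2) * (n_resamples - 1))]
    return lower, upper


if __name__ == "__main__":
    # Toy labels for illustration only (not study data).
    y_true = [1, 0] * 50
    y_pred = [1, 0] * 40 + [0, 1] * 10
    print(mcc(y_true, y_pred), bootstrap_ci(y_true, y_pred))
```

The same `bootstrap_ci` helper would accept any metric with the signature `metric(y_true, y_pred)`, so an AUROC function operating on predicted probabilities could be passed in place of `mcc`.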
Life sciences are yielding huge data sets that underpin scientific discoveries fundamental to improvement in human health, agriculture and the environment. In support of these discoveries, a plethora of databases and tools are deployed, in technically complex and diverse implementations, across a spectrum of scientific disciplines. The corpus of documentation of these resources is fragmented across the Web, with much redundancy, and has lacked a common standard of information. The outcome is that scientists must often struggle to find, understand, compare and use the best resources for the task at hand. Here we present a community-driven curation effort, supported by ELIXIR (the European infrastructure for biological information), that aspires to a comprehensive and consistent registry of information about bioinformatics resources. The sustainable upkeep of this Tools and Data Services Registry is assured by a curation effort driven by and tailored to local needs, and shared amongst a network of engaged partners. As of November 2015, the registry includes 1785 resources, with depositions from 126 individual registrations including 52 institutional providers and 74 individuals. With community support, the registry can become a standard for dissemination of information about bioinformatics resources: we welcome everyone to join us in this common endeavour. The registry is freely available at https://bio.tools.
Purpose: To establish a cohort that enables identification of genomic factors that influence human health and empower increased blood donor health and safe blood transfusions. Human health is complex and involves several factors, a major one being the genomic aspect. The genomic era has resulted in many consortia encompassing large sample sizes, which has proven successful for identifying genetic factors associated with specific traits. However, it remains a big challenge to establish large cohorts that facilitate studies of the interaction between genetic factors and environmental and lifestyle factors as these change over the course of life. A major obstacle to such endeavours is that it is difficult to revisit participants to retrieve additional information and obtain longitudinal, consecutive measurements. Participants: Blood donors (n=110 000) have given consent to participate in the Danish Blood Donor Study. The study uses the infrastructure of the Danish blood banks. Findings to date: The cohort comprises extensive phenotype data and whole genome genotyping data. Further, it is possible to retrieve additional phenotype data from national registries as well as from the donors at future visits, including consecutive measurements. Future plans: To provide new knowledge on factors influencing our health and thus provide a platform for studying the influence of genomic factors on human health, in particular the interaction between environmental and genetic factors.