Building a population-specific catalogue of single nucleotide variants (SNVs), indels and structural variants (SVs) with frequencies, termed a national pan-genome, is critical for further advancing clinical and public health genetics in large cohorts. Here we report a Danish pan-genome obtained from sequencing 10 trios to high depth (50×). We report 536k novel SNVs and 283k novel short indels from mapping approaches and develop a population-wide de novo assembly approach to identify 132k novel indels larger than 10 nucleotides with low false discovery rates. We identify a higher proportion of indels and SVs than previous efforts showing the merits of high coverage and de novo assembly approaches. In addition, we use trio information to identify de novo mutations and use a probabilistic method to provide direct estimates of 1.27e−8 and 1.5e−9 per nucleotide per generation for SNVs and indels, respectively.
Background Many mortality prediction models have been developed for patients in intensive care units (ICUs); most are based on data available at ICU admission. We investigated whether machine learning methods using analyses of time-series data improved mortality prognostication for patients in the ICU by providing real-time predictions of 90-day mortality. In addition, we examined to what extent such a dynamic model could be made interpretable by quantifying and visualising the features that drive the predictions at different timepoints. Methods Based on the Simplified Acute Physiology Score (SAPS) III variables, we trained a machine learning model on longitudinal data from patients admitted to four ICUs in the Capital Region, Denmark, between 2011 and 2016. We included all patients older than 16 years of age, with an ICU stay lasting more than 1 h, and who had a Danish civil registration number to enable 90-day follow-up. We leveraged static data and physiological time-series data from electronic health records and the Danish National Patient Registry. A recurrent neural network was trained with a temporal resolution of 1 h. The model was internally validated using the holdout method with 20% of the training dataset and externally validated using previously unseen data from a fifth hospital in Denmark. Its performance was assessed with the Matthews correlation coefficient (MCC) and area under the receiver operating characteristic curve (AUROC) as metrics, using bootstrapping with 1000 samples with replacement to construct 95% CIs. A Shapley additive explanations algorithm was applied to the prediction model to obtain explanations of the features that drive patient-specific predictions, and the contributions of each of the 44 features in the model were analysed and compared with the variables in the original SAPS III model. Findings From a dataset containing 15 615 ICU admissions of 12 616 patients, we included 14 190 admissions of 11 492 patients in our analysis. Overall, 90-day mortality was 33⋅1% (3802 patients). The deep learning model showed a predictive performance on the holdout testing dataset that improved over the timecourse of an ICU stay: MCC 0⋅29 (95% CI 0⋅25-0⋅33) and AUROC 0⋅73 (0⋅71-0⋅74) at admission, 0⋅43 (0⋅40-0⋅47) and 0⋅82 (0⋅80-0⋅84) after 24 h, 0⋅50 (0⋅46-0⋅53) and 0⋅85 (0⋅84-0⋅87) after 72 h, and 0⋅57 (0⋅54-0⋅60) and 0⋅88 (0⋅87-0⋅89) at the time of discharge. The model exhibited good calibration properties. These results were validated in an external validation cohort of 5827 patients with 6748 admissions: MCC 0⋅29 (95% CI 0⋅27-0⋅32) and AUROC 0⋅75 (0⋅73-0⋅76) at admission, 0⋅41 (0⋅39-0⋅44) and 0⋅80 (0⋅79-0⋅81) after 24 h, 0⋅46 (0⋅43-0⋅48) and 0⋅82 (0⋅81-0⋅83) after 72 h, and 0⋅47 (0⋅44-0⋅49) and 0⋅83 (0⋅82-0⋅84) at the time of discharge. Interpretation The prediction of 90-day mortality improved with 1-h sampling intervals during the ICU stay. The dynamic risk prediction can also be explained for an individual patient, visualising the features contributing to the prediction at any point in ...
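The Methods of the abstract above report the Matthews correlation coefficient with 95% CIs built from 1000 bootstrap samples drawn with replacement. The following is a minimal sketch of how such an interval could be computed, not the authors' code: it assumes a simple percentile bootstrap over binary labels, and the function names `mcc` and `bootstrap_ci` as well as the toy labels are hypothetical.

```python
import math
import random


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: MCC is set to 0 when a margin of the
        # confusion matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI (assumed variant) for a metric."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        # Resample admissions with replacement, keeping label pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx],
                            [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy data only (not from the study): 200 labels with some disagreements.
toy_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0] * 20
toy_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0] * 20
point = mcc(toy_true, toy_pred)
low, high = bootstrap_ci(toy_true, toy_pred, mcc)
print(f"MCC {point:.2f} (95% CI {low:.2f}-{high:.2f})")
```

The same `bootstrap_ci` helper would accept any metric with the signature `metric(y_true, y_pred)`, so an AUROC function could be passed in its place. For a dynamic model, the calculation would be repeated at each timepoint of interest (admission, 24 h, 72 h, discharge).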
Hundreds of thousands of human genomes are now being sequenced to characterize genetic variation and use this information to augment association mapping studies of complex disorders and other phenotypic traits 1-4. Genetic variation is identified mainly by mapping short reads to the reference genome or by performing local assembly 2,5-7. However, these approaches are biased against discovery of structural variants and variation in the more complex parts of the genome. Hence, large-scale de novo assembly is needed. Here we show that it is possible to construct excellent de novo assemblies from high-coverage sequencing with mate-pair libraries extending up to 20 kilobases. We report de novo assemblies of 150 individuals (50 trios) from the GenomeDenmark project. The quality of these assemblies is similar to those obtained using the more expensive long-read technology 4,8-13. We use the assemblies to identify a rich set of structural variants including many novel insertions and demonstrate how this variant catalogue enables further deciphering of known association mapping signals. We leverage the assemblies to provide 100 completely resolved major histocompatibility complex haplotypes and to resolve major parts of the Y chromosome. Our study provides a regional reference genome that we expect will improve the power of future association mapping studies and hence pave the way for precision medicine initiatives, which now are being launched in many countries including Denmark. Using a combination of high-depth (average 78×) Illumina paired-end and mate-pair libraries, we applied Allpaths-LG 14 to create de novo assemblies of high quality and coverage for each of the 150 individuals with a median scaffold N50 of ~21 megabases (Mb; maximum ~30 Mb) (Supplementary Table 1). The 100 largest scaffolds in each of the 140 best assemblies typically covered more than 75% (median 77%, Extended Data Fig. 1a) of the genome, with the largest scaffolds exceeding 110 Mb in size (Supplementary Table 1). To evaluate the accuracy of the assemblies, we subsequently aligned the scaffolds for each individual to the human reference genome (GRCh38) 15. Figure 1 shows an example individual where the euchromatic part of each chromosome was almost completely covered by a few large scaffolds and in several cases scaffolds covered almost entire chromosome arms. Only rarely did we find that large scaffolds break and align to more than one chromosome (Extended Data Fig. 1b), suggesting that even the largest scaffolds are seldom chimaeric. We also compared our de novo assemblies with a published long-read assembly based on BioNano mapping and PacBio sequencing 16. Extended Data Figs 2a and 3 show that this assembly was less complete than our assemblies, but with similar scaffold lengths. The long-read assembly had 5.38% missing data compared with our median of 4.25% (Extended Data Fig. 3a), but the missing data in our assemblies were found in smaller gaps (Extended Data Fig. 3b, c), and the median contig length was therefore much smaller th...
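The text above summarises assembly contiguity with the scaffold N50 and with the share of the genome covered by the 100 largest scaffolds. As a minimal sketch of those two statistics, using the standard N50 definition rather than anything taken from the paper's pipeline, and with hypothetical function names and toy lengths:

```python
def n50(lengths):
    """Return the N50: the largest length L such that scaffolds of
    length >= L together contain at least half of the assembled bases."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first scaffold that brings the running sum to half.
        if 2 * running >= total:
            return length


def top_k_fraction(lengths, genome_size, k=100):
    """Fraction of genome_size spanned by the k largest scaffolds.

    This counts scaffold lengths only; it does not correct for gaps or
    overlapping alignments, which a real evaluation would need to handle.
    """
    largest = sorted(lengths, reverse=True)[:k]
    return sum(largest) / genome_size


# Toy scaffold lengths in bases (illustrative only, not from the study).
toy_scaffolds = [8, 8, 4, 3, 3, 2, 2, 2]
print(n50(toy_scaffolds))
print(top_k_fraction(toy_scaffolds, genome_size=40, k=3))
```

Because N50 depends only on the length distribution, two assemblies with similar scaffold N50 can still differ widely in contig N50, which is consistent with the contrast the text draws between scaffold lengths and gap sizes.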
Current standard treatments for metastatic colorectal cancer (CRC) are based on combination regimens with one of the two chemotherapeutic drugs, irinotecan or oxaliplatin. However, drug resistance frequently limits the clinical efficacy of these therapies. In order to gain new insights into mechanisms associated with chemoresistance, and starting from three distinct CRC cell models, we generated a panel of human colorectal cancer cell lines with acquired resistance to either oxaliplatin or irinotecan. We characterized the resistant cell line variants with regard to their drug resistance profile and transcriptome, and matched our results with datasets generated from relevant clinical material to derive putative resistance biomarkers. We found that the chemoresistant cell line variants had distinctive irinotecan- or oxaliplatin-specific resistance profiles, with non-reciprocal cross-resistance. Furthermore, we could identify several new, as well as some previously described, drug resistance-associated genes for each resistant cell line variant. Each chemoresistant cell line variant acquired a unique set of changes that may represent distinct functional subtypes of chemotherapy resistance. In addition, and given the potential implications for selection of subsequent treatment, we also performed an exploratory analysis, in relevant patient cohorts, of the predictive value of each of the specific genes identified in our cellular models.