The development of next-generation sequencing (NGS) and its downstream analysis tools has gained popularity in scientific research and clinical diagnostic applications. A systematic comparison of sequencing platforms and variant calling pipelines can therefore provide significant guidance for NGS-based scientific and clinical genomics. In this study, we compared the performance, concordance and operating efficiency of 27 combinations of sequencing platforms and variant calling pipelines, testing three variant calling pipelines (Genome Analysis Toolkit HaplotypeCaller, Strelka2 and Samtools-Varscan2) on nine data sets for the NA12878 genome sequenced on different platforms, including BGISEQ500, MGISEQ2000, HiSeq4000, NovaSeq and HiSeq Xten. Among the 12 combinations on WES datasets, all performed well in calling SNPs, with F-scores above 0.96, while their performance in calling INDELs ranged from 0.75 to 0.91. All 15 combinations on WGS datasets also performed well, with F-scores above 0.975 in calling SNPs and performance in calling INDELs ranging from 0.71 to 0.93. All combinations showed high concordance in variant identification, although the divergence in variant identification was larger in WGS datasets than in WES datasets. We also down-sampled the original WES and WGS datasets to a series of coverage levels across multiple platforms and recorded the time each of the three pipelines took for variant calling at each coverage. For the GIAB datasets on both BGI and Illumina platforms, Strelka2 outperformed the other two pipelines in detection accuracy and processing efficiency on each sequencing platform, and we therefore recommend it for the further promotion and application of next-generation sequencing technology. Our results provide useful and comprehensive guidelines for individual researchers and organizations seeking reliable and consistent variant identification.
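As a rough illustration of the F-scores quoted above, the sketch below compares a call set against a truth set and computes precision, recall and their harmonic mean. It assumes the usual F1 definition and exact matching on (chromosome, position, ref, alt); the abstract does not state how variants were matched, and the function names and toy variants are hypothetical.

```python
def count_matches(truth, calls):
    """Return (TP, FP, FN) for two sets of (chrom, pos, ref, alt) tuples."""
    tp = len(truth & calls)   # called and present in the truth set
    fp = len(calls - truth)   # called but absent from the truth set
    fn = len(truth - calls)   # in the truth set but not called
    return tp, fp, fn


def f_score(tp, fp, fn):
    """Harmonic mean of precision and recall (F1); 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy, made-up variants purely for illustration.
truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
         ("chr2", 50, "G", "A"), ("chr2", 75, "T", "C")}
calls = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
         ("chr2", 50, "G", "A"), ("chr3", 10, "A", "T")}
print(f_score(*count_matches(truth, calls)))
```

Real benchmarks typically normalize variant representation before matching, which this sketch omits.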
In the past decade, tumor treatment has made remarkable progress, exemplified by the successful clinical application of targeted therapies. Targeted therapies are now based primarily on the detection of mutations, and next-generation sequencing (NGS) plays an important role in the relevant clinical research. Mutation frequency is a major challenge in tumor mutation detection, and increasing sequencing depth is a widely used way to improve mutation calling performance. It is therefore necessary to evaluate the effects of sequencing depth, mutation frequency and the choice of mutation calling tool. In this study, we used Strelka2 and Mutect2 to evaluate mutation calling performance across 30 combinations of sequencing depth and mutation frequency. Precision remained above 95% in most samples. In general, for higher mutation frequencies (≥20%), a sequencing depth of ≥200X is sufficient to call 95% of mutations; for lower mutation frequencies (≤10%), we recommend improving the experimental method rather than increasing sequencing depth. Although Strelka2 and Mutect2 performed similarly, Strelka2 performed slightly better at higher mutation frequencies (≥20%), while Mutect2 performed better when the mutation frequency was lower than 10%. In addition, Strelka2 was 17 to 22 times faster than Mutect2 on average. Through this systematic comparison across sequencing depths and mutation frequencies, our research provides a useful and comprehensive guideline for clinical genomic research on somatic mutation identification.
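To make the interplay between depth and mutation frequency concrete, the sketch below treats the number of mutation-supporting reads at a site as a binomial draw. This is a simplification of our own, not the evaluation used in the study: it ignores sequencing error and the statistical models inside Strelka2 and Mutect2, and the `min_alt_reads` threshold is a hypothetical parameter.

```python
from math import comb


def expected_alt_reads(depth, frequency):
    """Expected number of reads carrying the mutation at a site."""
    return depth * frequency


def prob_at_least_alt_reads(depth, frequency, min_alt_reads):
    """P(X >= min_alt_reads) for X ~ Binomial(depth, frequency)."""
    p_fewer = sum(
        comb(depth, k) * frequency**k * (1 - frequency) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_fewer


# Compare a higher-frequency and a lower-frequency mutation at 200X,
# using a hypothetical threshold of 5 supporting reads.
for freq in (0.20, 0.05):
    print(freq, expected_alt_reads(200, freq),
          prob_at_least_alt_reads(200, freq, 5))
```

Under this toy model, depth multiplies the expected number of supporting reads, so low-frequency mutations need far more depth to reach the same number of supporting reads.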
Purpose: Breast cancer is the most commonly occurring cancer among women worldwide, and improved approaches for its early detection are therefore urgently needed. As microRNAs (miRNAs) are increasingly recognized as critical regulators in tumorigenesis and possess excellent stability in plasma, this study focused on using miRNAs to develop a method for identifying noninvasive biomarkers. Methods: To discover critical candidates, differential expression analysis was performed on tissue-originated miRNA profiles of 409 early breast cancer patients and 87 healthy controls from The Cancer Genome Atlas database. We selected candidates from the differentially expressed miRNAs and then evaluated every possible molecular signature formed by the candidates. The best signature was validated in independent serum samples from 113 early breast cancer patients and 47 healthy controls using reverse transcription quantitative real-time polymerase chain reaction. Results: Previous studies have associated the miRNA candidates identified by our method with breast cancer, and the candidates showed potential as useful biomarkers. When validated in independent serum samples, the area under the curve of the final miRNA signature (miR-21-3p, miR-21-5p, and miR-99a-5p) was 0.895. Diagnostic sensitivity and specificity were 97.9% and 73.5%, respectively. Conclusion: The present study established a novel and effective method to identify biomarkers for early breast cancer. The method is also suitable for other cancer types. Furthermore, a combination of three miRNAs was identified as a prospective biomarker for early detection of breast cancer.
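The step of evaluating every possible signature formed by the candidates can be sketched as an exhaustive search over non-empty subsets, each ranked by area under the ROC curve. The abstract does not say how a signature is scored, so summing expression values is purely an assumption here, and the miRNA names and values are made up.

```python
from itertools import combinations


def auc(case_scores, control_scores):
    """AUC as the fraction of case/control pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for c in case_scores:
        for h in control_scores:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def all_signatures(candidates):
    """Yield every non-empty combination of candidate miRNAs."""
    for size in range(1, len(candidates) + 1):
        yield from combinations(candidates, size)


def best_signature(expr, labels, candidates):
    """Return (signature, AUC) with the highest AUC; labels are 1=case, 0=control."""
    best = None
    for sig in all_signatures(candidates):
        # ASSUMPTION: signature score = sum of member expression values.
        scores = [sum(expr[m][i] for m in sig) for i in range(len(labels))]
        cases = [s for s, y in zip(scores, labels) if y == 1]
        controls = [s for s, y in zip(scores, labels) if y == 0]
        value = auc(cases, controls)
        if best is None or value > best[1]:
            best = (sig, value)
    return best


# Hypothetical toy data: two candidates, two cases followed by two controls.
expr = {"miR-a": [5, 6, 1, 2], "miR-b": [1, 2, 1, 2]}
labels = [1, 1, 0, 0]
print(best_signature(expr, labels, ["miR-a", "miR-b"]))
```

The number of signatures grows as 2^n - 1 for n candidates, so an exhaustive search like this is only practical for a short candidate list.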
Background: Accumulating evidence demonstrates that microRNA (miRNA)-target gene pairs are closely related to tumorigenesis and tumor development. However, the correlation between miRNAs and their target genes is insufficiently understood, especially how it changes between tumor and normal tissues. Objectives: The aim of this study was to evaluate the changes in the correlation of miRNA-target pairs between normal and tumor tissues. Materials and Methods: 5680 mRNA and 5740 miRNA expression profiles of 11 major human cancers were downloaded from The Cancer Genome Atlas (TCGA). The 11 cancer types were bladder urothelial carcinoma, breast invasive carcinoma, head and neck squamous cell carcinoma, kidney chromophobe, kidney renal clear cell carcinoma, kidney renal papillary cell carcinoma, liver hepatocellular carcinoma, lung adenocarcinoma, lung squamous cell carcinoma, stomach adenocarcinoma, and thyroid carcinoma. For each cancer type, we first obtained differentially expressed miRNAs (DEMs) and genes (DEGs) in tumor and then acquired critical miRNA-target gene pairs by combining the DEMs and DEGs with two experimentally validated miRNA-target interaction databases, miRTarBase and miRecords. We collected samples with both miRNA and mRNA expression values and performed Pearson correlation analysis for the miRNA-target pairs in normal and tumor tissues separately. Results: We obtained a total of 4743 critical miRNA-target pairs across the 11 cancer types, and 4572 of them showed weaker correlation in tumor than in normal tissue. The average correlation coefficients of miRNA-target pairs differed greatly between normal (-0.38 to -0.61) and tumor (-0.04 to -0.26) tissues across the 11 cancer types. The pan-cancer network, which consisted of 108 edges connecting 35 miRNAs and 89 target genes, showed the interactions of pairs that appeared in multiple cancers.
Conclusions: This comprehensive analysis revealed that the correlation between miRNAs and target genes was greatly reduced in tumor and that the critical pairs we identified were involved in cellular adhesion, proliferation, and migration. Our research could provide opportunities for investigating the molecular regulatory mechanisms of cancer and for seeking therapeutic targets.
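The per-pair comparison described above can be sketched as computing the Pearson coefficient separately in normal and tumor samples and checking whether its magnitude shrinks in tumor. Comparing absolute values is our reading of "weaker correlation", not something the abstract specifies, and the expression values below are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def is_weaker_in_tumor(r_normal, r_tumor):
    """ASSUMPTION: 'weaker' means a smaller absolute correlation."""
    return abs(r_tumor) < abs(r_normal)


# Hypothetical miRNA and target-gene expression for one pair.
normal_mirna, normal_gene = [1, 2, 3, 4], [8, 6, 4, 2]
tumor_mirna, tumor_gene = [1, 2, 3, 4], [5, 6, 3, 4]

r_normal = pearson(normal_mirna, normal_gene)
r_tumor = pearson(tumor_mirna, tumor_gene)
print(r_normal, r_tumor, is_weaker_in_tumor(r_normal, r_tumor))
```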
Breast cancer, the most common cancer in women worldwide, is associated with high mortality. Long non-coding RNAs (lncRNAs), which have little protein-coding capacity, are playing an increasingly important role in the cancer paradigm. Accumulating evidence demonstrates that lncRNAs have crucial connections with breast cancer prognosis, yet the study of lncRNAs in breast cancer is still at an early stage. In this study, we collected 1052 clinical patient samples, a comparatively large sample size, including 13 159 lncRNA expression profiles of breast invasive carcinoma (BRCA) from The Cancer Genome Atlas database, to identify prognosis-related lncRNAs. We randomly separated these clinical patient samples into training and testing sets. In the training set, we performed univariable Cox regression analysis for primary screening and ran the robust likelihood-based survival model 1000 times. The 11 lncRNAs with a frequency of more than 600 were then selected to predict the prognosis of BRCA. Using multivariate Cox regression analysis, we established a signature risk-score formula for the 11 lncRNAs to identify the relationship between the lncRNA signature and overall survival (OS). The 11-lncRNA signature was validated in both the testing set and the complete set and could effectively classify patients into high- and low-risk groups with different OS. We also verified our results in different stages. Moreover, we analyzed the connection between the 11 lncRNAs and the ESR1, PGR, and HER2 genes, whose protein products (ESR, PGR, and HER2) are widely used to classify breast cancer subtypes. The results indicated correlations between the 11 lncRNAs and the PGR and ESR1 genes. Thus, a prognostic model based on the expression of 11 lncRNAs was developed to classify BRCA clinical patient samples, providing new avenues for understanding potential therapeutic methods for breast cancer.
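A signature risk score of this kind is commonly a linear combination of expression values weighted by multivariate Cox coefficients, with patients then split into high- and low-risk groups. The abstract gives neither the coefficients nor the cutoff, so the sketch below uses hypothetical lncRNA names, made-up coefficients and a median cutoff as stated assumptions.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Sum of coefficient * expression over the lncRNAs in the signature."""
    return sum(coefficients[name] * expression[name] for name in coefficients)


def split_by_median(scores):
    """ASSUMPTION: patients strictly above the median score are 'high' risk."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical coefficients and expression values, not taken from the study.
coefficients = {"lnc1": 0.5, "lnc2": -1.0}
patients = {
    "p1": {"lnc1": 2.0, "lnc2": 1.0},
    "p2": {"lnc1": 4.0, "lnc2": 0.0},
    "p3": {"lnc1": 0.0, "lnc2": 2.0},
}

scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
groups = split_by_median(scores)
print(scores, groups)
```

A positive coefficient raises the score as expression increases, while a negative one lowers it; the two groups would then be compared on overall survival.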