Advances in whole-genome sequencing (WGS) promise to enable accurate and comprehensive structural variant (SV) discovery. Dissecting SVs from WGS data presents a substantial number of challenges, and a plethora of SV detection methods have been developed. Currently, evidence that investigators can use to select appropriate SV detection tools is lacking. In this article, we evaluate the performance of SV detection tools on mouse and human WGS data using a comprehensive polymerase chain reaction-confirmed gold standard set of SVs and the genome-in-a-bottle variant set, respectively. In contrast to previous benchmarking studies, our gold standard dataset included a complete set of SVs, allowing us to report both the precision and the sensitivity of the SV detection methods. Our study investigates the ability of the methods to detect deletions, thus providing an optimistic estimate of SV detection performance, as SV detection methods that fail to detect deletions are likely to miss more complex SVs. We found that SV detection tools varied widely in their performance, with several methods providing a good balance between sensitivity and precision. Additionally, we determined the SV callers best suited for low- and ultralow-pass sequencing data as well as for different deletion length categories.
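Because the gold standard is described as a complete set of deletions, both precision and sensitivity can be computed from a caller's output. The following is a minimal sketch of that calculation, not the authors' evaluation code: the 50% reciprocal-overlap matching rule, the greedy one-to-one matching, and all names (`Deletion`, `reciprocal_overlap`, `precision_sensitivity`, `min_overlap`) are assumptions made for illustration, as are the coordinates in the example.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates; a hypothetical representation.
Deletion = Tuple[str, int, int]


def reciprocal_overlap(a: Deletion, b: Deletion) -> float:
    """Return the smaller of the two overlap fractions, or 0.0 if disjoint."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def precision_sensitivity(
    calls: List[Deletion],
    truth: List[Deletion],
    min_overlap: float = 0.5,  # assumed threshold, not taken from the study
) -> Tuple[float, float]:
    """Greedily match each call to at most one unmatched gold-standard deletion."""
    matched_truth = set()
    true_positives = 0
    for call in calls:
        for i, gold in enumerate(truth):
            if i not in matched_truth and reciprocal_overlap(call, gold) >= min_overlap:
                matched_truth.add(i)
                true_positives += 1
                break
    precision = true_positives / len(calls) if calls else 0.0
    sensitivity = true_positives / len(truth) if truth else 0.0
    return precision, sensitivity


# Made-up example: two of four calls match two of three gold-standard deletions.
truth = [("chr1", 100, 200), ("chr1", 500, 900), ("chr2", 50, 150)]
calls = [("chr1", 110, 210), ("chr1", 1000, 1100), ("chr2", 40, 160), ("chr1", 5000, 5100)]
precision, sensitivity = precision_sensitivity(calls, truth)
print(f"precision={precision:.2f} sensitivity={sensitivity:.2f}")
```

Precision here is the fraction of calls that match a gold-standard deletion, and sensitivity is the fraction of gold-standard deletions that are recovered. Sensitivity is only meaningful when the gold standard is complete, which is the property the abstract emphasizes.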
Accurate identification of human leukocyte antigen (HLA) alleles is essential for various clinical and research applications, such as transplant matching and drug sensitivities. Recent advances in RNA-seq technology have made it possible to impute HLA types from high-throughput sequencing data, spurring the development of a large number of computational HLA typing tools. However, the relative performance of these tools is unknown, limiting the ability of clinical and biomedical researchers to make informed choices regarding which tools to use. Here, we rigorously compare the performance of 9 HLA callers on 652 RNA-seq samples across 6 datasets with a molecularly defined gold standard. We find that OptiType has the highest accuracy at both low and high resolution, with an accuracy above 99%, followed by arcasHLA and seq2HLA with accuracies above 96%. Despite OptiType’s high accuracy, it is only capable of Class I predictions, which limits its application to clinical procedures such as transplantation that require Class II predictions. Furthermore, our findings reveal significant variation in accuracy across HLA loci, with HLA-A exhibiting the highest accuracy and HLA-DRB1 the lowest. We also find that Class II genes are generally more challenging to impute than Class I genes, with most typing algorithms capable of making Class I predictions with >97% accuracy, whereas the best Class II tool predicts with 94.2% accuracy. Moreover, we identify notable differences in the computational resources necessary to run each tool. We find that the most computationally expensive tools are OptiType and HLA-HD, which require 10^5 and 10^2 times greater RAM and CPU, respectively, than the least computationally expensive tools, seq2HLA and RNA2HLA. Furthermore, all tools have decreased accuracy for African samples relative to European samples at four-digit resolution.
We conclude that RNA-seq HLA callers are capable of returning high-quality results, but tools that offer a good balance between accuracy, consistency, and computational cost are yet to be developed.
The scientific community has accumulated enormous amounts of genomic data stored in specialized public repositories. These data are easily accessible, allowing the biomedical community to share omics datasets effectively. However, improperly annotated or incomplete metadata accompanying the raw omics data can reduce the utility of shared data for secondary analysis. In this study, we perform a comprehensive analysis of 137 studies comprising 18,559 samples across seven therapeutic fields to assess the completeness of metadata accompanying omics studies in both the publications and their related online repositories, and we make observations about how the process of data sharing could be made reliable. This analysis involved an initial literature survey to find studies in the seven therapeutic fields: Alzheimer’s disease, acute myeloid leukemia, cystic fibrosis, cardiovascular diseases, inflammatory bowel disease, sepsis, and tuberculosis. We carefully examined the availability of metadata for nine clinical variables: disease condition, age, organism, sex, tissue type, ethnicity, country, mortality, and clinical severity. By comparing metadata availability in the original publications and the online repositories, we observed discrepancies in how the metadata are shared. We determine that the overall availability of metadata is 72.8%, with disease condition and organism the most completely reported phenotypes and mortality the least. Additionally, we examined the completeness of metadata reported separately in original publications and online repositories. The completeness of metadata from the original publications across the nine clinical phenotypes is 71.1%. In contrast, the overall completeness of metadata from the public repositories is 48.6%.
Our study is the first to assess the completeness of metadata accompanying raw data across a large number of studies and phenotypes, and it opens a crucial discussion about solutions to improve the completeness and accessibility of metadata accompanying omics studies.
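The completeness figures above can be read as the share of samples for which each clinical variable is reported. The sketch below shows one way to compute such a score. It is an illustration under stated assumptions, not the authors' method: the abstract does not give the exact formula, so averaging per-variable availability over the nine variables is an assumption, and the function name `completeness`, the dictionary layout, and the two toy samples are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# The nine clinical variables named in the abstract.
VARIABLES = [
    "disease condition", "age", "organism", "sex", "tissue type",
    "ethnicity", "country", "mortality", "clinical severity",
]


def completeness(
    samples: List[Dict[str, Optional[str]]],
    variables: List[str] = VARIABLES,
) -> Tuple[Dict[str, float], float]:
    """Return per-variable availability and their unweighted mean (assumed formula)."""
    per_variable = {}
    for variable in variables:
        # A value counts as available when it is present and not an empty string.
        available = sum(1 for s in samples if s.get(variable) not in (None, ""))
        per_variable[variable] = available / len(samples) if samples else 0.0
    overall = sum(per_variable.values()) / len(variables)
    return per_variable, overall


# Two made-up samples: only three of the nine variables are ever reported.
samples = [
    {"disease condition": "sepsis", "organism": "Homo sapiens", "sex": "female"},
    {"disease condition": "sepsis", "organism": "Homo sapiens", "age": ""},
]
per_variable, overall = completeness(samples)
print(per_variable["sex"], round(overall, 3))
```

Running the same function once on metadata extracted from publications and once on metadata from repositories would give the kind of side-by-side comparison the abstract reports.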