Cancer is one of the most important health issues globally, and accurate interpretation of cancer‐related variants is critical for the clinical management of hereditary cancer. The ClinGen Sequence Variant Interpretation Working Groups have developed many adaptations of the American College of Medical Genetics and Genomics and Association for Molecular Pathology guidelines to improve the consistency of interpretation. We combined the most recent adaptations, expanding the number of criteria from 28 to 48, and developed a tool called Cancer SIGVAR to help genetic counselors interpret the clinical significance of cancer germline variants. The tool accepts VCF files as input and performs fully automated interpretation based on 21 criteria and semiautomated interpretation based on 48 criteria. We validated its performance against the ClinVar and CLINVITAE benchmark databases, achieving an average consistency of up to 93.71% for pathogenic assessment and 79.38% for benign assessment. We compared Cancer SIGVAR with two similar tools, InterVar and PathoMAN, and analyzed the main differences in criteria and implementation. Furthermore, we selected 911 variants from two additional in‐house benchmark databases, on which semiautomated interpretation reached an average classification consistency of 98.35%. Our findings highlight the need to optimize automated interpretation tools as guidelines are continually updated. Cancer SIGVAR is publicly available at http://cancersigvar.bgi.com/.
In 2015, the American College of Medical Genetics and Genomics and the Association for Molecular Pathology published guidelines for the clinical interpretation of sequence variants in Mendelian disorders, based on 28 criteria. The ClinGen Sequence Variant Interpretation (SVI) Working Groups have since developed many adaptations and refinements of these guidelines to improve the consistency of interpretation. We combined the most recent adaptations, expanding the criteria from 28 to 48, and developed a tool called Cancer SIGVAR to help healthcare workers and genetic counselors interpret the clinical significance of cancer germline variants, which is critical for the clinical diagnosis and treatment of hereditary cancer. The tool accepts VCF files as input and performs fully automated interpretation based on 21 criteria and semi-automated interpretation based on 48 criteria. We validated the accuracy of fully automated interpretation on the ClinVar and CLINVITAE benchmark databases, achieving an average consistency of up to 93.40% for pathogenic assessment and 82.54% for benign assessment. We compared Cancer SIGVAR with a similar tool, InterVar, and analyzed the main differences in criteria and implementation. In addition, to verify the performance of semi-automated interpretation based on 48 criteria, we selected 911 variants from two benchmark databases and reached an average classification consistency of 98.35%. Our findings highlight the need to optimize automated interpretation tools as guidelines are continually updated.
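The abstract does not define how "consistency" is computed. As a minimal sketch, assuming it means the percentage of benchmark variants carrying a given label that the tool assigns the same label, the calculation could look like the following. The function name `consistency` and the toy labels are hypothetical and are not taken from Cancer SIGVAR or from the benchmark databases.

```python
def consistency(predicted, reference, label):
    """Percentage of variants whose reference class is `label`
    and for which the predicted class is also `label`.

    Assumed definition for illustration only; the paper may
    compute consistency differently.
    """
    # Keep only the variants that the benchmark assigns to `label`.
    pairs = [(p, r) for p, r in zip(predicted, reference) if r == label]
    if not pairs:
        raise ValueError(f"no reference variants labelled {label!r}")
    agree = sum(1 for p, r in pairs if p == r)
    return 100.0 * agree / len(pairs)


# Made-up toy labels, not real benchmark data.
reference = ["pathogenic", "pathogenic", "benign", "benign", "benign", "pathogenic"]
predicted = ["pathogenic", "uncertain", "benign", "benign", "uncertain", "pathogenic"]

pathogenic_consistency = consistency(predicted, reference, "pathogenic")
benign_consistency = consistency(predicted, reference, "benign")
print(f"pathogenic: {pathogenic_consistency:.2f}%  benign: {benign_consistency:.2f}%")
```

Reporting the two classes separately, as the abstract does, keeps a large benign set from masking disagreements on the smaller pathogenic set.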
We report a study that combines high-quality manual annotation with deep-learning natural language processing to achieve accurate named entity recognition (NER) in the biomedical literature. We constructed in-house entity annotation guidelines for the biomedical literature. Our manual annotations are more than 92% consistent overall, across all four entity types (gene, variant, disease and species), with annotations previously made by other experts on the same publicly available corpora. A total of 400 full-text biomedical articles from PubMed were annotated according to these in-house guidelines. A BERT-based large model and a DistilBERT-based simplified model were constructed, trained and optimized for offline and online inference, respectively. The NER F1 scores for gene, variant, disease and species are 97.28%, 93.52%, 92.54% and 95.76%, respectively, for the BERT-based model, and 95.14%, 86.26%, 91.37% and 89.92%, respectively, for the DistilBERT-based model. The DistilBERT-based NER model thus retains 97.8%, 92.2%, 98.7% and 93.9% of the F1 scores of the BERT-based NER model for gene, variant, disease and species, respectively. Moreover, both our BERT-based and DistilBERT-based NER models outperform the state-of-the-art model, BioBERT, indicating the importance of training an NER model on biomedical-domain literature together with high-quality annotated datasets.
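The retention figures quoted above follow directly from the reported F1 scores: each is the DistilBERT score divided by the corresponding BERT score. A short check, using only the numbers given in the abstract (the helper name `retention` is ours, chosen for illustration):

```python
# F1 scores (%) as reported in the abstract above.
bert = {"gene": 97.28, "variant": 93.52, "disease": 92.54, "species": 95.76}
distilbert = {"gene": 95.14, "variant": 86.26, "disease": 91.37, "species": 89.92}


def retention(small, large):
    """Share of the larger model's F1 that the smaller model keeps,
    in percent, rounded to one decimal place."""
    return {entity: round(100 * small[entity] / large[entity], 1) for entity in large}


retained = retention(distilbert, bert)
for entity, pct in retained.items():
    print(f"{entity}: {pct}%")
```

The largest relative drop is for variant entities, which matches the 92.2% figure stated in the text.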