Purpose: The purpose of this study was to develop a national program for Canadian diagnostic laboratories to compare DNA-variant interpretations and resolve discordant-variant classifications using the BRCA1 and BRCA2 genes as a case study. Methods: BRCA1 and BRCA2 variant data were uploaded and shared through the Canadian Open Genetics Repository (COGR; http://www.opengenetics.ca). A total of 5,554 variant observations were submitted; classification differences were identified and comparison reports were sent to participating laboratories. Each site had the opportunity to reclassify variants. The data were analyzed before and after the comparison report process to track concordant- or discordant-variant classifications by three different models. Results: Variant-discordance rates varied by classification model: 38.9% of variants were discordant when using a five-tier model, 26.7% with a three-tier model, and 5.0% with a two-tier model. After the comparison report process, the proportion of discordant variants dropped to 30.7% with the five-tier model, to 14.2% with the three-tier model, and to 0.9% using the two-tier model. Conclusion: We present a Canadian interinstitutional quality improvement program for DNA-variant interpretations. Sharing of variant knowledge by clinical diagnostic laboratories will allow clinicians and patients to make more informed decisions and lead to better patient outcomes.
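The dependence of discordance on the classification model can be illustrated with a minimal Python sketch. The abstract does not list the tier labels or say how tiers were merged, so the label names, the `THREE_TIER` and `TWO_TIER` mappings, the function `discordance_rate`, and the toy data are all assumptions made for illustration, not the study's method or data.

```python
from typing import Dict, List, Optional

# Assumed collapse of five tiers into three (not specified in the abstract).
THREE_TIER = {
    "pathogenic": "pathogenic/likely pathogenic",
    "likely pathogenic": "pathogenic/likely pathogenic",
    "uncertain significance": "uncertain significance",
    "likely benign": "benign/likely benign",
    "benign": "benign/likely benign",
}

# Assumed collapse of five tiers into two (also an assumption).
TWO_TIER = {
    "pathogenic": "actionable",
    "likely pathogenic": "actionable",
    "uncertain significance": "not actionable",
    "likely benign": "not actionable",
    "benign": "not actionable",
}


def discordance_rate(
    observations: Dict[str, List[str]],
    collapse: Optional[Dict[str, str]] = None,
) -> float:
    """Fraction of variants seen by two or more labs whose labels disagree.

    `observations` maps a variant identifier to the five-tier labels
    submitted by different laboratories. `collapse` optionally maps each
    five-tier label onto a coarser tier before comparison.
    """
    shared = [labels for labels in observations.values() if len(labels) >= 2]
    if not shared:
        raise ValueError("no variant was classified by two or more laboratories")
    discordant = 0
    for labels in shared:
        mapped = {collapse[label] if collapse else label for label in labels}
        if len(mapped) > 1:
            discordant += 1
    return discordant / len(shared)


# Hypothetical toy data, not taken from COGR.
toy = {
    "variant_A": ["pathogenic", "likely pathogenic"],
    "variant_B": ["uncertain significance", "likely benign"],
    "variant_C": ["likely pathogenic", "uncertain significance"],
    "variant_D": ["benign", "benign"],
}

for name, mapping in [("five-tier", None), ("three-tier", THREE_TIER), ("two-tier", TWO_TIER)]:
    print(name, discordance_rate(toy, mapping))
```

In this toy set, merging tiers removes disagreements that fall inside a merged tier, which is the same direction of effect reported in the Results.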
Conditions and thresholds applied for evidence weighting of within-codon concordance (PM5) for pathogenicity vary widely between laboratories and expert groups. Because of the sparseness of available clinical classifications, there is little evidence for variation in practice. Methods: We used as a truthset 7541 dichotomous functional classifications of BRCA1 and MSH2, spanning 311 codons of BRCA1 and 918 codons of MSH2, generated from large-scale functional assays that have been shown to correlate excellently with clinical classifications. We assessed PM5 at 5 stringencies with incorporation of 8 in silico tools. For each analysis, we quantified a positive likelihood ratio (pLR, true positive rate/false positive rate), the predictive value of PM5-lookup in ClinVar compared with the functional truthset. Results: pLR was 16.3 (10.6-24.9) for variants for which there was exactly 1 additional colocated deleterious variant on ClinVar, and the variant under examination was equally or more damaging when analyzed using BLOSUM62. pLR was 71.5 (37.8-135.3) for variants for which there were 2 or more colocated deleterious ClinVar variants, and the variant under examination was equally or more damaging than at least 1 colocated variant when analyzed using BLOSUM62. Conclusion: These analyses support the graded use of PM5, with potential to use it at higher evidence weighting where more stringent criteria are met.
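The positive likelihood ratio defined above (true positive rate divided by false positive rate) can be computed from a two-by-two table, as in the sketch below. The abstract does not say how its intervals were derived, so the log-method confidence interval used here is an assumption, and the function name `positive_likelihood_ratio` and the counts are hypothetical.

```python
import math
from typing import Tuple


def positive_likelihood_ratio(
    tp: int, fn: int, fp: int, tn: int, z: float = 1.96
) -> Tuple[float, float, float]:
    """Return (pLR, lower, upper) for a two-by-two table.

    tp/fn: truthset-deleterious variants with/without PM5 evidence.
    fp/tn: truthset-tolerated variants with/without PM5 evidence.
    The interval uses the log method (an assumption, not the paper's
    stated approach); z = 1.96 corresponds to roughly 95% coverage.
    """
    if tp <= 0 or fp <= 0:
        raise ValueError("tp and fp must be positive for the log-method interval")
    tpr = tp / (tp + fn)  # true positive rate
    fpr = fp / (fp + tn)  # false positive rate
    plr = tpr / fpr
    # Standard error of ln(pLR) under the log method.
    se = math.sqrt(1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn))
    return plr, plr * math.exp(-z * se), plr * math.exp(z * se)


# Hypothetical counts, not taken from the study.
plr, lower, upper = positive_likelihood_ratio(tp=40, fn=60, fp=5, tn=95)
print(f"pLR = {plr:.1f} ({lower:.1f}-{upper:.1f})")
```

A stricter PM5 rule would typically lower both `tp` and `fp`; the ratio rises only if false positives fall proportionally faster than true positives.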
There are >2500 different genetically-determined developmental disorders (DD), which, as a group, show very high levels of both locus and allelic heterogeneity. This has led to the wide-spread use of evidence-based filtering of genome-wide sequence data as a diagnostic tool in DD. Determining whether the association of a filtered variant at a specific locus is a plausible explanation of the phenotype in the proband is crucial and commonly requires extensive manual literature review by both clinical scientists and clinicians. Access to a database of weighted clinical features extracted from rigorously curated literature would increase the efficiency of this process and facilitate the development of robust phenotypic similarity metrics. However, given the large and rapidly increasing volume of published information, conventional biocuration approaches are becoming impractical. Here, we present a scalable, automated method for extraction of categorical phenotypic descriptors from full-text literature. Papers identified through literature review were downloaded and parsed using the Cadmus custom retrieval package. Human Phenotype Ontology terms were extracted using MetaMap, with 76-83% precision and 72-81% recall. Mean terms per paper increased from 9 in title + abstract, to 69 using full text. We demonstrate that these literature-derived disease models plausibly reflect true disease expressivity more accurately than gold standard manually-curated models, through comparison with prospectively gathered data from the Deciphering Developmental Disorders study. AUC for ROC curves increased by 5-10% through use of literature-derived models. This work shows that scalable automated literature curation increases performance and adds weight to the need for this strategy to be integrated into informatic variant analysis pipelines.
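Precision and recall figures of the kind quoted above are usually obtained by comparing the automatically extracted term set for a paper against a manually annotated set. The sketch below shows that set comparison only; the abstract does not describe the evaluation procedure, so the function `precision_recall` is hypothetical and the term identifiers are placeholders rather than real annotations.

```python
from typing import Set, Tuple


def precision_recall(extracted: Set[str], reference: Set[str]) -> Tuple[float, float]:
    """Compare extracted terms against a manually annotated reference set.

    precision = correct extracted terms / all extracted terms
    recall    = correct extracted terms / all reference terms
    """
    if not extracted or not reference:
        raise ValueError("both term sets must be non-empty")
    correct = extracted & reference
    return len(correct) / len(extracted), len(correct) / len(reference)


# Placeholder identifiers standing in for ontology terms (hypothetical).
extracted_terms = {"TERM:1", "TERM:2", "TERM:3", "TERM:4"}
reference_terms = {"TERM:1", "TERM:2", "TERM:3", "TERM:5", "TERM:6"}

precision, recall = precision_recall(extracted_terms, reference_terms)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

This exact-match comparison is the simplest choice; an evaluation that credits ontology parents or children of a reference term as partial matches would give different numbers.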