Next-generation sequencing (NGS) is rapidly expanding into routine oncology practice. Genetic variations in both the cancer and inherited genomes are informative for hereditary cancer risk, prognosis, and treatment strategies. Herein, we focus on the clinical perspective of integrating NGS results into patient care to support therapeutic decision-making. Five key considerations are addressed for operationalizing NGS testing and applying its results to patient care: (1) NGS test ordering and workflow design; (2) result reporting, curation, and storage; (3) clinical consultation services that provide test interpretations and identify opportunities for molecularly guided therapy; (4) presentation of genetic information within the electronic health record; and (5) education of providers and patients. Several of these considerations center on informatics tools that support NGS test ordering and later retrieval of the results for therapeutic purposes. Clinical decision support tools embedded within the electronic health record can guide NGS test utilization and help identify opportunities for targeted therapy, including clinical trial eligibility. We discuss the project- and change-management challenges of operationalizing NGS-supported, evidence-based patient care within current information technology systems and clinical data standards, and we provide solutions for overcoming these barriers.
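At its simplest, a decision-support rule of the kind described above compares the alterations in a patient's NGS report against a curated knowledge base of actionable targets and trial options. The sketch below is a minimal illustration under stated assumptions: the `Variant` type, the `ACTIONABLE` table, and all gene, alteration, therapy, and trial names are hypothetical placeholders, not a real knowledge base or any specific EHR product's logic.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Variant:
    gene: str
    alteration: str  # protein-level change as reported by the lab


# Hypothetical knowledge base keyed by (gene, alteration). Entries are
# placeholders for illustration only, not clinical recommendations.
ACTIONABLE: Dict[Tuple[str, str], List[str]] = {
    ("GENE_A", "ALT_1"): ["Targeted therapy X"],
    ("GENE_B", "ALT_2"): ["Trial TRIAL-001 (eligibility screen)"],
}


def flag_opportunities(report: List[Variant]) -> Dict[Variant, List[str]]:
    """Return each reported variant that matches the knowledge base, with its options."""
    hits: Dict[Variant, List[str]] = {}
    for variant in report:
        options = ACTIONABLE.get((variant.gene, variant.alteration))
        if options:  # variants with no knowledge-base entry are not flagged
            hits[variant] = options
    return hits


if __name__ == "__main__":
    report = [Variant("GENE_A", "ALT_1"), Variant("GENE_C", "ALT_9")]
    for variant, options in flag_opportunities(report).items():
        print(variant.gene, variant.alteration, "->", options)
```

A real tool would also need to handle variant nomenclature differences, tumor type, and trial criteria beyond the alteration itself; exact-key lookup is used here only to keep the idea visible.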
PURPOSE The use of genomics within cancer research and clinical oncology practice has become commonplace. Efforts such as The Cancer Genome Atlas have characterized the cancer genome and suggested a wealth of targets for implementing precision medicine strategies for patients with cancer. The data produced by research studies and clinical care have many potential secondary uses beyond their originally intended purpose. Effective storage, query, retrieval, and visualization of these data are essential to building an infrastructure that enables new discoveries in cancer research. METHODS Moffitt Cancer Center implemented a molecular data warehouse to complement its extensive enterprise clinical data warehouse (Health and Research Informatics). Seven different sequencing experiment types were included in the warehouse, with data from institutional research studies and clinical sequencing. RESULTS Implementing the molecular warehouse required close collaboration among many teams with different expertise and a use case–focused approach. Cornerstones of project success included project planning, open communication, institutional buy-in, piloting the implementation, custom solutions to specific problems, data quality improvement, and data governance; unique aspects of these are featured here. We describe our experience in selecting, configuring, and loading molecular data into the molecular data warehouse. Specifically, we developed solutions for heterogeneous genomic sequencing cohorts generated on many different platforms and for integration with our existing clinical data warehouse. CONCLUSION The implementation was ultimately successful despite the challenges encountered, many of which can be generalized to other cancer research centers.
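Loading data from many sequencing platforms into one warehouse typically means mapping each platform's fields onto a shared schema and then linking the rows to clinical records. The sketch below illustrates that pattern under assumptions: the platform names, source column names, and the shared `patient_id` key are hypothetical, and this is not the actual Moffitt warehouse design.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]

# Hypothetical per-platform field names mapped onto one shared schema.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "platform_a": {"Gene": "gene", "Protein_Change": "alteration", "MRN": "patient_id"},
    "platform_b": {"hugo_symbol": "gene", "hgvsp": "alteration", "subject": "patient_id"},
}


def harmonize(platform: str, record: Record) -> Record:
    """Rename one platform-specific record into the shared schema, keeping its provenance."""
    mapping = FIELD_MAPS[platform]
    row = {common: record[source] for source, common in mapping.items()}
    row["platform"] = platform
    return row


def link_to_clinical(
    rows: List[Record], clinical: Dict[str, Record]
) -> Tuple[List[Record], List[Record]]:
    """Join harmonized rows to clinical attributes by patient_id.

    Rows without a clinical match are returned separately so they can be
    reviewed as a data-quality issue instead of being silently dropped.
    """
    linked: List[Record] = []
    unmatched: List[Record] = []
    for row in rows:
        attrs = clinical.get(row["patient_id"])
        if attrs is None:
            unmatched.append(row)
        else:
            linked.append({**row, **attrs})
    return linked, unmatched
```

Keeping the `platform` field on every row preserves provenance, which matters when the same gene is reported differently by different assays.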
Introduction: Much of the information in electronic medical records (EMRs) required for the practice of clinical oncology is contained in unstructured text. While natural language processing (NLP) has been used to extract information from EMR text, its accuracy has been suboptimal. In late 2018, a powerful new deep-learning NLP algorithm was published: Bidirectional Encoder Representations from Transformers (BERT). BERT set new accuracy records and, for the first time, achieved human-level performance on several NLP benchmarks. Our goal was to train BERT to extract clinically relevant data from pathology reports with high accuracy. Procedures: Like many cancer centers nationwide, Moffitt Cancer Center employs Certified Tumor Registrars (CTRs) to collect and report data about cancer patients to state and federal agencies. The CTR-extracted data serve as labels that identify, with high accuracy, important information in each pathology report. We therefore used these data to fine-tune BERT to perform a question-answering (Q&A) task. Our system sought the answers to two predetermined questions in each pathology report: “What organ contains the tumor?” and “What is the kind of tumor or carcinoma?” To achieve this, we matched surgical pathology reports created at Moffitt from January 1, 2007, onward with structured data extracted by CTRs. The resulting dataset was randomly divided into training (80%) and testing (20%) subsets. After Q&A training, model performance was assessed on the test dataset. Two metrics were calculated for each question: a true-or-false indicator of a perfect word-for-word match between the BERT-extracted and CTR-extracted data, and the F1 statistic, which ranges from 0% to 100% and indicates the degree of overlap between the words in the BERT-extracted data and the words in the CTR-extracted data. Results: The final dataset contained 14,143 pathology reports (11,520 for training, 2,623 for testing).
This dataset included tumors from 228 organ sites involving 232 histological classifications. The three most common organ site/histology combinations were Prostate Gland/Adenocarcinoma (6.7%), Breast/Invasive Carcinoma (6.1%), and Breast Overlapping Lesion/Invasive Carcinoma (5.9%). Our BERT-based Q&A system searched for answers to both questions in each of the 2,623 test reports, generating 5,246 answers in total. Of these, 4,667 (89%) were a perfect word-for-word match with the corresponding CTR-extracted phrases. The mean F1 statistic between the BERT answers and the CTR-extracted phrases was 92%. Conclusions: Future efforts will focus on improving performance via unsupervised training of the BERT language model on 484,000 Moffitt pathology reports. We will also extract additional data fields with CTR-matched ground-truth labels. Ultimately, new transformer-based NLP models could aid the extraction of information from pathology reports and other EMR documents, which in turn could greatly facilitate personalized medicine. Citation Format: Ross Mitchell, Rachel Howard, Patricia Lewis, Katie Fellows, Jennie Jones, Phillip Reisman, Brooke Fridley, Dana Rollison. Deep learning for automatic extraction of tumor site and histology from unstructured pathology reports [abstract]. In: Proceedings of the Annual Meeting of the American Association for Cancer Research 2020; 2020 Apr 27-28 and Jun 22-24. Philadelphia (PA): AACR; Cancer Res 2020;80(16 Suppl):Abstract nr 2101.
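The two evaluation metrics described in this abstract can be sketched as below. This assumes SQuAD-style token-overlap F1 with only lowercasing and whitespace tokenization; the abstract does not state the normalization actually used (the SQuAD evaluation script additionally strips punctuation and the articles "a", "an", and "the"), so the function names and tokenization here are illustrative assumptions.

```python
from collections import Counter
from typing import List, Tuple


def tokens(text: str) -> List[str]:
    # Assumption: case-insensitive whitespace tokenization only.
    return text.lower().split()


def exact_match(pred: str, ref: str) -> bool:
    """True-or-false indicator of a word-for-word match."""
    return tokens(pred) == tokens(ref)


def f1(pred: str, ref: str) -> float:
    """Harmonic mean of token precision and recall (0.0 to 1.0)."""
    p, r = tokens(pred), tokens(ref)
    if not p or not r:
        return float(p == r)  # both empty counts as agreement
    overlap = sum((Counter(p) & Counter(r)).values())  # shared tokens, with multiplicity
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(r)
    return 2 * precision * recall / (precision + recall)


def score(pairs: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Exact-match rate and mean F1 over (prediction, reference) pairs."""
    if not pairs:
        raise ValueError("no answers to score")
    em_rate = sum(exact_match(p, r) for p, r in pairs) / len(pairs)
    mean_f1 = sum(f1(p, r) for p, r in pairs) / len(pairs)
    return em_rate, mean_f1
```

For example, "invasive carcinoma" against the reference "Invasive ductal carcinoma" has precision 1 and recall 2/3, giving F1 = 0.8 even though it is not an exact match. On the reported test set, the exact-match rate is 4,667 / 5,246 ≈ 0.89, consistent with the abstract's 89%.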
The vast wealth of medical data collected over the last decade holds great promise for accelerating novel research, discovery, and clinical translation. Specifically, the rapid expansion of genomic testing provides new opportunities for the clinical management of cancer patients, influencing diagnosis, risk stratification, and treatment planning. Moffitt Cancer Center's Personalized Medicine Clinical Service integrates next-generation sequencing test results into patient care, using the data to guide individualized treatment plans. To maximize the efficiency and efficacy of this service, creative solutions for data harmonization, storage, and management are required. We implemented a commercial molecular data warehouse (MDW), directly linked to our existing clinical data warehouse, to store and manage molecular data ranging from genotypic alterations to annotations from public resources (HUGO, COSMIC, Ensembl) and clinically actionable targets (4,256 records currently loaded). A centralized, cloud-based data and analytics platform is also being implemented at Moffitt that will integrate a broad range of multi-modal data. In the cloud environment, the data from the MDW will be linked to typically siloed data streams from the electronic health record, cancer registry management system, biospecimen management system, billing and scheduling systems, patient-reported information and outcomes, and patient-generated health data, creating a unique and customized Personalized Medicine Curated Data Mart (CDM). In addition to describing the features of the MDW and the challenges faced during its implementation, we will provide an overview of the extensive data cleaning and curation required to facilitate such a CDM. 
This includes the extraction of disease characteristics from unstructured clinical text via natural language processing, the creation of new derived data fields, approaches to extracting and managing complex treatment data, and the inclusion of detailed, manually abstracted recurrence and outcomes data for historical patients from existing institutional datasets such as the Clinical Genomics Action Committee (CGAC) database. Finally, we will present prototypes of analytics dashboards that will interface directly with our CDM, facilitating intuitive data exploration for all members of our personalized medicine teams. Citation Format: Rachel Howard, Kevin Hicks, Jamie Teer, Phillip Reisman, Mandy O'Leary, Steven Eschrich, Ross Mitchell, Howard McLeod, Dana Rollison. Facilitating personalized medicine with cloud-based storage and analytics [abstract]. In: Proceedings of the Annual Meeting of the American Association for Cancer Research 2020; 2020 Apr 27-28 and Jun 22-24. Philadelphia (PA): AACR; Cancer Res 2020;80(16 Suppl):Abstract nr 3226.
The goal of the described work was to create a self-service tool for data exploration and cohort creation, facilitating feasibility assessments for research studies and providing insight into the primary clinical and demographic characteristics of the Moffitt patient population. The Moffitt Cancer Analytics Platform (MCAP) was designed to enable seamless end-to-end data lifecycle management, promote data democratization, and reduce data duplication by feeding diverse, typically siloed local data streams into a central data repository. Eight distinct sources, including the Moffitt electronic medical record, cancer registry, billing systems, and clinical trial management system, feed into an Amazon S3-based data lake. Raw data are cleaned, standardized for research use, and stored in the enterprise data warehouse in Snowflake. To date, more than 80 distinct tables and 2,200 data elements have been captured, representing more than 500,000 patients. To provide transparency into these data assets, Moffitt partnered with phData to create a custom data self-service tool, MCAP Explore. Built on the Sigma cloud analytics platform, Explore is a highly customized interactive workbook that allows the user to filter the total patient population on a broad range of clinical and demographic characteristics to identify cohorts of interest. Applications range from research study feasibility assessments to operational oversight of the patient population's composition, including where disparities may exist in disease characteristics, treatment, or outcomes. The user can also view a broad range of visualizations summarizing the resulting cohort and drill down into the specific records available.
The features that distinguish Explore from similar tools include: a) the broad range of data domains covered by the available filters (demographics, diagnoses, appointments, labs/vitals, treatment, study enrollment/consent, patient-reported, biospecimens, molecular); b) the ability to apply any number of filters in any order, on data with both one-to-one and many-to-one relationships to the patient, while maintaining important linkages between record types; c) full customization of the product for our oncology-specific use cases, providing transparency into the underlying data model and flexibility for future expansion; and d) no data need leave the Moffitt environment, and de-identification processes governed at the institutional level and applied within Snowflake are automatically inherited by Explore through its connection to our institutional Active Directory accounts and Snowflake user roles. We will present detailed examples of tool functionality and describe the underlying data model and the design decisions required to accommodate a broad range of complex oncology-specific use cases. In addition, we will present early usage metrics and next steps for expansion into new data domains, including imaging. Citation Format: Rachel Howard, Phillip Reisman, Patricia Lewis, Rodrigo Carvajal, Chandan Challa, Mark Ruesink, Joe McFarren, Katrina Johnson, Mukund Sridhar, Kedar Kulkarni, Dana E. Rollison. MCAP Explore: A self-service data exploration and cohort building tool for oncologists [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 2068.
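One simple way to let filters be applied in any order across one-to-one and many-to-one tables is to resolve each filter to a set of patient identifiers and take the intersection, since set intersection is commutative and associative. The sketch below assumes this patient-level semantics (a many-to-one table contributes a patient if any of that patient's rows match) and uses hypothetical table and field names; it is not the MCAP Explore or Sigma implementation, and it does not capture row-level linkages such as requiring a lab result and an appointment on the same date, which need joins on more than the patient key.

```python
from typing import Callable, Dict, Iterable, List, Set

Row = Dict[str, object]  # every row carries a "patient_id" key


def patients_matching(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> Set[str]:
    """Patient IDs with at least one row satisfying the predicate.

    Works for one-to-one tables (one row per patient, e.g. demographics) and
    many-to-one tables (many rows per patient, e.g. lab results) alike.
    """
    return {str(row["patient_id"]) for row in rows if predicate(row)}


def build_cohort(all_patients: Set[str], filters: List[Set[str]]) -> Set[str]:
    """Intersect the patient sets; the result does not depend on filter order."""
    cohort = set(all_patients)
    for ids in filters:
        cohort &= ids
    return cohort


if __name__ == "__main__":
    demographics = [
        {"patient_id": "p1", "age": 70},
        {"patient_id": "p2", "age": 45},
        {"patient_id": "p3", "age": 66},
    ]
    labs = [  # hypothetical many-to-one lab table
        {"patient_id": "p1", "test": "PSA", "value": 5.2},
        {"patient_id": "p1", "test": "PSA", "value": 3.1},
        {"patient_id": "p2", "test": "PSA", "value": 8.0},
        {"patient_id": "p3", "test": "CEA", "value": 2.0},
    ]
    everyone = {str(r["patient_id"]) for r in demographics}
    older = patients_matching(demographics, lambda r: r["age"] >= 65)
    high_psa = patients_matching(labs, lambda r: r["test"] == "PSA" and r["value"] > 4.0)
    print(sorted(build_cohort(everyone, [older, high_psa])))
```

Because each filter reduces to a set before combination, adding or reordering filters never changes which patients remain, which is the property a user-facing cohort builder needs.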