Although renal hyperfiltration (RHF), an abnormal increase in GFR, has been associated with many lifestyle factors and clinical conditions, including diabetes, its clinical consequences are not clear. RHF is frequently considered to be the result of overestimating true GFR in subjects with muscle wasting. To evaluate the association between RHF and mortality, 43,503 adult Koreans who underwent voluntary health screening at Seoul National University Hospital between March 1995 and May 2006 with baseline GFR ≥60 ml/min per 1.73 m² were followed up for mortality until December 31, 2012. GFR was estimated with the Chronic Kidney Disease Epidemiology Collaboration creatinine equation, and RHF was defined as GFR >95th percentile after adjustment for age, sex, muscle mass, and history of diabetes and/or hypertension medication. Muscle mass was measured with bioimpedance analysis at baseline. During a median follow-up of 12.4 years, 1743 deaths occurred. The odds ratio of RHF in participants in the highest quartile of muscle mass was 1.31 (95% confidence interval [95% CI], 1.11 to 1.54) compared with the lowest quartile after adjustment for confounding factors, including body mass index. The hazard ratio of all-cause mortality for RHF was 1.37 (95% CI, 1.11 to 1.70) in a Cox proportional hazards model adjusted for known risk factors, including smoking. These data suggest that RHF may be associated with increased all-cause mortality in an apparently healthy population. The possibility of RHF as a novel marker of all-cause mortality should be confirmed.
Although CKD is a well known risk factor for all-cause and cardiovascular mortality, 1 the clinical consequences of an abnormally high GFR, or renal hyperfiltration (RHF), have not been adequately evaluated.
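The abstract defines RHF as a GFR above the 95th percentile "after adjustment for" several covariates but does not say how the adjustment is done. One plausible reading, assumed here and not confirmed by the source, is a residual-based cutoff: regress eGFR on the adjustment covariates and flag participants whose residual exceeds the 95th percentile of all residuals. The sketch below does this with ordinary least squares in plain Python; the record keys (`age`, `female`, `muscle_kg`, `dm_htn_med`) and the function names are hypothetical.

```python
from statistics import quantiles


def ols_fit(X, y):
    """Least-squares coefficients from the normal equations (X^T X) b = X^T y.

    Solved by Gauss-Jordan elimination with partial pivoting; assumes X has
    full column rank (no constant or collinear covariates).
    """
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def flag_rhf(records, egfr, pct=95):
    """Indices whose covariate-adjusted eGFR residual exceeds the pct-th percentile.

    Hypothetical covariates: intercept, age, female (0/1), muscle_kg,
    dm_htn_med (0/1 for diabetes and/or hypertension medication).
    """
    X = [[1.0, r["age"], r["female"], r["muscle_kg"], r["dm_htn_med"]] for r in records]
    beta = ols_fit(X, egfr)
    residuals = [y - sum(b * x for b, x in zip(beta, row)) for row, y in zip(X, egfr)]
    # quantiles(n=100) returns the 99 percentile cut points; index pct - 1 is the pct-th
    threshold = quantiles(residuals, n=100)[pct - 1]
    return [i for i, res in enumerate(residuals) if res > threshold]
```

Adjusting first means that a high eGFR explained by, for example, young age or high muscle mass is not counted as hyperfiltration; only the excess the covariates do not explain is.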
On the basis of several cross-sectional studies, RHF is known to be associated with various medical conditions, such as diabetes, 2,3 hypertension, 4 obesity, 5 prehypertension, and prediabetes, 6 as well as lifestyle factors, such as smoking, 7 lack of physical activity, 8 and low aerobic fitness. 9 Although these conditions are well known risk factors for early mortality, 10 the clinical implications of RHF remain unclear. Several cohort studies and meta-analyses have reported a J-shaped association between GFR and all-cause and cardiovascular mortality. However, the increased mortality associated with a higher GFR has commonly been attributed to overestimation of GFR because of muscle wasting in high-risk groups. 11–17 The disappearance of the J-shaped association at higher GFR within younger age groups is considered supporting evidence for the overestimation of the true GFR.
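The overestimation argument follows from how creatinine-based equations behave: eGFR rises as serum creatinine falls, and low muscle mass lowers creatinine independently of kidney function. The sketch below assumes the 2009 CKD-EPI creatinine equation (the RHF abstract names the CKD-EPI creatinine equation but not its version) and shows eGFR climbing as creatinine drops for the same 50-year-old man. The function name and argument names are illustrative.

```python
def ckd_epi_2009(scr_mg_dl: float, age: float, female: bool, black: bool = False) -> float:
    """eGFR (ml/min per 1.73 m^2) from the 2009 CKD-EPI creatinine equation.

    The 2009 version includes a race coefficient; the 2021 refit removed it.
    """
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = 141 * min(ratio, 1) ** alpha * max(ratio, 1) ** -1.209 * 0.993 ** age
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr


# Same 50-year-old man with progressively lower creatinine (e.g. from muscle wasting)
for scr in (1.0, 0.8, 0.6):
    print(f"Scr {scr:.1f} mg/dl -> eGFR {ckd_epi_2009(scr, 50, female=False):.1f}")
```

Nothing in the equation distinguishes a low creatinine caused by low muscle mass from one caused by truly high filtration, which is why the abstract's muscle-mass adjustment matters.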
Background Pathology reports are written in free-text form, which precludes efficient data gathering. We aimed to overcome this limitation and design an automated system for extracting biomarker profiles from accumulated pathology reports. Methods We designed a new data model for representing biomarker knowledge. The automated system parses immunohistochemistry reports based on a “slide paragraph” unit, defined as a set of immunohistochemistry findings obtained for the same tissue slide. Immunohistochemistry reports are parsed using a context-free grammar, and surgical pathology reports using a tree-like structure. The performance of the approach was validated on manually annotated pathology reports of 100 randomly selected patients managed at Seoul National University Hospital. Results High F-scores were obtained for parsing biomarker names and corresponding test results (0.999 and 0.998, respectively) from the immunohistochemistry reports, whereas performance for parsing surgical pathology findings was relatively poor. However, applying the proposed approach to our single-center dataset revealed information on 221 unique biomarkers, a richer result than biomarker profiles derived from the published literature. Owing to the data representation model, the proposed approach can associate biomarker profiles extracted from an immunohistochemistry report with corresponding pathology findings listed in one or more surgical pathology reports.
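To make grammar-based parsing of a slide paragraph concrete, here is a toy recursive-descent parser for a small context-free grammar. The grammar (shown in the comments), the token patterns, and the sample text are assumptions made for illustration; they are not the authors' grammar.

```python
import re

# Toy grammar (hypothetical, for illustration only):
#   paragraph := finding ( ("," | ";") finding )*
#   finding   := MARKER ":" result
#   result    := STATUS ( "(" PERCENT ")" )?
TOKEN_RE = re.compile(
    r"\s*(?:(?P<PERCENT>\d+(?:\.\d+)?%)"
    r"|(?P<STATUS>positive|negative|equivocal|[0-3]\+)"
    r"|(?P<MARKER>[A-Za-z][A-Za-z0-9-]*)"
    r"|(?P<PUNCT>[:(),;]))",
    re.IGNORECASE,
)


def tokenize(text):
    """Split text into (kind, value) tokens; raise on anything the toy lexer rejects."""
    text, pos, tokens = text.strip(), 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"cannot tokenize at {pos}: {text[pos:pos + 10]!r}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


class SlideParagraphParser:
    """One method per grammar rule (recursive descent)."""

    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else (None, None)

    def expect(self, kind, value=None):
        tok_kind, tok_val = self.peek()
        if tok_kind != kind or (value is not None and tok_val != value):
            raise ValueError(f"expected {value or kind}, got {tok_val!r}")
        self.i += 1
        return tok_val

    def paragraph(self):
        findings = [self.finding()]
        while self.peek()[1] in (",", ";"):
            self.i += 1
            findings.append(self.finding())
        if self.peek()[0] is not None:
            raise ValueError(f"unexpected token {self.peek()[1]!r}")
        return findings

    def finding(self):
        marker = self.expect("MARKER")
        self.expect("PUNCT", ":")
        status = self.expect("STATUS").lower()
        percent = None
        if self.peek()[1] == "(":
            self.i += 1
            percent = self.expect("PERCENT")
            self.expect("PUNCT", ")")
        return {"marker": marker, "result": status, "percent": percent}


def parse_slide_paragraph(text):
    return SlideParagraphParser(tokenize(text)).paragraph()


print(parse_slide_paragraph("ER: positive (90%), PR: Negative; HER2: 2+"))
```

A grammar makes malformed input fail loudly (for example, a missing colon raises `ValueError`) instead of silently producing a wrong biomarker-result pair.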
Term variations are resolved by normalization to corresponding preferred terms determined by expanded dictionary look-up and text similarity-based search. Conclusions Our proposed approach for biomarker data extraction addresses key limitations regarding data representation and can handle reports prepared in the clinical setting, which often contain incomplete sentences, typographical errors, and inconsistent formatting. Electronic supplementary material The online version of this article (10.1186/s12911-018-0609-7) contains supplementary material, which is available to authorized users.
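The two-stage normalization described above (dictionary look-up first, text similarity as a fallback) can be sketched with Python's standard `difflib`. The dictionary entries, the `cutoff` of 0.8, and the function name are assumptions; the source does not state which similarity measure or threshold was used.

```python
import difflib

# Hypothetical expanded dictionary: lower-cased variant -> preferred term
VARIANTS = {
    "her2": "HER2",
    "her-2": "HER2",
    "her2/neu": "HER2",
    "c-erbb2": "HER2",
    "ki-67": "Ki-67",
    "ki67": "Ki-67",
    "mib-1": "Ki-67",
    "estrogen receptor": "ER",
    "er": "ER",
}


def normalize_term(raw, cutoff=0.8):
    """Map a raw biomarker mention to its preferred term, or None if nothing is close enough."""
    key = " ".join(raw.lower().split())  # case-fold and collapse whitespace
    if key in VARIANTS:  # 1) exact look-up in the expanded dictionary
        return VARIANTS[key]
    # 2) fallback: closest known variant by difflib's SequenceMatcher ratio
    match = difflib.get_close_matches(key, VARIANTS.keys(), n=1, cutoff=cutoff)
    return VARIANTS[match[0]] if match else None


for mention in ("HER2/neu", "Ki 67", "Estrogen recepter", "xyz"):
    print(mention, "->", normalize_term(mention))
```

Returning `None` rather than forcing a match leaves unrecognized mentions for manual review, which matters for reports with typographical errors.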
Background Common data models (CDMs) help standardize electronic health record data and facilitate outcome analysis for observational and longitudinal research. An analysis of pathology reports is required to establish fundamental information infrastructure for data-driven colon cancer research. The Observational Medical Outcomes Partnership (OMOP) CDM is used in distributed research networks for clinical data; however, it requires conversion of free text–based pathology reports into the CDM’s format. There are few use cases of representing cancer data in a CDM. Objective In this study, we aimed to construct a CDM database of colon cancer–related pathology with natural language processing (NLP) for a research platform that can utilize both clinical and omics data. The essential text entities from the pathology reports are extracted, standardized, and converted to the OMOP CDM format in order to utilize the pathology data in cancer research. Methods We extracted clinical text entities, mapped them to the standard concepts in the Observational Health Data Sciences and Informatics vocabularies, and built databases and defined relations for the CDM tables. Major clinical entities were extracted through NLP on pathology reports of surgical specimens, immunohistochemical studies, and molecular studies of colon cancer patients at a tertiary general hospital in South Korea. Items were extracted from each report using regular expressions in Python. Unstructured text without a consistent pattern was handled by adding regular expression rules based on expert advice. A custom dictionary was used for normalization and standardization of biomarker and gene names and other ungrammatical expressions. The extracted clinical and genetic information was mapped to the Logical Observation Identifiers Names and Codes (LOINC) databases and the Systematized Nomenclature of Medicine (SNOMED) standard terminologies recommended by the OMOP CDM.
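As an illustration of regular-expression extraction in Python feeding the OMOP NOTE_NLP table, the sketch below pulls two items from a report and emits one row per match. The report layout, the item patterns, the `nlp_system` label, and the `term_modifiers` format are hypothetical; the keys are a subset of NOTE_NLP column names, and `note_nlp_concept_id` is left as 0 (OMOP's "no matching concept") because mapping to standard concepts is a separate step.

```python
import re
from datetime import date

# Hypothetical report items and patterns (not the published rules)
PATTERNS = {
    "tumor_size": re.compile(
        r"tumou?r size\s*:\s*(\d+(?:\.\d+)?\s*x\s*\d+(?:\.\d+)?\s*cm)", re.IGNORECASE
    ),
    "lymph_node_metastasis": re.compile(
        r"lymph node metastasis\s*:\s*(\d+\s*/\s*\d+)", re.IGNORECASE
    ),
}


def extract_note_nlp(note_id, text, nlp_date):
    """Return NOTE_NLP-style rows (a subset of the table's columns), one per regex match."""
    rows = []
    for item, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            rows.append(
                {
                    "note_id": note_id,
                    "snippet": m.group(0),
                    "offset": str(m.start()),  # offset is a character column in the CDM
                    "lexical_variant": m.group(1),
                    "note_nlp_concept_id": 0,  # 0 = no matching concept; mapped later
                    "nlp_system": "regex-sketch",  # hypothetical system label
                    "nlp_date": nlp_date,
                    "term_modifiers": f"item={item}",  # hypothetical modifier format
                }
            )
    return rows


report = "Tumor size: 3.5 x 2.0 cm\nLymph node metastasis: 2/15"
for row in extract_note_nlp(note_id=1, text=report, nlp_date=date(2018, 1, 1)):
    print(row["term_modifiers"], row["lexical_variant"], row["offset"])
```

Keeping the raw `lexical_variant` and its character offset alongside an unmapped concept ID lets the later vocabulary-mapping step be rerun without re-parsing the report.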
The database-table relationships were newly defined through SNOMED standard terminology concepts. The standardized data were inserted into the CDM tables. For evaluation, 100 reports were randomly selected and independently annotated by a medical informatics expert and a nurse. Results We examined and standardized 1848 immunohistochemical study reports, 3890 molecular study reports, and 12,352 pathology reports of surgical specimens (from 2017 to 2018). The constructed and updated database contained the following tables populated with extracted colorectal entities: (1) NOTE_NLP, (2) MEASUREMENT, (3) CONDITION_OCCURRENCE, (4) SPECIMEN, and (5) FACT_RELATIONSHIP linking specimens with conditions and measurements. Conclusions This study aimed to prepare CDM data for a research platform that takes advantage of all omics, clinical, and patient data at Seoul National University Bundang Hospital for colon cancer pathology. More sophisticated preparation of the pathology data is needed for further research on cancer genomics, and other types of text narratives are the next target for research on the use of data in the CDM.
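FACT_RELATIONSHIP links two facts stored in different tables by (domain concept, fact ID) pairs plus a relationship concept. A minimal sketch of linking one SPECIMEN row to CONDITION_OCCURRENCE and MEASUREMENT rows follows. The field names match the FACT_RELATIONSHIP columns, but every numeric concept ID here is a negative placeholder, not a real OHDSI vocabulary ID; the abstract says the relations were defined through SNOMED concepts without listing them. Writing each link in both directions is a design choice of this sketch.

```python
from dataclasses import dataclass

# Placeholder concept IDs (hypothetical; look up real ones in the OHDSI vocabularies)
DOMAIN_SPECIMEN = -1
DOMAIN_CONDITION = -2
DOMAIN_MEASUREMENT = -3
REL_SPECIMEN_OF = -10
REL_HAS_SPECIMEN = -11


@dataclass(frozen=True)
class FactRelationship:
    domain_concept_id_1: int
    fact_id_1: int
    domain_concept_id_2: int
    fact_id_2: int
    relationship_concept_id: int


def link_specimen(specimen_id, condition_ids=(), measurement_ids=()):
    """Emit FACT_RELATIONSHIP rows linking a specimen to conditions and measurements, both directions."""
    targets = [(DOMAIN_CONDITION, c) for c in condition_ids] + [
        (DOMAIN_MEASUREMENT, m) for m in measurement_ids
    ]
    rows = []
    for domain, fact_id in targets:
        rows.append(FactRelationship(DOMAIN_SPECIMEN, specimen_id, domain, fact_id, REL_SPECIMEN_OF))
        rows.append(FactRelationship(domain, fact_id, DOMAIN_SPECIMEN, specimen_id, REL_HAS_SPECIMEN))
    return rows


for row in link_specimen(1, condition_ids=[10], measurement_ids=[100, 101]):
    print(row)
```

Because the link lives in a separate table, a single specimen can be tied to any number of diagnoses and biomarker measurements without adding foreign-key columns to the core CDM tables.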
Background Cancer staging information is an essential component of cancer research. However, the information is primarily stored in full or semistructured free-text clinical documents, which limits its use. By transforming cancer-specific data to the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM), the information can contribute to establishing multicenter observational cancer studies. To the best of our knowledge, there have been no studies on OMOP CDM transformation and natural language processing (NLP) for thyroid cancer to date. Objective We aimed to demonstrate the applicability of the OMOP CDM oncology extension module for thyroid cancer diagnosis and cancer stage information by processing free-text medical reports. Methods Thyroid cancer diagnosis and stage-related modifiers were extracted with rule-based NLP from 63,795 thyroid cancer pathology reports and 56,239 iodine whole-body scan reports from three medical institutions in the Observational Health Data Sciences and Informatics data network. The data were converted into the OMOP CDM v6.0 according to the OMOP CDM oncology extension module. The cancer stage group was derived and populated using the transformed CDM data. Results The extracted thyroid cancer data were completely converted into the OMOP CDM. The distribution of histopathological types of thyroid cancer was approximately 95.3% to 98.8% papillary carcinoma, 0.9% to 3.7% follicular carcinoma, 0.04% to 0.54% adenocarcinoma, 0.17% to 0.81% medullary carcinoma, and 0% to 0.3% anaplastic carcinoma. Regarding cancer stage, stage I thyroid cancer accounted for 55% to 64% of the cases, and stage III for 24% to 26%. Stage II and IV thyroid cancers were detected at a low rate of 2% to 6%.
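The abstract says the stage group was derived from the transformed CDM data but does not give the rules. For concreteness, the sketch below encodes the AJCC 7th edition stage groups for differentiated (papillary or follicular) thyroid carcinoma. The edition, the plain-string T/N/M inputs, and the function name are assumptions; medullary and anaplastic carcinoma use different stage tables and are not handled.

```python
def thyroid_stage_ajcc7(age: int, t: str, n: str, m: str) -> str:
    """Stage group for differentiated thyroid cancer, assuming AJCC 7th edition rules."""
    t, n, m = t.upper(), n.upper(), m.upper()
    valid_t = {"T1", "T1A", "T1B", "T2", "T3", "T4A", "T4B"}
    if t not in valid_t or n not in {"N0", "N1A", "N1B"} or m not in {"M0", "M1"}:
        raise ValueError(f"unsupported TNM combination: {t} {n} {m}")
    if age < 45:  # younger patients: only distant metastasis changes the stage
        return "II" if m == "M1" else "I"
    if m == "M1":
        return "IVC"
    if t == "T4B":
        return "IVB"
    if t == "T4A" or n == "N1B":
        return "IVA"
    if t == "T3" or n == "N1A":
        return "III"
    if t == "T2":
        return "II"
    return "I"  # T1 (T1a/T1b), N0, M0


print(thyroid_stage_ajcc7(60, "T1b", "N1a", "M0"))
```

The AJCC 8th edition raised the age cut-off to 55 years and regrouped several stages, so a derivation like this must use the edition that matches the source reports.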
Conclusion As a first study on OMOP CDM transformation and NLP for thyroid cancer, this study will help other institutions to standardize thyroid cancer–specific data for retrospective observational research and participate in multicenter studies.