Background: Real-world data, such as claims, electronic medical records (EMRs), and electronic health records (EHRs), are increasingly being used in clinical epidemiology. Understanding the current status of existing approaches can help in designing high-quality epidemiological studies.
Objective: We conducted a comprehensive narrative literature review to clarify the secondary use of claims, EMRs, and EHRs in clinical epidemiology in Japan.
Methods: We searched PubMed on June 30, 2021, for peer-reviewed publications that met the following 3 inclusion criteria: involvement of claims, EMRs, EHRs, or medical receipt data; mention of Japan; and publication between January 1, 2006, and June 30, 2021. Articles that met any of the following 6 exclusion criteria were then filtered out: review articles; non–disease-related articles; articles whose study sample was not the Japanese population; articles without claims, EMRs, or EHRs; articles without available full text; and articles without statistical analysis. The titles, abstracts, and full texts of eligible articles were reviewed automatically or manually, and 7 categories of key information were collected: organization, study design, real-world data type, database, disease, outcome, and statistical method.
Results: A total of 620 eligible articles were identified for this narrative literature review. The results for the 7 categories suggested that most of the studies were conducted by academic institutes (n=429); the cohort study, which longitudinally measures outcomes of appropriate patients, was the primary design (n=533); 594 studies used claims data; the use of databases was concentrated in well-known commercial and public databases; infections (n=105), cardiovascular diseases (n=100), neoplasms (n=78), and nutritional and metabolic diseases (n=75) were the most studied diseases; most studies focused on measuring treatment patterns (n=218), physiological or clinical characteristics (n=184), and mortality (n=137); and multivariate models were commonly used (n=414). Most (375/414, 90.6%) of these multivariate modeling studies used the models for confounder adjustment. Logistic regression was the first choice for assessing many of the outcomes, with the exception of hospitalization or hospital stay and resource use or costs, for both of which linear regression was commonly used.
Conclusions: This literature review provides a good understanding of the current status and trends in the use of claims, EMR, and EHR data in clinical epidemiology in Japan. The results show which statistical methods were applied to different outcomes, Japan-specific trends in disease areas, and the lack of use of artificial intelligence techniques in existing studies. In the future, a more precise comparison of relevant domestic research with worldwide research will be conducted to clarify the Japan-specific status and challenges.
BACKGROUND The secondary use of medical claims data, electronic medical records (EMRs), and electronic health records (EHRs) for clinical epidemiology research is growing rapidly in Japan. Because these data are not collected for research purposes, their secondary use requires an understanding of their limitations, the ability to generate clinical questions, epidemiological skills to construct a study design, and statistical skills to analyze retrospective observational data. Previous work has described the limitations and challenges of using these data in observational clinical epidemiology research. However, knowledge of the statistical skills needed for secondary use of these data is also essential. Therefore, we performed an exhaustive literature review of existing nationwide studies based on these data to clarify how the data were applied in clinical epidemiological research. OBJECTIVE By investigating existing studies based on claims, EMR, and EHR data in Japan, we aimed to learn: (1) what statistical methods were used; (2) in which disease areas these data were used; (3) how frequently each data type was used; (4) which databases were used; (5) what kinds of studies were designed; (6) whether these studies were conducted by academic institutions; and (7) what outcomes were assessed. METHODS We obtained articles based on claims, EMR, and EHR data by searching PubMed up to June 30, 2021 (the date of search). Eligible articles were then filtered based on the inclusion and exclusion criteria. Finally, we manually extracted the seven categories of information from the full texts of the target articles.
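The screening step described in the methods can be sketched in Python as follows. This is an illustrative sketch only, not the authors' actual pipeline: the `Article` record, its field names, and the `screen` function are hypothetical, and the criteria are encoded as they are listed in the published abstract (3 inclusion criteria, then 6 exclusion criteria).

```python
from dataclasses import dataclass
from datetime import date

# Publication window stated in the abstract.
START, END = date(2006, 1, 1), date(2021, 6, 30)


@dataclass
class Article:
    """Hypothetical screening record; all field names are assumptions."""
    title: str
    published: date
    mentions_japan: bool
    mentions_rwd: bool                 # claims, EMRs, EHRs, or medical receipt data
    is_review: bool = False
    disease_related: bool = True
    japanese_sample: bool = True
    uses_rwd: bool = True              # confirmed use of claims, EMRs, or EHRs
    full_text_available: bool = True
    has_statistical_analysis: bool = True


def is_included(a: Article) -> bool:
    """All 3 inclusion criteria must hold."""
    return a.mentions_rwd and a.mentions_japan and START <= a.published <= END


def is_excluded(a: Article) -> bool:
    """Any 1 of the 6 exclusion criteria removes the article."""
    return (
        a.is_review
        or not a.disease_related
        or not a.japanese_sample
        or not a.uses_rwd
        or not a.full_text_available
        or not a.has_statistical_analysis
    )


def screen(articles: list[Article]) -> list[Article]:
    """Keep articles that pass inclusion and trigger no exclusion."""
    return [a for a in articles if is_included(a) and not is_excluded(a)]


if __name__ == "__main__":
    # Made-up records, for illustration only.
    sample = [
        Article("Cohort study using claims", date(2019, 5, 1), True, True),
        Article("Review of EHR use", date(2020, 1, 1), True, True, is_review=True),
        Article("Too early", date(2005, 12, 31), True, True),
    ]
    for kept in screen(sample):
        print(kept.title)
```

Separating `is_included` from `is_excluded` mirrors the two-stage wording of the criteria: inclusion is a conjunction, whereas exclusion is a disjunction.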
RESULTS Results collected from the 620 target articles suggested that (1) most of the studies were conducted by academic institutes (69.2%); (2) the cohort study, which longitudinally measures outcomes of appropriate patients, was the primary design (86%); (3) 95.8% of studies used claims data; (4) the JMDC (29.2%), DPC database (MHLW) (22.7%), MDV (16.6%), and NDB (10.5%) were the most used databases; (5) infections (16.9%), cardiovascular diseases (16.1%), neoplasms (12.6%), and nutritional and metabolic diseases (12.1%) were the most studied diseases; (6) treatment patterns (35.2%), physiological/clinical characteristics (29.7%), and mortality (22.1%) were the most assessed outcomes; and (7) multivariate models were commonly used (66.8%). Among the studies that used multivariate models, most (90.6%) did so for confounder adjustment. Logistic regression was the first choice for assessing many of the outcomes, with the exception of hospitalization/hospital stay and resource use/costs, for both of which linear regression was commonly used. In addition, some studies used propensity score analysis to balance patient backgrounds between groups, and this approach tended to be used when assessing patient mortality. CONCLUSIONS Our findings provide a good view of the current status and trends in the statistical analysis of these data in clinical epidemiology research. We also expect that these results will serve as reference information to help researchers design appropriate studies for the secondary use of claims, EMR, and EHR data in clinical research.
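The percentages reported here can be cross-checked against the article counts given in the published abstract (for example, n=429 academic studies out of 620). The following is a minimal sketch of that arithmetic; the dictionary keys and the `pct` helper are illustrative names, not taken from the paper, and only categories for which the abstract gives a count are included.

```python
TOTAL_ARTICLES = 620

# Counts as reported in the published abstract; keys are illustrative labels.
counts = {
    "academic institutes": 429,
    "cohort design": 533,
    "claims data": 594,
    "infections": 105,
    "cardiovascular diseases": 100,
    "neoplasms": 78,
    "nutritional and metabolic diseases": 75,
    "treatment patterns": 218,
    "physiological or clinical characteristics": 184,
    "mortality": 137,
    "multivariate models": 414,
}


def pct(numerator: int, denominator: int = TOTAL_ARTICLES) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


if __name__ == "__main__":
    for label, n in counts.items():
        print(f"{label}: {n}/{TOTAL_ARTICLES} = {pct(n)}%")
    # Confounder adjustment is a share of the multivariate-model studies,
    # not of all 620 articles.
    print(f"confounder adjustment: 375/414 = {pct(375, 414)}%")
```

Note that the 90.6% figure uses 414 as its denominator, whereas every other percentage is a share of all 620 articles.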