Background
The adoption of the electronic medical record (EMR) is rapidly growing in China. Constantly evolving, Chinese EMRs contain vast amounts of clinical and financial data, providing tremendous potential for research and policy use; however, they are only partially standardized and contain free text or unstructured data. To utilize the information contained in Chinese EMRs, the development of data extraction methodology is urgently needed. The purpose of this study is to develop and validate methods to extract clinical information from the Chinese EMR for research use.

Methods
Using 2010 to 2014 EMR data from YouAn Hospital, a large teaching hospital affiliated with Capital Medical University in Beijing, China, we developed extraction methods comprising 40 EMR definitions covering 6 liver diseases, 5 disease severity conditions, and 29 comorbidities and treatments. We conducted a chart review of 450 randomly selected EMRs. Using physician chart review results as a reference, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated to validate each EMR definition.

Results
The sensitivity of the 6 EMR definitions for liver diseases ranged from 78.9 to 100.0 %, and PPV ranged from 82.1 to 100.0 %. The sensitivity of the 5 definitions for disease severity conditions ranged from 91.0 to 100.0 %, and PPV ranged from 79.2 to 100.0 %. Among the 29 EMR definitions for comorbidities and treatments, 23 had sensitivity over 90.0 % and 25 had PPV over 80.0 %. The specificity and NPV for all 40 EMR definitions were over 90.0 %.

Conclusion
The extraction method developed is a valid way of extracting information on liver diseases, comorbidities and related treatments from YouAn Hospital EMRs.
Our method should be modified for application to other Chinese EMR systems, following our framework for extracting conditions.

Electronic supplementary material
The online version of this article (doi:10.1186/s12911-016-0348-6) contains supplementary material, which is available to authorized users.
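
The four validation measures named in the Methods (sensitivity, specificity, PPV, and NPV) follow from a two-by-two comparison of each EMR definition against the physician chart review. The sketch below shows one way such a comparison might be computed. It is not the authors' code: the names `ConfusionCounts`, `tally`, and `validity_measures` are hypothetical, and the example flags are made up for illustration and are not study data.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class ConfusionCounts:
    """Two-by-two counts, with chart review treated as the reference."""
    tp: int  # EMR definition positive, chart review positive
    fp: int  # EMR definition positive, chart review negative
    fn: int  # EMR definition negative, chart review positive
    tn: int  # EMR definition negative, chart review negative


def tally(emr_flags: Sequence[bool], chart_flags: Sequence[bool]) -> ConfusionCounts:
    """Count agreement between an EMR definition and chart review, record by record."""
    if len(emr_flags) != len(chart_flags):
        raise ValueError("both sequences must cover the same records")
    counts = ConfusionCounts(tp=0, fp=0, fn=0, tn=0)
    for emr, chart in zip(emr_flags, chart_flags):
        if emr and chart:
            counts.tp += 1
        elif emr and not chart:
            counts.fp += 1
        elif not emr and chart:
            counts.fn += 1
        else:
            counts.tn += 1
    return counts


def validity_measures(c: ConfusionCounts) -> Dict[str, float]:
    """Return the four measures as proportions (NaN when a denominator is zero)."""
    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(c.tp, c.tp + c.fn),
        "specificity": ratio(c.tn, c.tn + c.fp),
        "ppv": ratio(c.tp, c.tp + c.fp),
        "npv": ratio(c.tn, c.tn + c.fn),
    }


# Illustrative flags only (hypothetical, not taken from the study).
example_emr = [True, True, False, False, True]
example_chart = [True, False, False, True, True]
example_measures = validity_measures(tally(example_emr, example_chart))
```

Multiplying each proportion by 100 gives percentages in the form reported in the Results. In a setting like the one described, the same calculation would be repeated once per EMR definition over the reviewed charts.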