Background
Entities in everyday clinical text are often expressed differently from how they appear in the nomenclature. Physician notes in a clinical information system (CIS) contain many synonyms, abbreviations, medical jargon, and even misspellings, so a terminology without enough synonyms may not be adequate for Chinese clinical term recognition.

Methods
This paper presents a validated system that retrieves Chinese terms of clinical finding (CTCFs) from a CIS and maps them to the corresponding concepts of an international clinical nomenclature such as SNOMED CT. The system centers on SNOMED CT with Chinese synonyms enrichment (SCCSE). Literal similarity and diagnosis-related similarity metrics were used for concept mapping. Two CTCF recognition methods, the rule- and terminology-based approach (RTBA) and a conditional random field (CRF) machine learner, were adopted to identify the concepts in physician notes. The system was validated against history of present illness records annotated by clinical experts. The RTBA and CRF can be combined to keep predicting new CTCFs beyond those already in SCCSE.

Results
Around 59,000 CTCF candidates were accepted as valid, and 39,000 of them occurred at least once in the history of present illness. Of these, 3,729 matched the descriptions in the referenced Chinese clinical nomenclature, which can be cross-mapped to other international nomenclatures such as SNOMED CT. With the hybrid similarity metrics, another 7,454 valid CTCFs (synonyms) were successfully mapped to concepts. For CTCF recognition in physician notes, a series of experiments was performed to find the best CRF feature set, which achieved an F-score of 0.887. The RTBA achieved a better F-score of 0.919 using the CTCF dictionary created in this research.

Conclusions
This research demonstrated that it is feasible to enrich SNOMED CT with Chinese synonyms based on physician notes in a CIS. With continuous maintenance of SCCSE, CTCFs could be precisely retrieved from free text, and CTCFs arranged in the semantic hierarchy of SNOMED CT could greatly improve the meaningful use of electronic health records in China. The methodology is also useful for clinical synonym enrichment in other languages.
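To make the terminology-based recognition and its F-score evaluation concrete, the following is a minimal sketch, not the system described above. The abstract does not specify the RTBA rules, so greedy forward maximum matching against a term dictionary is substituted here as a common stand-in. The function names `forward_max_match` and `f_score`, the toy dictionary, and the sample note are all hypothetical.

```python
from __future__ import annotations


def forward_max_match(text: str, dictionary: set[str]) -> list[tuple[int, int, str]]:
    """Greedy longest-match lookup of dictionary terms in text.

    Assumption: a stand-in for terminology-based recognition; the
    paper's actual rules are not given in the abstract.
    """
    max_len = max((len(term) for term in dictionary), default=0)
    spans = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate starting at i first, then shrink.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                match = (i, j, text[i:j])
                break
        if match:
            spans.append(match)
            i = match[1]  # continue after the matched term
        else:
            i += 1
    return spans


def f_score(predicted, gold) -> float:
    """Balanced F-score over exact-match (start, end, term) spans."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: the dictionary lacks one term annotated by the expert.
toy_dictionary = {"发热", "咳嗽", "胸痛"}
note = "患者发热伴咳嗽、胸闷"
gold_spans = [(2, 4, "发热"), (5, 7, "咳嗽"), (8, 10, "胸闷")]

predicted_spans = forward_max_match(note, toy_dictionary)
print(predicted_spans)
print(f"{f_score(predicted_spans, gold_spans):.3f}")  # prints 0.800
```

In this toy case the dictionary misses one annotated term, so precision is 1.0 and recall is 2/3. It shows on a small scale why enriching the dictionary with synonyms raises recall for a dictionary-driven recognizer.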