Shicheng Li scite author profile

Shicheng Li

5 Publications

2 Citation Statements Received

40 Citation Statements Given

Publications

Deep Phenotyping of Chinese Electronic Health Records by Recognizing Linguistic Patterns of Phenotypic Narratives With a Sequence Motif Discovery Tool: Algorithm Development and Validation

Li, Zhang, Chen, et al. 2022. J Med Internet Res.

Background: Phenotype information in electronic health records (EHRs) is mainly recorded in unstructured free text, which cannot be directly used for clinical research. EHR-based deep-phenotyping methods can structure phenotype information in EHRs with high fidelity, making them a focus of medical informatics. However, developing a deep-phenotyping method for non-English EHRs (eg, Chinese EHRs) is challenging. Although numerous EHR resources exist in China, fine-grained annotation data suitable for developing deep-phenotyping methods are limited, so any such method must be developed in a low-resource scenario.

Objective: In this study, we aimed to develop a deep-phenotyping method with good generalization ability for Chinese EHRs based on limited fine-grained annotation data.

Methods: The core of the methodology was to identify linguistic patterns of phenotype descriptions in Chinese EHRs with a sequence motif discovery tool and perform deep phenotyping of Chinese EHRs by recognizing linguistic patterns in free text. Specifically, 1000 Chinese EHRs were manually annotated based on a fine-grained information model, PhenoSSU (Semantic Structured Unit of Phenotypes). The annotation data set was randomly divided into a training set (n=700, 70%) and a testing set (n=300, 30%). The process for mining linguistic patterns was divided into three steps. First, free text in the training set was encoded as single-letter sequences (P: phenotype, A: attribute). Second, a biological sequence analysis tool, MEME (Multiple EM for Motif Elicitation), was used to identify motifs in the single-letter sequences. Finally, the identified motifs were reduced to a series of regular expressions representing linguistic patterns of PhenoSSU instances in Chinese EHRs. Based on the discovered linguistic patterns, we developed a deep-phenotyping method for Chinese EHRs, including a deep learning–based method for named entity recognition and a pattern recognition–based method for attribute prediction.

Results: In total, 51 statistically significant sequence motifs were mined from 700 Chinese EHRs in the training set and were combined into six regular expressions. These six regular expressions could be learned from a mean of 134 (SD 9.7) annotated EHRs in the training set. The deep-phenotyping algorithm for Chinese EHRs could recognize PhenoSSU instances with an overall accuracy of 0.844 on the test set. For the subtask of entity recognition, the algorithm achieved an F1 score of 0.898 with the Bidirectional Encoder Representations from Transformers–bidirectional long short-term memory and conditional random field model; for the subtask of attribute prediction, the algorithm achieved a weighted accuracy of 0.940 with the linguistic pattern–based method.

Conclusions: We developed a simple but effective strategy to perform deep phenotyping of Chinese EHRs with limited fine-grained annotation data. Our work will promote the secondary use of Chinese EHRs and give inspiration to other non–English-speaking countries.
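
The encoding and pattern-recognition steps described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not list the six regular expressions, so the pattern `A*PA*`, the extra letter `O` for non-entity tokens, and the function name `find_units` are all assumptions made for illustration.

```python
import re

# Single-letter codes. P and A come from the abstract; O ("other") is an
# assumption added here so that every token has a letter.
# Hypothetical pattern: a phenotype with any attributes directly around it.
PATTERN = re.compile(r"A*PA*")


def find_units(tokens, tags):
    """Group each phenotype token with the attribute tokens matched around it.

    tokens: list of words; tags: parallel list of "P", "A", or "O".
    """
    sequence = "".join(tags)  # e.g. ["O", "A", "P"] -> "OAP"
    units = []
    for match in PATTERN.finditer(sequence):
        span = range(match.start(), match.end())
        phenotype = next(tokens[i] for i in span if tags[i] == "P")
        attributes = [tokens[i] for i in span if tags[i] == "A"]
        units.append({"phenotype": phenotype, "attributes": attributes})
    return units


if __name__ == "__main__":
    tokens = ["patient", "severe", "cough", "with", "fever", "persistent"]
    tags = ["O", "A", "P", "O", "P", "A"]
    for unit in find_units(tokens, tags):
        print(unit)
```

Because each tag is a single character, a match position in the encoded string maps directly back to a token index, which is what makes regular expressions over the letter sequence usable for attribute assignment.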

Deep Phenotyping of Chinese Electronic Health Records by Recognizing Linguistic Patterns of Phenotypic Narratives With a Sequence Motif Discovery Tool: Algorithm Development and Validation (Preprint)

Li, Zhang, Chen, et al. 2022. Preprint.

BACKGROUND: Phenotype information in electronic health records (EHRs) is mainly recorded in unstructured free text, which cannot be directly used for clinical research. EHR-based deep phenotyping methods can structure phenotype information in EHRs with high fidelity, making them a focus of medical informatics. However, developing a deep phenotyping method for non-English EHRs (such as Chinese EHRs) is challenging. Although numerous EHR resources exist in China, fine-grained annotation data suitable for developing deep phenotyping methods are limited, so any such method must be developed in a low-resource scenario.

OBJECTIVE: In this study, we aimed to develop a deep phenotyping method with good generalization ability for Chinese EHRs based on limited fine-grained annotation data.

METHODS: The core of the methodology was to learn linguistic patterns of phenotype descriptions in Chinese EHRs with a sequence motif discovery tool and then perform deep phenotyping of Chinese EHRs by recognizing the learned linguistic patterns in free text. Specifically, 1,000 Chinese EHRs were manually annotated based on a fine-grained information model, PhenoSSU (the Semantic Structured Unit of Phenotypes). The annotation dataset was randomly divided into a training set (70%) and a testing set (30%). The process for mining linguistic patterns was divided into three steps. First, free text in the training set was encoded as a single-letter sequence (P: phenotype, A: attribute). Second, the MEME motif discovery tool, a biological sequence analysis tool, was used to identify motifs in the single-letter sequence. Finally, the identified motifs were reduced to a series of regular expressions representing linguistic patterns of PhenoSSU instances in Chinese EHRs. Based on the discovered linguistic patterns, we developed a deep phenotyping method for Chinese EHRs, including a deep learning–based model for named entity recognition and a pattern recognition–based method for attribute prediction.

RESULTS: Fifty-one statistically significant sequence motifs were mined from 700 Chinese EHRs in the training set and were combined into six regular expressions. These six regular expressions could be learned from 134 (±9.7) annotated EHRs in the training set. The deep phenotyping algorithm for Chinese EHRs could recognize PhenoSSU instances with an overall accuracy of 0.844 on the test set. For the subtask of entity recognition, the algorithm achieved an F1-score of 0.898 with the BERT-BiLSTM-CRF model; for the subtask of attribute prediction, the algorithm achieved a weighted accuracy of 0.940 with the linguistic pattern–based method.

CONCLUSIONS: We developed a simple but effective strategy to perform deep phenotyping of Chinese EHRs with limited fine-grained annotation data. Our work will promote the secondary use of Chinese EHRs and give inspiration to other non-English-speaking countries.

Correction: Constructing High-Fidelity Phenotype Knowledge Graphs for Infectious Diseases With a Fine-Grained Semantic Information Model: Development and Usability Study (Preprint)

Deng, Chen, Yang, et al. 2021. Preprint.

In “Constructing High-Fidelity Phenotype Knowledge Graphs for Infectious Diseases With a Fine-Grained Semantic Information Model: Development and Usability Study” (J Med Internet Res 2021;23(6):e26892), the authors noted one error. The affiliation “Suzhou Institute of Systems Medicine” was incorrect. It should be corrected to “Center of Systems Medicine, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College; Suzhou Institute of Systems Medicine.”

Correction: Constructing High-Fidelity Phenotype Knowledge Graphs for Infectious Diseases With a Fine-Grained Semantic Information Model: Development and Usability Study

Deng, Chen, Yang, et al. 2021. J Med Internet Res.

Constructing High-Fidelity Phenotype Knowledge Graphs for Infectious Diseases With a Fine-Grained Semantic Information Model: Development and Usability Study (Preprint)

Deng, Chen, Yang, et al. 2021. Preprint.

BACKGROUND: Phenotypes characterize the clinical manifestations of disease and provide important information for diagnosis. Therefore, constructing phenotype knowledge graphs of disease is valuable to the development of artificial intelligence in medicine. However, phenotype knowledge graphs in current knowledge bases such as WikiData and DBpedia are coarse-grained, because they only consider core concepts of phenotypes and neglect the details (attributes) associated with them.

OBJECTIVE: To characterize details of disease phenotypes in clinical guidelines, we proposed a fine-grained semantic information model named PhenoSSU (Semantic Structured Unit of Phenotypes).

METHODS: PhenoSSU is by nature an "entity-attribute-value" model, which aims to capture the full semantics underlying phenotype descriptions with a series of attributes and values. A total of 193 clinical guidelines of infectious diseases from Wikipedia were selected as the study corpus, and 12 attributes from SNOMED-CT were introduced into the PhenoSSU model based on co-occurrences of phenotype concepts and attribute values. The expressive power of the PhenoSSU model was evaluated by analyzing whether a PhenoSSU instance could capture the full semantics underlying the corresponding phenotype description. To automatically construct fine-grained phenotype knowledge graphs, we developed a hybrid strategy that first recognized phenotype concepts with the MetaMap tool and then predicted attribute values of phenotypes with machine learning classifiers.

RESULTS: Fine-grained phenotype knowledge graphs of 193 infectious diseases were manually constructed with the BRAT annotation tool. The PhenoSSU model could precisely represent 89.5% (3757/4020) of phenotype descriptions in clinical guidelines. By comparison, other information models such as the Clinical Element Model and the HL7 FHIR model could only capture the full semantics underlying 48.4% and 21.8% of phenotype descriptions, respectively. The hybrid strategy achieved an F1-score of 0.732 for the subtask of phenotype concept recognition and an average weighted accuracy of 0.776 for the subtask of attribute value prediction.

CONCLUSIONS: PhenoSSU is an effective information model for the precise representation of phenotype knowledge in clinical guidelines, and machine learning can be used to improve efficiency for constructing PhenoSSU-based knowledge graphs. Our work will potentially benefit knowledge-based systems for diagnosis.
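
The "entity-attribute-value" structure mentioned in the Methods can be sketched as a small Python data class. This is only an illustration of the general idea, not the published PhenoSSU schema: the class layout, the `triples` helper, and the attribute names `severity` and `body_site` are hypothetical, since the abstract does not list the 12 SNOMED-CT attributes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PhenoSSU:
    """One phenotype (entity) with its attribute-value details."""

    entity: str
    # Attribute name -> value. The names used below are hypothetical examples.
    attributes: Dict[str, str] = field(default_factory=dict)

    def triples(self) -> List[Tuple[str, str, str]]:
        """Flatten the unit into (entity, attribute, value) triples,
        the form in which it could be stored as knowledge-graph edges."""
        return [(self.entity, name, value)
                for name, value in self.attributes.items()]


if __name__ == "__main__":
    unit = PhenoSSU("cough", {"severity": "severe", "body_site": "chest"})
    for triple in unit.triples():
        print(triple)
```

A coarse-grained graph would keep only the entity ("cough"); the attribute dictionary is where the fine-grained detail described in the abstract would live.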
