Siyue Yang scite author profile

Siyue Yang

4Publications

19Citation Statements Received

644Citation Statements Given

How they've been cited

How they cite others

356

643

Affiliations

University of Toronto

Publications

Order By: Most citations

Machine learning approaches for electronic health records phenotyping: a methodical review

Yang

Varghese

Stephenson

et al. 2022

View full text Add to dashboard Cite

Objective Accurate and rapid phenotyping is a prerequisite to leveraging electronic health records for biomedical research. While early phenotyping relied on rule-based algorithms curated by experts, machine learning (ML) approaches have emerged as an alternative to improve scalability across phenotypes and healthcare settings. This study evaluates ML-based phenotyping with respect to (1) the data sources used, (2) the phenotypes considered, (3) the methods applied, and (4) the reporting and evaluation methods used. Materials and methods We searched PubMed and Web of Science for articles published between 2018 and 2022. After screening 850 articles, we recorded 37 variables on 100 studies. Results Most studies utilized data from a single institution and included information in clinical notes. Although chronic conditions were most commonly considered, ML also enabled the characterization of nuanced phenotypes such as social determinants of health. Supervised deep learning was the most popular ML paradigm, while semi-supervised and weakly supervised learning were applied to expedite algorithm development and unsupervised learning to facilitate phenotype discovery. ML approaches did not uniformly outperform rule-based algorithms, but deep learning offered a marginal improvement over traditional ML for many conditions. Discussion Despite the progress in ML-based phenotyping, most articles focused on binary phenotypes and few articles evaluated external validity or used multi-institution data. Study settings were infrequently reported and analytic code was rarely released. Conclusion Continued research in ML-based phenotyping is warranted, with emphasis on characterizing nuanced phenotypes, establishing reporting and evaluation standards, and developing methods to accommodate misclassified phenotypes due to algorithm errors in downstream applications.

show abstract

Machine Learning Approaches for Electronic Health Records Phenotyping: A Methodical Review

Yang

Varghese²,

Stephenson

et al. 2022

Preprint

View full text Add to dashboard Cite

ObjectiveAccurate and rapid methods for phenotyping are a prerequisite to realizing the potential of electronic health records (EHRs) data for clinical and translational research. This study reviews the literature on machine learning (ML) approaches for phenotyping with respect to the phenotypes considered, the data sources and methods used, and the contributions within the wider context of EHR-based research.Materials and MethodsWe searched for relevant articles in PubMed and Web of Science published between January 1, 2018 and April 14, 2022. After screening, we collected data on 52 variables across 106 selected articles.ResultsML-based methods were developed for 156 unique phenotypes, primarily using EHR data from a single institution or health system. 72 of 106 articles leveraged unstructured data in clinical notes. In terms of methodology, supervised learning is the most prevalent ML paradigm (n = 64, 60.4%), with half of the articles employing deep learning. Semi-supervised and weakly-supervised approaches were applied to reduce the burden of obtaining gold-standard labeled data (n = 21, 19.8%), while unsupervised learning was used for phenotype discovery (n = 20, 18.9%). Federated learning has been applied to develop algorithms across multiple institutions while preserving data privacy (n = 2, 1.9%).DiscussionWhile the use of ML for phenotyping is growing, most articles applied traditional supervised ML to characterize the presence of common, chronic conditions.ConclusionContinued research in ML-based methods is warranted, with particular attention to the development of advanced methods for complex phenotypes and standards for reporting and evaluating phenotyping algorithms.

show abstract

Detection of dextran, maltodextrin and soluble starch in the adulterated Lycium barbarum polysaccharides (LBPs) using Fourier-transform infrared spectroscopy (FTIR) and machine learning models

Chen

Yang

Nan

et al. 2023

Heliyon

View full text Add to dashboard Cite

On some problems of Bayesian region construction with guaranteed coverages

et al. 2023

View full text Add to dashboard Cite

The general problem of constructing regions that have a guaranteed coverage probability for an arbitrary parameter of interest $$\psi \in \Psi $$ ψ ∈ Ψ is considered. The regions developed are Bayesian in nature and the coverage probabilities can be considered as Bayesian confidences with respect to the model obtained by integrating out the nuisance parameters using the conditional prior given $$\psi .$$ ψ . Both the prior coverage probability and the prior probability of covering a false value (the accuracy) can be controlled by setting the sample size. These coverage probabilities are considered as a priori figures of merit concerning the reliability of a study while the inferences quoted are Bayesian. Several problems are considered where obtaining confidence regions with desirable properties have proven difficult to obtain. For example, it is shown that the approach discussed never leads to improper regions which has proven to be an issue for some confidence regions.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Siyue Yang

Machine learning approaches for electronic health records phenotyping: a methodical review

Machine Learning Approaches for Electronic Health Records Phenotyping: A Methodical Review

Detection of dextran, maltodextrin and soluble starch in the adulterated Lycium barbarum polysaccharides (LBPs) using Fourier-transform infrared spectroscopy (FTIR) and machine learning models

On some problems of Bayesian region construction with guaranteed coverages

Contact Info

Product

Resources

About