Learning From Clinical Consensus Diagnosis in India to Facilitate Automatic Classification of Dementia: Machine Learning Study

Jin, Haomiao; Chien, Sandy; Meijer, Erik; Khobragade, Pranali; Lee, Jinkook

doi:10.2196/27113

Cited by 11 publications

(11 citation statements)

References 58 publications


“…In this study, we set the random seed at 234 with the set.seed function to separate the training and testing sets, and the microarrays were randomly divided into the training and testing sets in a ratio of 8:2. In fact, we found that the segmentation of the training and testing data has different division standards in the different articles, such as 90:10 [ 48 ], 85:15 [ 49 ], 80:20, 70:30 [ 50 ], 60:40 [ 51 ], and so on. We choose 8:2 for three reasons: first, it is supported by the reported literature; second, considering that the number of samples is not large enough, we wanted to make the training data for developing the model as large as possible; third, the ratio of the number of DILI samples and control samples is about 80:20.…”

Section: Discussion (mentioning)

confidence: 99%

Identification of Drug-Induced Liver Injury Biomarkers from Multiple Microarrays Based on Machine Learning and Bioinformatics Analysis

Wang, Zhang, et al. 2022. IJMS

Drug-induced liver injury (DILI) is the most common adverse effect of numerous drugs and a leading cause of drug withdrawal from the market. In recent years, the incidence of DILI has increased. However, diagnosing DILI remains challenging because of the lack of specific biomarkers. Hence, we used machine learning (ML) to mine multiple microarrays and identify useful genes that could contribute to diagnosing DILI. In this prospective study, we screened six eligible microarrays from the Gene Expression Omnibus (GEO) database. First, 21 differentially expressed genes (DEGs) were identified in the training set. Subsequently, a functional enrichment analysis of the DEGs was performed. We then used six ML algorithms to identify potentially useful genes. Based on receiver operating characteristic (ROC), four genes, DDIT3, GADD45A, SLC3A2, and RBM24, were identified. The average values of the area under the curve (AUC) for these four genes were higher than 0.8 in both the training and testing sets. In addition, the results of immune cell correlation analysis showed that these four genes were highly significantly correlated with multiple immune cells. Our study revealed that DDIT3, GADD45A, SLC3A2, and RBM24 could be biomarkers contributing to the identification of patients with DILI.
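The seeded 8:2 split described in the citation statement above can be sketched as follows. This is a minimal illustration in Python, not the citing authors' R code: the function name `split_train_test` and the sample identifiers are hypothetical, and seeding Python's generator with 234 does not reproduce the partition that R's `set.seed(234)` would give.

```python
import random


def split_train_test(items, train_fraction=0.8, seed=234):
    """Shuffle a copy of `items` with a fixed seed and cut it into two parts."""
    rng = random.Random(seed)      # private generator, so global state is untouched
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical sample identifiers, used only to show the mechanics.
samples = [f"sample_{i}" for i in range(50)]
train, test = split_train_test(samples)
print(len(train), len(test))
```

Fixing the seed makes the partition reproducible across runs, which is the purpose the quoted passage gives for calling `set.seed`.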


“…About 60% of LASI-DAD participants received clinical consensus diagnoses of their dementia status [16]. A machine learning model has been developed and validated to expand the clinical consensus diagnosis of dementia to all LASI-DAD participants [17].…”

Section: Methods (mentioning)

confidence: 99%

“…The ultimate model was a stochastic gradient boosting model with an area under receiver operating characteristics (ROC) curve of 0.94, indicating that the model has an almost perfect discriminative ability according to thresholds suggested in prior research [19]. Details of the machine learning model have been described elsewhere [17]. To maximize the amount of labeled data for analysis in this paper, both the clinical consensus diagnoses and the predicted diagnoses in the LASI-DAD study were used as the labeled data to develop the semi-supervised machine learning model.…”

Section: Dementia Assessment and Diagnosis (mentioning)

confidence: 99%

“…In this study, the initial predictive model was trained with a random selection of 70% labeled data from the LASI-DAD study using a stochastic gradient boosting method, the same machine learning method that has been shown in our previous study to generate accurate predictions for LASI-DAD participants [17]. The initial model was then iteratively retrained using the self-training function in the R "ssc" package [28].…”

Section: The Semi-supervised Learning Approach (mentioning)

confidence: 99%
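The self-training loop mentioned in the statement above (fit on labeled data, label the unlabeled cases the model is confident about, refit, repeat) can be sketched as below. This is only an illustration of the general idea under stated assumptions: the citing paper uses stochastic gradient boosting with the self-training function of the R "ssc" package, whereas this sketch swaps in a toy one-dimensional nearest-centroid classifier so that it runs with the standard library alone. All function names, the `min_margin` threshold, and the data are hypothetical.

```python
from statistics import mean


def fit_centroids(xs, ys):
    """Toy model: the mean feature value of each class."""
    return {label: mean(x for x, y in zip(xs, ys) if y == label) for label in set(ys)}


def predict_with_margin(centroids, x):
    """Return the nearest class and how much closer it is than the runner-up."""
    ranked = sorted(centroids, key=lambda label: abs(x - centroids[label]))
    best, second = ranked[0], ranked[1]
    margin = abs(x - centroids[second]) - abs(x - centroids[best])
    return best, margin


def self_train(labeled_x, labeled_y, unlabeled_x, min_margin=2.0, max_rounds=10):
    """Iteratively move confidently predicted cases into the training set."""
    xs, ys = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(max_rounds):
        centroids = fit_centroids(xs, ys)
        confident, remaining = [], []
        for x in pool:
            label, margin = predict_with_margin(centroids, x)
            if margin >= min_margin:
                confident.append((x, label))
            else:
                remaining.append(x)
        if not confident:          # nothing new to learn from, so stop
            break
        for x, label in confident:
            xs.append(x)
            ys.append(label)
        pool = remaining
    return fit_centroids(xs, ys), pool


# Hypothetical data: four labeled cases and five unlabeled ones.
model, still_unlabeled = self_train(
    [1.0, 2.0, 8.0, 9.0], [0, 0, 1, 1], [0.5, 3.0, 4.9, 7.0, 9.5]
)
print(model, still_unlabeled)
```

Cases whose prediction stays ambiguous are never pseudo-labeled, which limits how far early mistakes can propagate into later rounds.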


Estimating the Prevalence of Dementia in India Using a Semi-Supervised Machine Learning Approach

et al. 2023


Introduction. Accurate estimation of dementia prevalence is essential for making effective public and social care policy to support individuals and families suffering from the disease. The purpose of this paper is to estimate the prevalence of dementia in India using a semi-supervised machine learning approach based on a large nationally representative sample. Methods. The sample of this study is adults 60 years or older in the wave 1 (2017-2019) of the Longitudinal Aging Study in India (LASI). A subsample in LASI received extensive cognitive assessment and clinical consensus ratings and therefore have diagnoses of dementia. A semi-supervised machine learning model was developed to predict the status of dementia for LASI participants without diagnoses. After obtaining the predictions, sampling weights and age standardization to the World Health Organization (WHO) standard population were applied to generate the estimate for prevalence of dementia in India. Results. The prevalence of dementia for those aged 60 years and older in India was 8.44% (95% CI: 7.89%~9.01%). The age-standardized prevalence was estimated to be 8.94% (95% CI: 8.36%~9.55%). The prevalence of dementia was greater for those who were older, were females, received no education, and lived in rural areas. Discussion/Conclusion. The prevalence of dementia in India may be higher than prior estimates derived from local studies. These prevalence estimates provide the information necessary for making long-term planning of public and social care policy. The semi-supervised machine learning approach adopted in this paper may also be useful for other large population ageing studies that have a similar data structure.
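The age standardization step named in the abstract above can be illustrated with direct standardization: stratum-specific prevalences are averaged using the age distribution of a standard population as weights. The sketch below is a minimal example under stated assumptions. The age bands, prevalences, and weights are made-up numbers, not the WHO standard population or the study's estimates, and the survey sampling weights that the abstract also mentions are not modeled.

```python
def direct_age_standardization(stratum_prevalence, standard_population):
    """Weight each age group's prevalence by its share of the standard population."""
    if set(stratum_prevalence) != set(standard_population):
        raise ValueError("age groups must match between the two inputs")
    total = sum(standard_population.values())
    return sum(
        stratum_prevalence[group] * standard_population[group] / total
        for group in stratum_prevalence
    )


# Hypothetical inputs for illustration only.
prevalence_by_age = {"60-69": 0.05, "70-79": 0.10, "80+": 0.20}
standard_weights = {"60-69": 50, "70-79": 30, "80+": 20}

standardized = direct_age_standardization(prevalence_by_age, standard_weights)
print(f"{standardized:.3f}")
```

Because the weights come from a fixed reference population, standardized figures from populations with different age structures become comparable.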


“…[1][2][3] Estimators for agreement are also used in other fields, such as a performance metric in machine learning. 4,5 Agreement studies are useful when the outcome is unknown or subjective (eg, Alzheimer's disease diagnosis based on neuroimaging biomarkers or the optimal machine learning model). Multiple raters may be included in agreement studies.…”

Section: Introduction (mentioning)

confidence: 99%

Simulating and estimating agreement in the presence of multiple raters and covariates

McKenzie, Mahnken 2023. Statistics in Medicine

Cohen's and Fleiss's kappa are popular estimators for assessing agreement among two and multiple raters, respectively, for a binary response. While additional methods have been developed to account for multiple raters and covariates, they are not always applicable, rarely used, and none simplify to Cohen's kappa. Furthermore, there are no methods to simulate Bernoulli observations under the kappa agreement structure such that the developed methods could be adequately assessed. This manuscript overcomes these shortfalls. First, we developed a model-based estimator for kappa that accommodates multiple raters and covariates through a generalized linear mixed model and encompasses Cohen's kappa as a special case. Second, we created a framework to simulate dependent Bernoulli observations that upholds all 2-tuple pair of rater's kappa agreement structure and includes covariates. We used this framework to assess our method when kappa was nonzero. Simulations showed that Cohen's and Fleiss's kappa estimates were inflated unlike our model-based kappa. We analyzed an Alzheimer's disease neuroimaging study and the classic cervical cancer pathology study. The proposed model-based kappa and advancement in simulation methodology demonstrates that the popular approaches of Cohen's and Fleiss's kappa are poised to yield invalid conclusions while our work overcomes shortfalls, leading to improved inferences.
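For reference, the classical two-rater Cohen's kappa that the abstract above takes as its starting point is kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each rater's marginal frequencies. The sketch below implements only that textbook estimator, not the model-based estimator the paper proposes. The function name and the ratings are hypothetical.

```python
def cohens_kappa(rater_a, rater_b):
    """Classical Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    labels = set(rater_a) | set(rater_b)
    expected = sum(
        (rater_a.count(label) / n) * (rater_b.count(label) / n) for label in labels
    )
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical binary ratings of ten items by two raters.
ratings_a = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
ratings_b = [1, 1, 0, 0, 0, 0, 1, 1, 1, 0]
kappa = cohens_kappa(ratings_a, ratings_b)
print(round(kappa, 3))
```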
