Using data from 183 public human data sets from PRIDE, a machine learning model was trained to identify tissue- and cell-type-specific protein patterns. PRIDE projects were searched with ionbot, and tissue/cell type annotations were added manually. Data from physiological samples were used to train a Random Forest model on protein abundances to classify samples into tissues and cell types. Subsequently, one-vs-all classification and feature importance were used to analyze the most discriminating protein abundances per class. Based on protein abundance alone, the model predicted tissues with 98% accuracy and cell types with 99% accuracy. The F-scores give a clear view of tissue-specific proteins and tissue-specific protein expression patterns. In-depth feature analysis shows slight confusion between physiologically similar tissues, demonstrating the capacity of the algorithm to detect biologically relevant patterns. These results can in turn inform downstream uses, from identifying the tissue of origin of proteins in complex samples such as liquid biopsies to studying the proteome of tissue-like samples such as organoids and cell lines.
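As a rough illustration of the one-vs-all evaluation described above, the following minimal sketch computes overall accuracy, per-class one-vs-all precision and recall, and the pairs of classes that get confused with each other. It is not the authors' pipeline: the function name `one_vs_all_report` and the tissue labels are hypothetical, the predictions are made up rather than produced by a Random Forest, and precision/recall are used here only as generic per-class measures (the abstract does not define its F-scores).

```python
from collections import Counter


def one_vs_all_report(y_true, y_pred):
    """Return overall accuracy, per-class one-vs-all precision/recall,
    and a count of (true, predicted) label pairs that were confused."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    report = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        # One-vs-all: treat `cls` as positive and every other class as negative.
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        report[cls] = {"precision": precision, "recall": recall}

    # Misclassified samples only, keyed by (true label, predicted label).
    confusions = Counter((t, p) for t, p in pairs if t != p)
    return accuracy, report, confusions


# Hypothetical labels for illustration only, not data from the study.
y_true = ["liver", "liver", "colon", "colon", "rectum", "rectum", "brain", "brain"]
y_pred = ["liver", "liver", "colon", "rectum", "rectum", "rectum", "brain", "brain"]

accuracy, report, confusions = one_vs_all_report(y_true, y_pred)
print(f"accuracy = {accuracy:.3f}")
for cls, metrics in report.items():
    print(cls, metrics)
print(confusions.most_common(1))
```

In this toy input the only error is a colon sample predicted as rectum, which is the kind of confusion between physiologically similar tissues that the abstract mentions.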
Background Myotonic dystrophy type 1 (DM1) is an incurable multisystem disease caused by a CTG-repeat expansion in the DM1 protein kinase (DMPK) gene. The OPTIMISTIC clinical trial demonstrated positive and heterogeneous effects of cognitive behavioral therapy (CBT) on the capacity for activity and social participation in DM1 patients. Through a process of reverse engineering, this study aims to identify druggable molecular biomarkers associated with the clinical improvement in the OPTIMISTIC cohort.
Methods Based on whole blood samples collected during OPTIMISTIC, we performed paired mRNA sequencing for 27 patients before and after the CBT intervention. Linear mixed effect models were used to identify biomarkers associated with the disease-causing CTG expansion and the mean clinical improvement across all clinical outcome measures.
Results We identified 608 genes whose expression was significantly associated with the CTG-repeat expansion, as well as 1176 genes significantly associated with the average clinical response to the intervention. Remarkably, all 97 genes associated with both returned to more normal levels in patients who benefited the most from CBT. This main finding was replicated in an external mRNA dataset of DM1 patients and controls, singling these genes out as candidate biomarkers for therapy response. Among these candidate genes were DNAJB12, HDAC5, and TRIM8, each belonging to a protein family that is being studied in the context of neurological disorders or muscular dystrophies. Across the different gene sets, gene pathway enrichment analysis revealed disease-relevant impaired signaling in, among others, insulin-, metabolism-, and immune-related pathways. Furthermore, evidence for shared dysregulations with another neuromuscular disease, Duchenne muscular dystrophy, was found, suggesting a partial overlap in blood-based gene dysregulation.
Conclusions DM1-relevant disease signatures can be identified on a molecular level in peripheral blood, opening new avenues for drug discovery and therapy efficacy assessments.