2020
DOI: 10.1038/s41467-020-16539-4

Machine learning uncovers cell identity regulator by histone code

Abstract: Conversion between cell types, e.g., by induced expression of master transcription factors, holds great promise for cellular therapy. Our ability to manipulate cell identity is constrained by incomplete information on cell identity genes (CIGs) and their expression regulation. Here, we develop CEFCIG, an artificial intelligent framework to uncover CIGs and further define their master regulators. On the basis of machine learning, CEFCIG reveals unique histone codes for transcriptional regulation of reported CIG…

Cited by 28 publications (25 citation statements)
References 47 publications
“…Given that the data is imbalanced, we applied the synthetic minority over-sampling technique (SMOTE) to achieve 1:1 balanced data for sepsis cases and non-sepsis controls (at the clinical note level). Prior literature argued that oversampling (instead of undersampling) will result in more accurate models 21 23 and SMOTE has been used in earlier studies that develop machine learning classifiers for other clinical conditions such as oral cancer detection 22 and cell identification/classification 24, 25 where the prevalence of the positive cases are low. For comparative purposes, we also develop, test, and report the models without any oversampling to present the possibility of operating this algorithm in a normal clinical environment where the prevalence of sepsis is relatively low.…”
Section: Results (mentioning)
confidence: 99%
“…The non-SMOTE models present the performance of the model in typical clinical settings where the prevalence of sepsis is low (Table 3). For brevity, we focus on describing the results for the SMOTE models as in prior studies 22, 24, 25.…”
Section: Results (mentioning)
confidence: 99%
“…Both tissue-specific and ubiquitous TFs have the potential to collaborate with HNF4A at hepatic CRMs [103]. In general, whether or not ubiquitously expressed TFs should also be considered as drivers of cell identity in addition to cell-specific regulators, remains a matter of debate [7, 113]. Noteworthily, while binding sites of individual TFs have shown poor conservation across species [114], combinatorial binding has been found to be evolutionary stable and more strongly associated with liver disease loci identified by genome-wide association studies as compared with binding sites of single TFs [100, 111], highlighting the importance of collaborative gene regulation by multiple TFs acting in concert to establish and maintain cell identity.…”
Section: Mechanisms Of Action In the Control Of Cell Identity By H… (mentioning)
confidence: 99%
“…The evolution of epigenomic readers and writers themselves ultimately affects their function and changes in the epigenomic landscape may thus be understood as a consequence of this very process. While it is clear that the presence or absence of epigenetic marks in principle has a major influence on gene expression and cell identity, it is still largely open which marks have which functional significance where in the genome (Barrero, et al 2010; Kim and Costello 2017; Xia, et al 2020). As in many other examples (Bergmiller, et al 2012; Luo, et al 2015; Arun, et al 2016), it is plausible that the degree of conservation would be a strong indicator for functional relevance.…”
Section: Introduction (mentioning)
confidence: 99%