2021
DOI: 10.1155/2021/5515342

i4mC-EL: Identifying DNA N4-Methylcytosine Sites in the Mouse Genome Using Ensemble Learning

Abstract: As one of the important epigenetic modifications, DNA N4-methylcytosine (4mC) plays a crucial role in controlling gene replication, expression, cell cycle, DNA replication, and differentiation. The accurate identification of 4mC sites is necessary to understand biological functions. In this paper, we use ensemble learning to develop a model named i4mC-EL to identify 4mC sites in the mouse genome. Firstly, a multifeature encoding scheme consisting of Kmer and EIIP was adopted to describe the DNA sequences. Secondly,…

Cited by 4 publications (3 citation statements)
References 58 publications

“…In TNC, all samples of 41 nt produce 39 components with the equation of L − k + 1. Here, L stands for the sequence length, and k stands for the K-mer value as an integer [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]. ATG, TGC, GCG, and CGA are the four 3-mers that can be tokenized from the DNA sequence “ATGCGA,” for instance.…”
Section: Methods (mentioning)
confidence: 99%
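
The count L − k + 1 in the statement above can be checked with a minimal sketch of sliding-window k-mer tokenization. The function name `kmer_tokens` and the 41 nt test sequence are hypothetical, chosen only for illustration; this is not the code of i4mC-EL or of the citing paper.

```python
def kmer_tokens(sequence: str, k: int = 3) -> list[str]:
    """Slide a window of width k over the sequence, one position at a time."""
    if k < 1 or k > len(sequence):
        raise ValueError("k must be between 1 and the sequence length")
    # A sequence of length L yields L - k + 1 overlapping k-mers.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# The example from the quoted statement: four 3-mers from "ATGCGA".
print(kmer_tokens("ATGCGA", k=3))  # ['ATG', 'TGC', 'GCG', 'CGA']

# A made-up 41 nt sample (assumption: any 41-character string will do here).
sample = "ACGT" * 10 + "A"
print(len(kmer_tokens(sample, k=3)))  # 39, i.e. 41 - 3 + 1
```

Overlapping windows with a step of one are assumed here because that is what produces 39 components from 41 nt; a non-overlapping scheme would give a different count.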
“…These methods are simple to use, but their capacity of detecting 4mC sites cross species need promoting. No less than five computational methods [36–41] were intended to predict 4mC sites in mouse genomes. Both 4mCPred-EL [36] and i4mC-Mouse [37] were feature engineering-based methods.…”
Section: Introduction (mentioning)
confidence: 99%
“…These predictors typically make use of machine learning algorithms to learn from available data to perform novel predictions and gain new insights. Recently, a variety of machine learning algorithms are useful for this goal, such as support vector machine (SVM) (Chen et al, 2017; He et al, 2019; Wei et al, 2019a,b; Lv et al, 2020b; Zhao et al, 2020), random forest (RF) (Hasan et al, 2020a,b; Lv et al, 2020a; Alghamdi et al, 2021; Zulfiqar et al, 2021a), Markov model (MM) (Yang et al, 2020), and the combined or ensemble methods (Gong and Fan, 2019; Manavalan et al, 2019a,b; Tang et al, 2020; Li et al, 2021), extreme gradient boosting (XGBoost) (Wang et al, 2021) and Laplacian Regularized Sparse Representation (Ding et al, 2021). As shown in Supplementary Table 1, SVM is the most widely used traditional machine learning algorithms in the model development and method comparison for 4mC prediction, followed by RF.…”
Section: Introduction (mentioning)
confidence: 99%